Commit Graph

4629 Commits

Author SHA1 Message Date
Breno Leitao b2f82565f2 powerpc: Wire up io_pgetevents
Wire up io_pgetevents system call on powerpc.

io_pgetevents is a new syscall to read asynchronous I/O events from the
completion queue.

Tested with libaio branch aio-poll[1] and the io_pgetevents test (#22) passed
on both ppc64 LE and BE modes.

[1] https://pagure.io/libaio/branch/aio-poll

CC: Christoph Hellwig <hch@lst.de>
Signed-off-by: Breno Leitao <leitao@debian.org>
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-23 21:43:21 +10:00
Mark Rutland b3a2a05f91 atomics/treewide: Make conditional inc/dec ops optional
The conditional inc/dec ops differ for atomic_t and atomic64_t:

- atomic_inc_unless_positive() is optional for atomic_t, and doesn't exist for atomic64_t.
- atomic_dec_unless_negative() is optional for atomic_t, and doesn't exist for atomic64_t.
- atomic_dec_if_positive is optional for atomic_t, and is mandatory for atomic64_t.

Let's make these consistently optional for both. At the same time, let's
clean up the existing fallbacks to use atomic_try_cmpxchg().

The instrumented atomics are updated accordingly.

There should be no functional change as a result of this patch.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Will Deacon <will.deacon@arm.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/lkml/20180621121321.4761-18-mark.rutland@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-06-21 14:25:24 +02:00
Mark Rutland 9837559d8e atomics/treewide: Make unconditional inc/dec ops optional
Many of the inc/dec ops are mandatory, but for most architectures inc/dec are
simply trivial wrappers around their corresponding add/sub ops.

Let's make all the inc/dec ops optional, so that we can get rid of these
boilerplate wrappers.

The instrumented atomics are updated accordingly.

There should be no functional change as a result of this patch.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Will Deacon <will.deacon@arm.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Palmer Dabbelt <palmer@sifive.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/lkml/20180621121321.4761-17-mark.rutland@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-06-21 14:25:24 +02:00
Mark Rutland 18cc1814d4 atomics/treewide: Make test ops optional
Some of the atomics return the result of a test applied after the atomic
operation, and almost all architectures implement these as trivial
wrappers around the underlying atomic. Specifically:

 * <atomic>_inc_and_test(v)    is (<atomic>_inc_return(v)    == 0)
 * <atomic>_dec_and_test(v)    is (<atomic>_dec_return(v)    == 0)
 * <atomic>_sub_and_test(i, v) is (<atomic>_sub_return(i, v) == 0)
 * <atomic>_add_negative(i, v) is (<atomic>_add_return(i, v)  < 0)

Rather than have these definitions duplicated in all architectures, with
minor inconsistencies in formatting and documentation, let's make these
operations optional, with default fallbacks as above. Implementations
must now provide a preprocessor symbol.

The instrumented atomics are updated accordingly.

Both x86 and m68k have custom implementations, which are left as-is,
given preprocessor symbols to avoid being overridden.

There should be no functional change as a result of this patch.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Will Deacon <will.deacon@arm.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Palmer Dabbelt <palmer@sifive.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/lkml/20180621121321.4761-16-mark.rutland@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-06-21 14:25:24 +02:00
Mark Rutland 4f44b4b2b3 atomics/powerpc: Define atomic64_fetch_add_unless()
As a step towards unifying the atomic/atomic64/atomic_long APIs, this
patch converts the arch/powerpc implementation of atomic64_add_unless()
into an implementation of atomic64_fetch_add_unless().

A wrapper in <linux/atomic.h> will build atomic_add_unless() atop of
this, provided it is given a preprocessor definition.

No functional change is intended as a result of this patch.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Will Deacon <will.deacon@arm.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/lkml/20180621121321.4761-13-mark.rutland@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-06-21 14:25:24 +02:00
Mark Rutland eccc2da8c0 atomics/treewide: Make atomic_fetch_add_unless() optional
Several architectures these have a near-identical implementation based
on atomic_read() and atomic_cmpxchg() which we can instead define in
<linux/atomic.h>, so let's do so, using something close to the existing
x86 implementation with try_cmpxchg().

Where an architecture provides its own atomic_fetch_add_unless(), it
must define a preprocessor symbol for it. The instrumented atomics are
updated accordingly.

Note that arch/arc's existing atomic_fetch_add_unless() had redundant
barriers, as these are already present in its atomic_cmpxchg()
implementation.

There should be no functional change as a result of this patch.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org>
Reviewed-by: Will Deacon <will.deacon@arm.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Palmer Dabbelt <palmer@sifive.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vineet Gupta <vgupta@synopsys.com>
Link: https://lore.kernel.org/lkml/20180621121321.4761-7-mark.rutland@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-06-21 14:22:33 +02:00
Mark Rutland bef828204a atomics/treewide: Make atomic64_inc_not_zero() optional
We define a trivial fallback for atomic_inc_not_zero(), but don't do
the same for atomic64_inc_not_zero(), leading most architectures to
define the same boilerplate.

Let's add a fallback in <linux/atomic.h>, and remove the redundant
implementations. Note that atomic64_add_unless() is always defined in
<linux/atomic.h>, and promotes its arguments to the requisite types, so
we need not do this explicitly.

There should be no functional change as a result of this patch.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Will Deacon <will.deacon@arm.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Palmer Dabbelt <palmer@sifive.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/lkml/20180621121321.4761-6-mark.rutland@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-06-21 14:22:33 +02:00
Mark Rutland bfc18e389c atomics/treewide: Rename __atomic_add_unless() => atomic_fetch_add_unless()
While __atomic_add_unless() was originally intended as a building-block
for atomic_add_unless(), it's now used in a number of places around the
kernel. It's the only common atomic operation named __atomic*(), rather
than atomic_*(), and for consistency it would be better named
atomic_fetch_add_unless().

This lack of consistency is slightly confusing, and gets in the way of
scripting atomics. Given that, let's clean things up and promote it to
an official part of the atomics API, in the form of
atomic_fetch_add_unless().

This patch converts definitions and invocations over to the new name,
including the instrumented version, using the following script:

  ----
  git grep -w __atomic_add_unless | while read line; do
  sed -i '{s/\<__atomic_add_unless\>/atomic_fetch_add_unless/}' "${line%%:*}";
  done
  git grep -w __arch_atomic_add_unless | while read line; do
  sed -i '{s/\<__arch_atomic_add_unless\>/arch_atomic_fetch_add_unless/}' "${line%%:*}";
  done
  ----

Note that we do not have atomic{64,_long}_fetch_add_unless(), which will
be introduced by later patches.

There should be no functional change as a result of this patch.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Will Deacon <will.deacon@arm.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Palmer Dabbelt <palmer@sifive.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/lkml/20180621121321.4761-2-mark.rutland@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-06-21 14:22:32 +02:00
Masami Hiramatsu 6e5fd3a298 powerpc/kprobes: Don't call the ->break_handler() in powerpc kprobes code
Don't call the ->break_handler() from the powerpc kprobes code,
because it was only used by jprobes which got removed.

This also removes skip_singlestep() and embeds it in the
caller, kprobe_ftrace_handler(), which simplifies regs->nip
operation around there.

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: linux-arch@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Link: https://lore.kernel.org/lkml/152942477127.15209.8982613703787878618.stgit@devbox
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-06-21 12:33:15 +02:00
Masami Hiramatsu c530e2f02e powerpc/kprobes: Remove jprobe powerpc implementation
Remove arch dependent setjump/longjump functions
and unused fields in kprobe_ctlblk for jprobes
from arch/powerpc. This also reverts commits
related __is_active_jprobe() function.

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: linux-arch@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Link: https://lore.kernel.org/lkml/152942445234.15209.12868722778364739753.stgit@devbox
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-06-21 12:33:08 +02:00
Aneesh Kumar K.V fadd03c615 powerpc/mm/hash/4k: Free hugetlb page table caches correctly.
With 4k page size for hugetlb we allocate hugepage directories from its on slab
cache. With patch 0c4d26802 ("powerpc/book3s64/mm: Simplify the rcu callback for page table free")
we missed to free these allocated hugepd tables.

Update pgtable_free to handle hugetlb hugepd directory table.

Fixes: 0c4d268029 ("powerpc/book3s64/mm: Simplify the rcu callback for page table free")
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
[mpe: Add CONFIG_HUGETLB_PAGE guard to fix build break]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-20 09:13:25 +10:00
Michael Ellerman e08ecba17b powerpc/64s: Fix build failures with CONFIG_NMI_IPI=n
I broke the build when CONFIG_NMI_IPI=n with my recent commit to add
arch_trigger_cpumask_backtrace(), eg:

  stacktrace.c:(.text+0x1b0): undefined reference to `.smp_send_safe_nmi_ipi'

We should rework the CONFIG symbols here in future to avoid these
double barrelled ifdefs but for now they fix the build.

Fixes: 5cc05910f2 ("powerpc/64s: Wire up arch_trigger_cpumask_backtrace()")
Reported-by: Christophe LEROY <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-19 23:03:50 +10:00
Paolo Bonzini 09027ab73b Merge tag 'kvm-ppc-next-4.18-2' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc into HEAD 2018-06-14 17:42:54 +02:00
Linus Torvalds be779f03d5 Kbuild updates for v4.18 (2nd)
- fix some bugs introduced by the recent Kconfig syntax extension
 
  - add some symbols about compiler information in Kconfig, such as
    CC_IS_GCC, CC_IS_CLANG, GCC_VERSION, etc.
 
  - test compiler capability for the stack protector in Kconfig, and
    clean-up Makefile
 
  - test compiler capability for GCC-plugins in Kconfig, and clean-up
    Makefile
 
  - allow to enable GCC-plugins for COMPILE_TEST
 
  - test compiler capability for KCOV in Kconfig and correct dependency
 
  - remove auto-detect mode of the GCOV format, which is now more nicely
    handled in Kconfig
 
  - test compiler capability for mprofile-kernel on PowerPC, and
    clean-up Makefile
 
  - misc cleanups
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJbISvEAAoJED2LAQed4NsGEsoQAKBHMqUM9yQo0LdVMnDMCLQI
 Xsjyqzr0ySp6YiuF+cobwDs49sggt7/8EX+OnrP/sLlAhY0QrNGI1ulhwpFx1Ewa
 xFxz5kF/1jDwC+AjngXcK5Dr9nGSSMfT3wQhLGKjMkKSypbz2QyTrfMOfHGYSzU1
 gD8RMWYXxKoJFmIaqmpLz7PDfWKPzhSOZo7BflPjAGXdlpfSV9cQvu+TkJ12qvSp
 KZ2uHUgLz95NnltSuGtN71X8so7w4eTYAvkJ5bOeOpYsZSVYRq4Exvwe0Y0dbwie
 WDpcRC5KrQOlIFxRUUSGn5cDsaW9yYJJAwMG6Dr8qJ66QlgY5GqOKXxXX+ARa7WU
 7GkeAZ11n5dArjjdSjfClh8CwDiZNpJmAUbahm+feQfUfq9nbs+0JX6bOG5ZE+nt
 3iE0ZoSGDjxD5Pjy4u+NtQM0JCpieuz3JNxqVbAVm0Ua5q8niwSEneixyrNmjkBF
 1tV+qsMYus7AFwdGuDRXzBhVY7hd931H34czA3FUZZqwcClFVoJiygI++s62mVXx
 w9kYi8Ades/W6dt7c7XGjmqYTDgnTolLaYY5vggpEeLOzc1QPW6iKt9tpREi6Zzm
 n+y586YsIo0vjTMfRcfmGZUPG3CJeqL2UDslYmG8PgMQ6/eaAHBDXECLrAkGGPlG
 aIPZcMam5BQxhmSJc19c
 =VABv
 -----END PGP SIGNATURE-----

Merge tag 'kbuild-v4.18-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild

Pull more Kbuild updates from Masahiro Yamada:

 - fix some bugs introduced by the recent Kconfig syntax extension

 - add some symbols about compiler information in Kconfig, such as
   CC_IS_GCC, CC_IS_CLANG, GCC_VERSION, etc.

 - test compiler capability for the stack protector in Kconfig, and
   clean-up Makefile

 - test compiler capability for GCC-plugins in Kconfig, and clean-up
   Makefile

 - allow to enable GCC-plugins for COMPILE_TEST

 - test compiler capability for KCOV in Kconfig and correct dependency

 - remove auto-detect mode of the GCOV format, which is now more nicely
   handled in Kconfig

 - test compiler capability for mprofile-kernel on PowerPC, and clean-up
   Makefile

 - misc cleanups

* tag 'kbuild-v4.18-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
  linux/linkage.h: replace VMLINUX_SYMBOL_STR() with __stringify()
  kconfig: fix localmodconfig
  sh: remove no-op macro VMLINUX_SYMBOL()
  powerpc/kbuild: move -mprofile-kernel check to Kconfig
  Documentation: kconfig: add recommended way to describe compiler support
  gcc-plugins: disable GCC_PLUGIN_STRUCTLEAK_BYREF_ALL for COMPILE_TEST
  gcc-plugins: allow to enable GCC_PLUGINS for COMPILE_TEST
  gcc-plugins: test plugin support in Kconfig and clean up Makefile
  gcc-plugins: move GCC version check for PowerPC to Kconfig
  kcov: test compiler capability in Kconfig and correct dependency
  gcov: remove CONFIG_GCOV_FORMAT_AUTODETECT
  arm64: move GCC version check for ARCH_SUPPORTS_INT128 to Kconfig
  kconfig: add CC_IS_CLANG and CLANG_VERSION
  kconfig: add CC_IS_GCC and GCC_VERSION
  stack-protector: test compiler capability in Kconfig and drop AUTO mode
  kbuild: fix endless syncconfig in case arch Makefile sets CROSS_COMPILE
2018-06-13 08:40:34 -07:00
Nicholas Piggin abba759796 powerpc/kbuild: move -mprofile-kernel check to Kconfig
This eliminates the workaround that requires disabling
-mprofile-kernel by default in Kconfig.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
2018-06-11 09:16:29 +09:00
Linus Torvalds d82991a868 Merge branch 'core-rseq-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull restartable sequence support from Thomas Gleixner:
 "The restartable sequences syscall (finally):

  After a lot of back and forth discussion and massive delays caused by
  the speculative distraction of maintainers, the core set of
  restartable sequences has finally reached a consensus.

  It comes with the basic non disputed core implementation along with
  support for arm, powerpc and x86 and a full set of selftests

  It was exposed to linux-next earlier this week, so it does not fully
  comply with the merge window requirements, but there is really no
  point to drag it out for yet another cycle"

* 'core-rseq-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  rseq/selftests: Provide Makefile, scripts, gitignore
  rseq/selftests: Provide parametrized tests
  rseq/selftests: Provide basic percpu ops test
  rseq/selftests: Provide basic test
  rseq/selftests: Provide rseq library
  selftests/lib.mk: Introduce OVERRIDE_TARGETS
  powerpc: Wire up restartable sequences system call
  powerpc: Add syscall detection for restartable sequences
  powerpc: Add support for restartable sequences
  x86: Wire up restartable sequence system call
  x86: Add support for restartable sequences
  arm: Wire up restartable sequences system call
  arm: Add syscall detection for restartable sequences
  arm: Add restartable sequences support
  rseq: Introduce restartable sequences system call
  uapi/headers: Provide types_32_64.h
2018-06-10 10:17:09 -07:00
Laurent Dufour 3010a5ea66 mm: introduce ARCH_HAS_PTE_SPECIAL
Currently the PTE special supports is turned on in per architecture
header files.  Most of the time, it is defined in
arch/*/include/asm/pgtable.h depending or not on some other per
architecture static definition.

This patch introduce a new configuration variable to manage this
directly in the Kconfig files.  It would later replace
__HAVE_ARCH_PTE_SPECIAL.

Here notes for some architecture where the definition of
__HAVE_ARCH_PTE_SPECIAL is not obvious:

arm
 __HAVE_ARCH_PTE_SPECIAL which is currently defined in
arch/arm/include/asm/pgtable-3level.h which is included by
arch/arm/include/asm/pgtable.h when CONFIG_ARM_LPAE is set.
So select ARCH_HAS_PTE_SPECIAL if ARM_LPAE.

powerpc
__HAVE_ARCH_PTE_SPECIAL is defined in 2 files:
 - arch/powerpc/include/asm/book3s/64/pgtable.h
 - arch/powerpc/include/asm/pte-common.h
The first one is included if (PPC_BOOK3S & PPC64) while the second is
included in all the other cases.
So select ARCH_HAS_PTE_SPECIAL all the time.

sparc:
__HAVE_ARCH_PTE_SPECIAL is defined if defined(__sparc__) &&
defined(__arch64__) which are defined through the compiler in
sparc/Makefile if !SPARC32 which I assume to be if SPARC64.
So select ARCH_HAS_PTE_SPECIAL if SPARC64

There is no functional change introduced by this patch.

Link: http://lkml.kernel.org/r/1523433816-14460-2-git-send-email-ldufour@linux.vnet.ibm.com
Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Suggested-by: Jerome Glisse <jglisse@redhat.com>
Reviewed-by: Jerome Glisse <jglisse@redhat.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Rich Felker <dalias@libc.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Palmer Dabbelt <palmer@sifive.com>
Cc: Albert Ou <albert@sifive.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Christophe LEROY <christophe.leroy@c-s.fr>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-06-07 17:34:35 -07:00
Linus Torvalds c90fca951e powerpc updates for 4.18
Notable changes:
 
  - Support for split PMD page table lock on 64-bit Book3S (Power8/9).
 
  - Add support for HAVE_RELIABLE_STACKTRACE, so we properly support live
    patching again.
 
  - Add support for patching barrier_nospec in copy_from_user() and syscall entry.
 
  - A couple of fixes for our data breakpoints on Book3S.
 
  - A series from Nick optimising TLB/mm handling with the Radix MMU.
 
  - Numerous small cleanups to squash sparse/gcc warnings from Mathieu Malaterre.
 
  - Several series optimising various parts of the 32-bit code from Christophe Leroy.
 
  - Removal of support for two old machines, "SBC834xE" and "C2K" ("GEFanuc,C2K"),
    which is why the diffstat has so many deletions.
 
 And many other small improvements & fixes.
 
 There's a few out-of-area changes. Some minor ftrace changes OK'ed by Steve, and
 a fix to our powernv cpuidle driver. Then there's a series touching mm, x86 and
 fs/proc/task_mmu.c, which cleans up some details around pkey support. It was
 ack'ed/reviewed by Ingo & Dave and has been in next for several weeks.
 
 Thanks to:
   Akshay Adiga, Alastair D'Silva, Alexey Kardashevskiy, Al Viro, Andrew
   Donnellan, Aneesh Kumar K.V, Anju T Sudhakar, Arnd Bergmann, Balbir Singh,
   Cédric Le Goater, Christophe Leroy, Christophe Lombard, Colin Ian King, Dave
   Hansen, Fabio Estevam, Finn Thain, Frederic Barrat, Gautham R. Shenoy, Haren
   Myneni, Hari Bathini, Ingo Molnar, Jonathan Neuschäfer, Josh Poimboeuf,
   Kamalesh Babulal, Madhavan Srinivasan, Mahesh Salgaonkar, Mark Greer, Mathieu
   Malaterre, Matthew Wilcox, Michael Neuling, Michal Suchanek, Naveen N. Rao,
   Nicholas Piggin, Nicolai Stange, Olof Johansson, Paul Gortmaker, Paul
   Mackerras, Peter Rosin, Pridhiviraj Paidipeddi, Ram Pai, Rashmica Gupta, Ravi
   Bangoria, Russell Currey, Sam Bobroff, Samuel Mendoza-Jonas, Segher
   Boessenkool, Shilpasri G Bhat, Simon Guo, Souptick Joarder, Stewart Smith,
   Thiago Jung Bauermann, Torsten Duwe, Vaibhav Jain, Wei Yongjun, Wolfram Sang,
   Yisheng Xie, YueHaibing.
 -----BEGIN PGP SIGNATURE-----
 
 iQIwBAABCAAaBQJbGQKBExxtcGVAZWxsZXJtYW4uaWQuYXUACgkQUevqPMjhpYBq
 TRAAioK7rz5xYMkxaM3Ng3ybobEeNAwQqOolz98xvmnB9SfDWNuc99vf8cGu0/fQ
 zc8AKZ5RcnwipOjyGlxW9oa1ZhVq0xtYnQPiYLEKMdLQmh5D+C7+KpvAd1UElweg
 ub40/xDySWfMujfuMSF9JDCWPIXyojt4Xg5nJKIVRrAm/3YMe/+i5Am7NWHuMCEb
 aQmZtlYW5Mz81XY0968hjpUO6eKFRmsaM7yFAhGTXx6+oLRpGj1PZB4AwdRIKS2L
 Ak7q/VgxtE4W+s3a0GK2s+eXIhGKeFuX9AVnx3nti+8/K1OqrqhDcLMUC/9JpCpv
 EvOtO7dxPnZujHjdu4Eai/xNoo4h6zRy7bWqve9LoBM40CP5jljKzu1lwqqb5yO0
 jC7/aXhgiSIxxcRJLjoI/TYpZPu40MifrkydmczykdPyPCnMIWEJDcj4KsRL/9Y8
 9SSbJzRNC/SgQNTbUYPZFFi6G0QaMmlcbCb628k8QT+Gn3Xkdf/ZtxzqEyoF4Irq
 46kFBsiSSK4Bu0rVlcUtJQLgdqytWULO6NKEYnD67laxYcgQd8pGFQ8SjZhRZLgU
 q5LA3HIWhoAI4M0wZhOnKXO6JfiQ1UbO8gUJLsWsfF0Fk5KAcdm+4kb4jbI1H4Qk
 Vol9WNRZwEllyaiqScZN9RuVVuH0GPOZeEH1dtWK+uWi0lM=
 =ZlBf
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-4.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc updates from Michael Ellerman:
 "Notable changes:

   - Support for split PMD page table lock on 64-bit Book3S (Power8/9).

   - Add support for HAVE_RELIABLE_STACKTRACE, so we properly support
     live patching again.

   - Add support for patching barrier_nospec in copy_from_user() and
     syscall entry.

   - A couple of fixes for our data breakpoints on Book3S.

   - A series from Nick optimising TLB/mm handling with the Radix MMU.

   - Numerous small cleanups to squash sparse/gcc warnings from Mathieu
     Malaterre.

   - Several series optimising various parts of the 32-bit code from
     Christophe Leroy.

   - Removal of support for two old machines, "SBC834xE" and "C2K"
     ("GEFanuc,C2K"), which is why the diffstat has so many deletions.

  And many other small improvements & fixes.

  There's a few out-of-area changes. Some minor ftrace changes OK'ed by
  Steve, and a fix to our powernv cpuidle driver. Then there's a series
  touching mm, x86 and fs/proc/task_mmu.c, which cleans up some details
  around pkey support. It was ack'ed/reviewed by Ingo & Dave and has
  been in next for several weeks.

  Thanks to: Akshay Adiga, Alastair D'Silva, Alexey Kardashevskiy, Al
  Viro, Andrew Donnellan, Aneesh Kumar K.V, Anju T Sudhakar, Arnd
  Bergmann, Balbir Singh, Cédric Le Goater, Christophe Leroy, Christophe
  Lombard, Colin Ian King, Dave Hansen, Fabio Estevam, Finn Thain,
  Frederic Barrat, Gautham R. Shenoy, Haren Myneni, Hari Bathini, Ingo
  Molnar, Jonathan Neuschäfer, Josh Poimboeuf, Kamalesh Babulal,
  Madhavan Srinivasan, Mahesh Salgaonkar, Mark Greer, Mathieu Malaterre,
  Matthew Wilcox, Michael Neuling, Michal Suchanek, Naveen N. Rao,
  Nicholas Piggin, Nicolai Stange, Olof Johansson, Paul Gortmaker, Paul
  Mackerras, Peter Rosin, Pridhiviraj Paidipeddi, Ram Pai, Rashmica
  Gupta, Ravi Bangoria, Russell Currey, Sam Bobroff, Samuel
  Mendoza-Jonas, Segher Boessenkool, Shilpasri G Bhat, Simon Guo,
  Souptick Joarder, Stewart Smith, Thiago Jung Bauermann, Torsten Duwe,
  Vaibhav Jain, Wei Yongjun, Wolfram Sang, Yisheng Xie, YueHaibing"

* tag 'powerpc-4.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (251 commits)
  powerpc/64s/radix: Fix missing ptesync in flush_cache_vmap
  cpuidle: powernv: Fix promotion from snooze if next state disabled
  powerpc: fix build failure by disabling attribute-alias warning in pci_32
  ocxl: Fix missing unlock on error in afu_ioctl_enable_p9_wait()
  powerpc-opal: fix spelling mistake "Uniterrupted" -> "Uninterrupted"
  powerpc: fix spelling mistake: "Usupported" -> "Unsupported"
  powerpc/pkeys: Detach execute_only key on !PROT_EXEC
  powerpc/powernv: copy/paste - Mask SO bit in CR
  powerpc: Remove core support for Marvell mv64x60 hostbridges
  powerpc/boot: Remove core support for Marvell mv64x60 hostbridges
  powerpc/boot: Remove support for Marvell mv64x60 i2c controller
  powerpc/boot: Remove support for Marvell MPSC serial controller
  powerpc/embedded6xx: Remove C2K board support
  powerpc/lib: optimise PPC32 memcmp
  powerpc/lib: optimise 32 bits __clear_user()
  powerpc/time: inline arch_vtime_task_switch()
  powerpc/Makefile: set -mcpu=860 flag for the 8xx
  powerpc: Implement csum_ipv6_magic in assembly
  powerpc/32: Optimise __csum_partial()
  powerpc/lib: Adjust .balign inside string functions for PPC32
  ...
2018-06-07 10:23:33 -07:00
Boqun Feng bb862b021d powerpc: Wire up restartable sequences system call
Wire up the rseq system call on powerpc.

This provides an ABI improving the speed of a user-space getcpu
operation on powerpc by skipping the getcpu system call on the fast
path, as well as improving the speed of user-space operations on per-cpu
data compared to using load-reservation/store-conditional atomics.

Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Joel Fernandes <joelaf@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dave Watson <davejwatson@fb.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: Chris Lameter <cl@linux.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Andrew Hunter <ahh@google.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: "Paul E . McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Paul Turner <pjt@google.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ben Maurer <bmaurer@fb.com>
Cc: linux-api@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lkml.kernel.org/r/20180602124408.8430-11-mathieu.desnoyers@efficios.com
2018-06-06 11:58:34 +02:00
Nicholas Piggin ff5bc793e4 powerpc/64s/radix: Fix missing ptesync in flush_cache_vmap
There is a typo in f1cb8f9beb ("powerpc/64s/radix: avoid ptesync after
set_pte and ptep_set_access_flags") config ifdef, which results in the
necessary ptesync not being issued after vmalloc.

This causes random kernel faults in module load, bpf load, anywhere
that vmalloc mappings are used.

After correcting the code, this survives a guest kernel booting
hundreds of times where previously there would be a crash every few
boots (I haven't noticed the crash on host, perhaps due to different
TLB and page table walking behaviour in hardware).

A memory clobber is also added to the flush, just to be sure it won't
be reordered with the pte set or the subsequent mapping access.

Fixes: f1cb8f9beb ("powerpc/64s/radix: avoid ptesync after set_pte and ptep_set_access_flags")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-06 18:50:53 +10:00
Linus Torvalds abf7dba7c4 Char/Misc driver patches for 4.18-rc1
Here is the "big" char and misc driver patches for 4.18-rc1.
 
 It's not a lot of stuff here, but there are some highlights:
 	- coreboot driver updates
 	- soundwire driver updates
 	- android binder updates
 	- fpga big sync, mostly documentation
 	- lots of minor driver updates
 
 All of these have been in linux-next for a while with no reported
 issues.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCWxbXfQ8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ymwCACdFbUy2kWwrpZWSfSBpawfrs75lLMAmwVOe+62
 9aDsDWzDVUEFxF20qiE6
 =CMJ3
 -----END PGP SIGNATURE-----

Merge tag 'char-misc-4.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc

Pull char/misc driver updates from Greg KH:
 "Here is the "big" char and misc driver patches for 4.18-rc1.

  It's not a lot of stuff here, but there are some highlights:

   - coreboot driver updates

   - soundwire driver updates

   - android binder updates

   - fpga big sync, mostly documentation

   - lots of minor driver updates

  All of these have been in linux-next for a while with no reported
  issues"

* tag 'char-misc-4.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (81 commits)
  vmw_balloon: fixing double free when batching mode is off
  MAINTAINERS: Add driver-api/fpga path
  fpga: clarify that unregister functions also free
  documentation: fpga: move fpga-region.txt to driver-api
  documentation: fpga: add bridge document to driver-api
  documentation: fpga: move fpga-mgr.txt to driver-api
  Documentation: fpga: move fpga overview to driver-api
  fpga: region: kernel-doc fixes
  fpga: bridge: kernel-doc fixes
  fpga: mgr: kernel-doc fixes
  fpga: use SPDX
  fpga: region: change api, add fpga_region_create/free
  fpga: bridge: change api, don't use drvdata
  fpga: manager: change api, don't use drvdata
  fpga: region: don't use drvdata in common fpga code
  Drivers: hv: vmbus: Removed an unnecessary cast from void *
  ver_linux: Drop redundant calls to system() to test if file is readable
  ver_linux: Move stderr redirection from function parameter to function body
  misc: IBM Virtual Management Channel Driver (VMC)
  rpmsg: Correct support for MODULE_DEVICE_TABLE()
  ...
2018-06-05 16:20:22 -07:00
Linus Torvalds ba252f16e4 Merge branch 'timers-2038-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull time/Y2038 updates from Thomas Gleixner:

 - Consolidate SySV IPC UAPI headers

 - Convert SySV IPC to the new COMPAT_32BIT_TIME mechanism

 - Cleanup the core interfaces and standardize on the ktime_get_* naming
   convention.

 - Convert the X86 platform ops to timespec64

 - Remove the ugly temporary timespec64 hack

* 'timers-2038-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (22 commits)
  x86: Convert x86_platform_ops to timespec64
  timekeeping: Add more coarse clocktai/boottime interfaces
  timekeeping: Add ktime_get_coarse_with_offset
  timekeeping: Standardize on ktime_get_*() naming
  timekeeping: Clean up ktime_get_real_ts64
  timekeeping: Remove timespec64 hack
  y2038: ipc: Redirect ipc(SEMTIMEDOP, ...) to compat_ksys_semtimedop
  y2038: ipc: Enable COMPAT_32BIT_TIME
  y2038: ipc: Use __kernel_timespec
  y2038: ipc: Report long times to user space
  y2038: ipc: Use ktime_get_real_seconds consistently
  y2038: xtensa: Extend sysvipc data structures
  y2038: powerpc: Extend sysvipc data structures
  y2038: sparc: Extend sysvipc data structures
  y2038: parisc: Extend sysvipc data structures
  y2038: mips: Extend sysvipc data structures
  y2038: arm64: Extend sysvipc compat data structures
  y2038: s390: Remove unneeded ipc uapi header files
  y2038: ia64: Remove unneeded ipc uapi header files
  y2038: alpha: Remove unneeded ipc uapi header files
  ...
2018-06-04 21:02:18 -07:00
Linus Torvalds 0bbcce5d1e Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timers and timekeeping updates from Thomas Gleixner:

 - Core infrastucture work for Y2038 to address the COMPAT interfaces:

     + Add a new Y2038 safe __kernel_timespec and use it in the core
       code

     + Introduce config switches which allow to control the various
       compat mechanisms

     + Use the new config switch in the posix timer code to control the
       32bit compat syscall implementation.

 - Prevent bogus selection of CPU local clocksources which causes an
   endless reselection loop

 - Remove the extra kthread in the clocksource code which has no value
   and just adds another level of indirection

 - The usual bunch of trivial updates, cleanups and fixlets all over the
   place

 - More SPDX conversions

* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
  clocksource/drivers/mxs_timer: Switch to SPDX identifier
  clocksource/drivers/timer-imx-tpm: Switch to SPDX identifier
  clocksource/drivers/timer-imx-gpt: Switch to SPDX identifier
  clocksource/drivers/timer-imx-gpt: Remove outdated file path
  clocksource/drivers/arc_timer: Add comments about locking while read GFRC
  clocksource/drivers/mips-gic-timer: Add pr_fmt and reword pr_* messages
  clocksource/drivers/sprd: Fix Kconfig dependency
  clocksource: Move inline keyword to the beginning of function declarations
  timer_list: Remove unused function pointer typedef
  timers: Adjust a kernel-doc comment
  tick: Prefer a lower rating device only if it's CPU local device
  clocksource: Remove kthread
  time: Change nanosleep to safe __kernel_* types
  time: Change types to new y2038 safe __kernel_* types
  time: Fix get_timespec64() for y2038 safe compat interfaces
  time: Add new y2038 safe __kernel_timespec
  posix-timers: Make compat syscalls depend on CONFIG_COMPAT_32BIT_TIME
  time: Introduce CONFIG_COMPAT_32BIT_TIME
  time: Introduce CONFIG_64BIT_TIME in architectures
  compat: Enable compat_get/put_timespec64 always
  ...
2018-06-04 20:27:54 -07:00
Linus Torvalds db020be9f7 Merge branch 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq updates from Thomas Gleixner:

 - Consolidation of softirq pending:

   The softirq mask and its accessors/mutators have many implementations
   scattered around many architectures. Most do the same things
   consisting in a field in a per-cpu struct (often irq_cpustat_t)
   accessed through per-cpu ops. We can provide instead a generic
   efficient version that most of them can use. In fact s390 is the only
   exception because the field is stored in lowcore.

 - Support for level!?! triggered MSI (ARM)

   Over the past couple of years, we've seen some SoCs coming up with
   ways of signalling level interrupts using a new flavor of MSIs, where
   the MSI controller uses two distinct messages: one that raises a
   virtual line, and one that lowers it. The target MSI controller is in
   charge of maintaining the state of the line.

   This allows for a much simplified HW signal routing (no need to have
   hundreds of discrete lines to signal level interrupts if you already
   have a memory bus), but results in a departure from the current idea
   the kernel has of MSIs.

 - Support for Meson-AXG GPIO irqchip

 - Large stm32 irqchip rework (suspend/resume, hierarchical domains)

 - More SPDX conversions

* 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (36 commits)
  ARM: dts: stm32: Add exti support to stm32mp157 pinctrl
  ARM: dts: stm32: Add exti support for stm32mp157c
  pinctrl/stm32: Add irq_eoi for stm32gpio irqchip
  irqchip/stm32: Add suspend/resume support for hierarchy domain
  irqchip/stm32: Add stm32mp1 support with hierarchy domain
  irqchip/stm32: Prepare common functions
  irqchip/stm32: Add host and driver data structures
  irqchip/stm32: Add suspend support
  irqchip/stm32: Add falling pending register support
  irqchip/stm32: Checkpatch fix
  irqchip/stm32: Optimizes and cleans up stm32-exti irq_domain
  irqchip/meson-gpio: Add support for Meson-AXG SoCs
  dt-bindings: interrupt-controller: New binding for Meson-AXG SoC
  dt-bindings: interrupt-controller: Fix the double quotes
  softirq/s390: Move default mutators of overwritten softirq mask to s390
  softirq/x86: Switch to generic local_softirq_pending() implementation
  softirq/sparc: Switch to generic local_softirq_pending() implementation
  softirq/powerpc: Switch to generic local_softirq_pending() implementation
  softirq/parisc: Switch to generic local_softirq_pending() implementation
  softirq/ia64: Switch to generic local_softirq_pending() implementation
  ...
2018-06-04 19:59:22 -07:00
Linus Torvalds 93e95fa574 Merge branch 'siginfo-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull siginfo updates from Eric Biederman:
 "This set of changes close the known issues with setting si_code to an
  invalid value, and with not fully initializing struct siginfo. There
  remains work to do on nds32, arc, unicore32, powerpc, arm, arm64, ia64
  and x86 to get the code that generates siginfo into a simpler and more
  maintainable state. Most of that work involves refactoring the signal
  handling code and thus careful code review.

  Also not included is the work to shrink the in kernel version of
  struct siginfo. That depends on getting the number of places that
  directly manipulate struct siginfo under control, as it requires the
  introduction of struct kernel_siginfo for the in kernel things.

  Overall this set of changes looks like it is making good progress, and
  with a little luck I will be wrapping up the siginfo work next
  development cycle"

* 'siginfo-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (46 commits)
  signal/sh: Stop gcc warning about an impossible case in do_divide_error
  signal/mips: Report FPE_FLTUNK for undiagnosed floating point exceptions
  signal/um: More carefully relay signals in relay_signal.
  signal: Extend siginfo_layout with SIL_FAULT_{MCEERR|BNDERR|PKUERR}
  signal: Remove unncessary #ifdef SEGV_PKUERR in 32bit compat code
  signal/signalfd: Add support for SIGSYS
  signal/signalfd: Remove __put_user from signalfd_copyinfo
  signal/xtensa: Use force_sig_fault where appropriate
  signal/xtensa: Consistenly use SIGBUS in do_unaligned_user
  signal/um: Use force_sig_fault where appropriate
  signal/sparc: Use force_sig_fault where appropriate
  signal/sparc: Use send_sig_fault where appropriate
  signal/sh: Use force_sig_fault where appropriate
  signal/s390: Use force_sig_fault where appropriate
  signal/riscv: Replace do_trap_siginfo with force_sig_fault
  signal/riscv: Use force_sig_fault where appropriate
  signal/parisc: Use force_sig_fault where appropriate
  signal/parisc: Use force_sig_mceerr where appropriate
  signal/openrisc: Use force_sig_fault where appropriate
  signal/nios2: Use force_sig_fault where appropriate
  ...
2018-06-04 15:23:48 -07:00
Linus Torvalds e5a594643a dma-mapping updates for 4.18:
- replaceme the force_dma flag with a dma_configure bus method.
    (Nipun Gupta, although one patch is іncorrectly attributed to me
     due to a git rebase bug)
  - use GFP_DMA32 more agressively in dma-direct. (Takashi Iwai)
  - remove PCI_DMA_BUS_IS_PHYS and rely on the dma-mapping API to do the
    right thing for bounce buffering.
  - move dma-debug initialization to common code, and apply a few cleanups
    to the dma-debug code.
  - cleanup the Kconfig mess around swiotlb selection
  - swiotlb comment fixup (Yisheng Xie)
  - a trivial swiotlb fix. (Dan Carpenter)
  - support swiotlb on RISC-V. (based on a patch from Palmer Dabbelt)
  - add a new generic dma-noncoherent dma_map_ops implementation and use
    it for arc, c6x and nds32.
  - improve scatterlist validity checking in dma-debug. (Robin Murphy)
  - add a struct device quirk to limit the dma-mask to 32-bit due to
    bridge/system issues, and switch x86 to use it instead of a local
    hack for VIA bridges.
  - handle devices without a dma_mask more gracefully in the dma-direct
    code.
 -----BEGIN PGP SIGNATURE-----
 
 iQI/BAABCAApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAlsU1hwLHGhjaEBsc3Qu
 ZGUACgkQD55TZVIEUYPraxAAocC7JiFKW133/VugCtGA1x9uE8DPHealtsWTAeEq
 KOOB3GxWMU2hKqQ4km5tcfdWoGJvvab6hmDXcitzZGi2JajO7Ae0FwIy3yvxSIKm
 iH/ON7c4sJt8gKrXYsLVylmwDaimNs4a6xfODoCRgnWuovI2QrrZzupnlzPNsiOC
 lv8ezzcW+Ay/gvDD/r72psO+w3QELETif/OzR/qTOtvLrVabM06eHmPQ8Wb98smu
 /UPMMv6/3XwQnxpxpdyqN+p/gUdneXithzT261wTeZ+8gDXmcWBwHGcMBCimcoBi
 FklW52moazIPIsTysqoNlVFsLGJTeS4p2D3BLAp5NwWYsLv+zHUVZsI1JY/8u5Ox
 mM11LIfvu9JtUzaqD9SvxlxIeLhhYZZGnUoV3bQAkpHSQhN/xp2YXd5NWSo5ac2O
 dch83+laZkZgd6ryw6USpt/YTPM/UHBYy7IeGGHX/PbmAke0ZlvA6Rae7kA5DG59
 7GaLdwQyrHp8uGFgwze8P+R4POSk1ly73HHLBT/pFKnDD7niWCPAnBzuuEQGJs00
 0zuyWLQyzOj1l6HCAcMNyGnYSsMp8Fx0fvEmKR/EYs8O83eJKXi6L9aizMZx4v1J
 0wTolUWH6SIIdz474YmewhG5YOLY7mfe9E8aNr8zJFdwRZqwaALKoteRGUxa3f6e
 zUE=
 =6Acj
 -----END PGP SIGNATURE-----

Merge tag 'dma-mapping-4.18' of git://git.infradead.org/users/hch/dma-mapping

Pull dma-mapping updates from Christoph Hellwig:

 - replace the force_dma flag with a dma_configure bus method. (Nipun
   Gupta, although one patch is іncorrectly attributed to me due to a
   git rebase bug)

 - use GFP_DMA32 more agressively in dma-direct. (Takashi Iwai)

 - remove PCI_DMA_BUS_IS_PHYS and rely on the dma-mapping API to do the
   right thing for bounce buffering.

 - move dma-debug initialization to common code, and apply a few
   cleanups to the dma-debug code.

 - cleanup the Kconfig mess around swiotlb selection

 - swiotlb comment fixup (Yisheng Xie)

 - a trivial swiotlb fix. (Dan Carpenter)

 - support swiotlb on RISC-V. (based on a patch from Palmer Dabbelt)

 - add a new generic dma-noncoherent dma_map_ops implementation and use
   it for arc, c6x and nds32.

 - improve scatterlist validity checking in dma-debug. (Robin Murphy)

 - add a struct device quirk to limit the dma-mask to 32-bit due to
   bridge/system issues, and switch x86 to use it instead of a local
   hack for VIA bridges.

 - handle devices without a dma_mask more gracefully in the dma-direct
   code.

* tag 'dma-mapping-4.18' of git://git.infradead.org/users/hch/dma-mapping: (48 commits)
  dma-direct: don't crash on device without dma_mask
  nds32: use generic dma_noncoherent_ops
  nds32: implement the unmap_sg DMA operation
  nds32: consolidate DMA cache maintainance routines
  x86/pci-dma: switch the VIA 32-bit DMA quirk to use the struct device flag
  x86/pci-dma: remove the explicit nodac and allowdac option
  x86/pci-dma: remove the experimental forcesac boot option
  Documentation/x86: remove a stray reference to pci-nommu.c
  core, dma-direct: add a flag 32-bit dma limits
  dma-mapping: remove unused gfp_t parameter to arch_dma_alloc_attrs
  dma-debug: check scatterlist segments
  c6x: use generic dma_noncoherent_ops
  arc: use generic dma_noncoherent_ops
  arc: fix arc_dma_{map,unmap}_page
  arc: fix arc_dma_sync_sg_for_{cpu,device}
  arc: simplify arc_dma_sync_single_for_{cpu,device}
  dma-mapping: provide a generic dma-noncoherent implementation
  dma-mapping: simplify Kconfig dependencies
  riscv: add swiotlb support
  riscv: only enable ZONE_DMA32 for 64-bit
  ...
2018-06-04 10:58:12 -07:00
Christophe Leroy 60f1d2893e powerpc/time: inline arch_vtime_task_switch()
arch_vtime_task_switch() is a small function which is called
only from vtime_common_task_switch(), so it is worth inlining

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-04 00:39:20 +10:00
Christophe Leroy e9c4943a10 powerpc: Implement csum_ipv6_magic in assembly
The generic csum_ipv6_magic() generates a pretty bad result

00000000 <csum_ipv6_magic>: (PPC32)
   0:	81 23 00 00 	lwz     r9,0(r3)
   4:	81 03 00 04 	lwz     r8,4(r3)
   8:	7c e7 4a 14 	add     r7,r7,r9
   c:	7d 29 38 10 	subfc   r9,r9,r7
  10:	7d 4a 51 10 	subfe   r10,r10,r10
  14:	7d 27 42 14 	add     r9,r7,r8
  18:	7d 2a 48 50 	subf    r9,r10,r9
  1c:	80 e3 00 08 	lwz     r7,8(r3)
  20:	7d 08 48 10 	subfc   r8,r8,r9
  24:	7d 4a 51 10 	subfe   r10,r10,r10
  28:	7d 29 3a 14 	add     r9,r9,r7
  2c:	81 03 00 0c 	lwz     r8,12(r3)
  30:	7d 2a 48 50 	subf    r9,r10,r9
  34:	7c e7 48 10 	subfc   r7,r7,r9
  38:	7d 4a 51 10 	subfe   r10,r10,r10
  3c:	7d 29 42 14 	add     r9,r9,r8
  40:	7d 2a 48 50 	subf    r9,r10,r9
  44:	80 e4 00 00 	lwz     r7,0(r4)
  48:	7d 08 48 10 	subfc   r8,r8,r9
  4c:	7d 4a 51 10 	subfe   r10,r10,r10
  50:	7d 29 3a 14 	add     r9,r9,r7
  54:	7d 2a 48 50 	subf    r9,r10,r9
  58:	81 04 00 04 	lwz     r8,4(r4)
  5c:	7c e7 48 10 	subfc   r7,r7,r9
  60:	7d 4a 51 10 	subfe   r10,r10,r10
  64:	7d 29 42 14 	add     r9,r9,r8
  68:	7d 2a 48 50 	subf    r9,r10,r9
  6c:	80 e4 00 08 	lwz     r7,8(r4)
  70:	7d 08 48 10 	subfc   r8,r8,r9
  74:	7d 4a 51 10 	subfe   r10,r10,r10
  78:	7d 29 3a 14 	add     r9,r9,r7
  7c:	7d 2a 48 50 	subf    r9,r10,r9
  80:	81 04 00 0c 	lwz     r8,12(r4)
  84:	7c e7 48 10 	subfc   r7,r7,r9
  88:	7d 4a 51 10 	subfe   r10,r10,r10
  8c:	7d 29 42 14 	add     r9,r9,r8
  90:	7d 2a 48 50 	subf    r9,r10,r9
  94:	7d 08 48 10 	subfc   r8,r8,r9
  98:	7d 4a 51 10 	subfe   r10,r10,r10
  9c:	7d 29 2a 14 	add     r9,r9,r5
  a0:	7d 2a 48 50 	subf    r9,r10,r9
  a4:	7c a5 48 10 	subfc   r5,r5,r9
  a8:	7c 63 19 10 	subfe   r3,r3,r3
  ac:	7d 29 32 14 	add     r9,r9,r6
  b0:	7d 23 48 50 	subf    r9,r3,r9
  b4:	7c c6 48 10 	subfc   r6,r6,r9
  b8:	7c 63 19 10 	subfe   r3,r3,r3
  bc:	7c 63 48 50 	subf    r3,r3,r9
  c0:	54 6a 80 3e 	rotlwi  r10,r3,16
  c4:	7c 63 52 14 	add     r3,r3,r10
  c8:	7c 63 18 f8 	not     r3,r3
  cc:	54 63 84 3e 	rlwinm  r3,r3,16,16,31
  d0:	4e 80 00 20 	blr

0000000000000000 <.csum_ipv6_magic>: (PPC64)
   0:	81 23 00 00 	lwz     r9,0(r3)
   4:	80 03 00 04 	lwz     r0,4(r3)
   8:	81 63 00 08 	lwz     r11,8(r3)
   c:	7c e7 4a 14 	add     r7,r7,r9
  10:	7f 89 38 40 	cmplw   cr7,r9,r7
  14:	7d 47 02 14 	add     r10,r7,r0
  18:	7d 30 10 26 	mfocrf  r9,1
  1c:	55 29 f7 fe 	rlwinm  r9,r9,30,31,31
  20:	7d 4a 4a 14 	add     r10,r10,r9
  24:	7f 80 50 40 	cmplw   cr7,r0,r10
  28:	7d 2a 5a 14 	add     r9,r10,r11
  2c:	80 03 00 0c 	lwz     r0,12(r3)
  30:	81 44 00 00 	lwz     r10,0(r4)
  34:	7d 10 10 26 	mfocrf  r8,1
  38:	55 08 f7 fe 	rlwinm  r8,r8,30,31,31
  3c:	7d 29 42 14 	add     r9,r9,r8
  40:	81 04 00 04 	lwz     r8,4(r4)
  44:	7f 8b 48 40 	cmplw   cr7,r11,r9
  48:	7d 29 02 14 	add     r9,r9,r0
  4c:	7d 70 10 26 	mfocrf  r11,1
  50:	55 6b f7 fe 	rlwinm  r11,r11,30,31,31
  54:	7d 29 5a 14 	add     r9,r9,r11
  58:	7f 80 48 40 	cmplw   cr7,r0,r9
  5c:	7d 29 52 14 	add     r9,r9,r10
  60:	7c 10 10 26 	mfocrf  r0,1
  64:	54 00 f7 fe 	rlwinm  r0,r0,30,31,31
  68:	7d 69 02 14 	add     r11,r9,r0
  6c:	7f 8a 58 40 	cmplw   cr7,r10,r11
  70:	7c 0b 42 14 	add     r0,r11,r8
  74:	81 44 00 08 	lwz     r10,8(r4)
  78:	7c f0 10 26 	mfocrf  r7,1
  7c:	54 e7 f7 fe 	rlwinm  r7,r7,30,31,31
  80:	7c 00 3a 14 	add     r0,r0,r7
  84:	7f 88 00 40 	cmplw   cr7,r8,r0
  88:	7d 20 52 14 	add     r9,r0,r10
  8c:	80 04 00 0c 	lwz     r0,12(r4)
  90:	7d 70 10 26 	mfocrf  r11,1
  94:	55 6b f7 fe 	rlwinm  r11,r11,30,31,31
  98:	7d 29 5a 14 	add     r9,r9,r11
  9c:	7f 8a 48 40 	cmplw   cr7,r10,r9
  a0:	7d 29 02 14 	add     r9,r9,r0
  a4:	7d 70 10 26 	mfocrf  r11,1
  a8:	55 6b f7 fe 	rlwinm  r11,r11,30,31,31
  ac:	7d 29 5a 14 	add     r9,r9,r11
  b0:	7f 80 48 40 	cmplw   cr7,r0,r9
  b4:	7d 29 2a 14 	add     r9,r9,r5
  b8:	7c 10 10 26 	mfocrf  r0,1
  bc:	54 00 f7 fe 	rlwinm  r0,r0,30,31,31
  c0:	7d 29 02 14 	add     r9,r9,r0
  c4:	7f 85 48 40 	cmplw   cr7,r5,r9
  c8:	7c 09 32 14 	add     r0,r9,r6
  cc:	7d 50 10 26 	mfocrf  r10,1
  d0:	55 4a f7 fe 	rlwinm  r10,r10,30,31,31
  d4:	7c 00 52 14 	add     r0,r0,r10
  d8:	7f 80 30 40 	cmplw   cr7,r0,r6
  dc:	7d 30 10 26 	mfocrf  r9,1
  e0:	55 29 ef fe 	rlwinm  r9,r9,29,31,31
  e4:	7c 09 02 14 	add     r0,r9,r0
  e8:	54 03 80 3e 	rotlwi  r3,r0,16
  ec:	7c 03 02 14 	add     r0,r3,r0
  f0:	7c 03 00 f8 	not     r3,r0
  f4:	78 63 84 22 	rldicl  r3,r3,48,48
  f8:	4e 80 00 20 	blr

This patch implements it in assembly for both PPC32 and PPC64

Link: https://github.com/linuxppc/linux/issues/9
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Reviewed-by: Segher Boessenkool <segher@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-04 00:39:19 +10:00
Christophe Leroy 1128bb7813 powerpc/lib: Adjust .balign inside string functions for PPC32
commit 87a156fb18 ("Align hot loops of some string functions")
degraded the performance of string functions by adding useless
nops

A simple benchmark on an 8xx calling 100000x a memchr() that
matches the first byte runs in 41668 TB ticks before this patch
and in 35986 TB ticks after this patch. So this gives an
improvement of approx 10%

Another benchmark doing the same with a memchr() matching the 128th
byte runs in 1011365 TB ticks before this patch and 1005682 TB ticks
after this patch, so regardless on the number of loops, removing
those useless nops improves the test by 5683 TB ticks.

Fixes: 87a156fb18 ("Align hot loops of some string functions")
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-04 00:39:19 +10:00
Christophe Leroy 55a0edf083 powerpc/64: optimises from64to32()
The current implementation of from64to32() gives a poor result:

0000000000000270 <.from64to32>:
 270:	38 00 ff ff 	li      r0,-1
 274:	78 69 00 22 	rldicl  r9,r3,32,32
 278:	78 00 00 20 	clrldi  r0,r0,32
 27c:	7c 60 00 38 	and     r0,r3,r0
 280:	7c 09 02 14 	add     r0,r9,r0
 284:	78 09 00 22 	rldicl  r9,r0,32,32
 288:	7c 00 4a 14 	add     r0,r0,r9
 28c:	78 03 00 20 	clrldi  r3,r0,32
 290:	4e 80 00 20 	blr

This patch modifies from64to32() to operate in the same
spirit as csum_fold()

It swaps the two 32-bit halves of sum then it adds it with the
unswapped sum. If there is a carry from adding the two 32-bit halves,
it will carry from the lower half into the upper half, giving us the
correct sum in the upper half.

The resulting code is:

0000000000000260 <.from64to32>:
 260:	78 60 00 02 	rotldi  r0,r3,32
 264:	7c 60 1a 14 	add     r3,r0,r3
 268:	78 63 00 22 	rldicl  r3,r3,32,32
 26c:	4e 80 00 20 	blr

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-04 00:39:16 +10:00
Ravi Bangoria e6684d07e4 powerpc/sstep: Introduce GETTYPE macro
Replace 'op->type & INSTR_TYPE_MASK' expression with GETTYPE(op->type)
macro.

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03 21:19:40 +10:00
Michael Ellerman ddf35cf376 powerpc: Use barrier_nospec in copy_from_user()
Based on the x86 commit doing the same.

See commit 304ec1b050 ("x86/uaccess: Use __uaccess_begin_nospec()
and uaccess_try_nospec") and b3bbfb3fb5 ("x86: Introduce
__uaccess_begin_nospec() and uaccess_try_nospec") for more detail.

In all cases we are ordering the load from the potentially
user-controlled pointer vs a previous branch based on an access_ok()
check or similar.

Base on a patch from Michal Suchanek.

Signed-off-by: Michal Suchanek <msuchanek@suse.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03 20:43:45 +10:00
Michal Suchanek cb3d6759a9 powerpc/64s: Enable barrier_nospec based on firmware settings
Check what firmware told us and enable/disable the barrier_nospec as
appropriate.

We err on the side of enabling the barrier, as it's no-op on older
systems, see the comment for more detail.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03 20:43:45 +10:00
Michal Suchanek 815069ca57 powerpc/64s: Patch barrier_nospec in modules
Note that unlike RFI which is patched only in kernel the nospec state
reflects settings at the time the module was loaded.

Iterating all modules and re-patching every time the settings change
is not implemented.

Based on lwsync patching.

Signed-off-by: Michal Suchanek <msuchanek@suse.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03 20:43:44 +10:00
Michal Suchanek 2eea7f067f powerpc/64s: Add support for ori barrier_nospec patching
Based on the RFI patching. This is required to be able to disable the
speculation barrier.

Only one barrier type is supported and it does nothing when the
firmware does not enable it. Also re-patching modules is not supported
So the only meaningful thing that can be done is patching out the
speculation barrier at boot when the user says it is not wanted.

Signed-off-by: Michal Suchanek <msuchanek@suse.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03 20:43:44 +10:00
Michal Suchanek a6b3964ad7 powerpc/64s: Add barrier_nospec
A no-op form of ori (or immediate of 0 into r31 and the result stored
in r31) has been re-tasked as a speculation barrier. The instruction
only acts as a barrier on newer machines with appropriate firmware
support. On older CPUs it remains a harmless no-op.

Implement barrier_nospec using this instruction.

mpe: The semantics of the instruction are believed to be that it
prevents execution of subsequent instructions until preceding branches
have been fully resolved and are no longer executing speculatively.
There is no further documentation available at this time.

Signed-off-by: Michal Suchanek <msuchanek@suse.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03 20:43:44 +10:00
Michael Ellerman 5cc05910f2 powerpc/64s: Wire up arch_trigger_cpumask_backtrace()
This allows eg. the RCU stall detector, or the soft/hardlockup
detectors to trigger a backtrace on all CPUs.

We implement this by sending a "safe" NMI, which will actually only
send an IPI. Unfortunately the generic code prints "NMI", so that's a
little confusing but we can probably live with it.

If one of the CPUs doesn't respond to the IPI, we then print some info
from it's paca and do a backtrace based on its saved_r1.

Example output:

  INFO: rcu_sched detected stalls on CPUs/tasks:
  	2-...0: (0 ticks this GP) idle=1be/1/4611686018427387904 softirq=1055/1055 fqs=25735
  	(detected by 4, t=58847 jiffies, g=58, c=57, q=1258)
  Sending NMI from CPU 4 to CPUs 2:
  CPU 2 didn't respond to backtrace IPI, inspecting paca.
  irq_soft_mask: 0x01 in_mce: 0 in_nmi: 0 current: 3623 (bash)
  Back trace of paca->saved_r1 (0xc0000000e1c83ba0) (possibly stale):
  Call Trace:
  [c0000000e1c83ba0] [0000000000000014] 0x14 (unreliable)
  [c0000000e1c83bc0] [c000000000765798] lkdtm_do_action+0x48/0x80
  [c0000000e1c83bf0] [c000000000765a40] direct_entry+0x110/0x1b0
  [c0000000e1c83c90] [c00000000058e650] full_proxy_write+0x90/0xe0
  [c0000000e1c83ce0] [c0000000003aae3c] __vfs_write+0x6c/0x1f0
  [c0000000e1c83d80] [c0000000003ab214] vfs_write+0xd4/0x240
  [c0000000e1c83dd0] [c0000000003ab5cc] ksys_write+0x6c/0x110
  [c0000000e1c83e30] [c00000000000b860] system_call+0x58/0x6c

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
2018-06-03 20:43:43 +10:00
Michael Ellerman 6ba55716a2 powerpc/nmi: Add an API for sending "safe" NMIs
Currently the options we have for sending NMIs are not necessarily
safe, that is they can potentially interrupt a CPU in a
non-recoverable region of code, meaning the kernel must then panic().

But we'd like to use smp_send_nmi_ipi() to do cross-CPU calls in
situations where we don't want to risk a panic(), because it doesn't
have the requirement that interrupts must be enabled like
smp_call_function().

So add an API for the caller to indicate that it wants to use the NMI
infrastructure, but doesn't want to do anything "unsafe".

Currently that is implemented by not actually calling cause_nmi_ipi(),
instead falling back to an IPI. In future we can pass the safe
parameter down to cause_nmi_ipi() and the individual backends can
potentially take it into account before deciding what to do.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
2018-06-03 20:43:43 +10:00
Michael Ellerman 7b08729cb2 powerpc/64: Save stack pointer when we hard disable interrupts
A CPU that gets stuck with interrupts hard disable can be difficult to
debug, as on some platforms we have no way to interrupt the CPU to
find out what it's doing.

A stop-gap is to have the CPU save it's stack pointer (r1) in its paca
when it hard disables interrupts. That way if we can't interrupt it,
we can at least trace the stack based on where it last disabled
interrupts.

In some cases that will be total junk, but the stack trace code should
handle that. In the simple case of a CPU that disable interrupts and
then gets stuck in a loop, the stack trace should be informative.

We could clear the saved stack pointer when we enable interrupts, but
that loses information which could be useful if we have nothing else
to go on.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
2018-06-03 20:43:42 +10:00
Michael Ellerman 3e3786801b powerpc: Check address limit on user-mode return (TIF_FSCHECK)
set_fs() sets the addr_limit, which is used in access_ok() to
determine if an address is a user or kernel address.

Some code paths use set_fs() to temporarily elevate the addr_limit so
that kernel code can read/write kernel memory as if it were user
memory. That is fine as long as the code can't ever return to
userspace with the addr_limit still elevated.

If that did happen, then userspace can read/write kernel memory as if
it were user memory, eg. just with write(2). In case it's not clear,
that is very bad. It has also happened in the past due to bugs.

Commit 5ea0727b16 ("x86/syscalls: Check address limit on user-mode
return") added a mechanism to check the addr_limit value before
returning to userspace. Any call to set_fs() sets a thread flag,
TIF_FSCHECK, and if we see that on the return to userspace we go out
of line to check that the addr_limit value is not elevated.

For further info see the above commit, as well as:
  https://lwn.net/Articles/722267/
  https://bugs.chromium.org/p/project-zero/issues/detail?id=990

Verified to work on 64-bit Book3S using a POC that objdumps the system
call handler, and a modified lkdtm_CORRUPT_USER_DS() that doesn't kill
the caller.

Before:
  $ sudo ./test-tif-fscheck
  ...
  0000000000000000 <.data>:
         0:       e1 f7 8a 79     rldicl. r10,r12,30,63
         4:       80 03 82 40     bne     0x384
         8:       00 40 8a 71     andi.   r10,r12,16384
         c:       78 0b 2a 7c     mr      r10,r1
        10:       10 fd 21 38     addi    r1,r1,-752
        14:       08 00 c2 41     beq-    0x1c
        18:       58 09 2d e8     ld      r1,2392(r13)
        1c:       00 00 41 f9     std     r10,0(r1)
        20:       70 01 61 f9     std     r11,368(r1)
        24:       78 01 81 f9     std     r12,376(r1)
        28:       70 00 01 f8     std     r0,112(r1)
        2c:       78 00 41 f9     std     r10,120(r1)
        30:       20 00 82 41     beq     0x50
        34:       a6 42 4c 7d     mftb    r10

After:

  $ sudo ./test-tif-fscheck
  Killed

And in dmesg:
  Invalid address limit on user-mode return
  WARNING: CPU: 1 PID: 3689 at ../include/linux/syscalls.h:260 do_notify_resume+0x140/0x170
  ...
  NIP [c00000000001ee50] do_notify_resume+0x140/0x170
  LR [c00000000001ee4c] do_notify_resume+0x13c/0x170
  Call Trace:
    do_notify_resume+0x13c/0x170 (unreliable)
    ret_from_except_lite+0x70/0x74

Performance overhead is essentially zero in the usual case, because
the bit is checked as part of the existing _TIF_USER_WORK_MASK check.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03 20:43:42 +10:00
Michael Ellerman ba0635fcbe powerpc: Rename thread_struct.fs to addr_limit
It's called 'fs' for historical reasons, it's named after the x86 'FS'
register. But we don't have to use that name for the member of
thread_struct, and in fact arch/x86 doesn't even call it 'fs' anymore.

So rename it to 'addr_limit', which better reflects what it's used
for, and is also the name used on other arches.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03 20:43:42 +10:00
Sam Bobroff 309ed3a715 powerpc/eeh: Introduce eeh_for_each_pe()
Add a for_each-style macro for iterating through PEs without the
boilerplate required by a traversal function. eeh_pe_next() is now
exported, as it is now used directly in place.

Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03 20:43:39 +10:00
Sam Bobroff d6c4932fbf powerpc/eeh: Strengthen types of eeh traversal functions
The traversal functions eeh_pe_traverse() and eeh_pe_dev_traverse()
both provide their first argument as void * but every single user casts
it to the expected type.

Change the type of the first parameter from void * to the appropriate
type, and clean up all uses.

Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03 20:43:38 +10:00
Anju T Sudhakar 25af86b2ae powerpc/perf: Unregister thread-imc if core-imc not supported
Since thread-imc internally use the core-imc hardware infrastructure
and is depended on it, having thread-imc in the kernel in the
absence of core-imc is trivial. Patch disables thread-imc, if
core-imc is not registered.

Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com>
Reviewed-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03 20:43:37 +10:00
Russell Currey 8a792262f3 powerpc/xive: Remove (almost) unused macros
The GETFIELD and SETFIELD macros in xive-regs.h aren't used except for
a single instance of GETFIELD, so replace that and remove them.

These macros are also defined in vas.h, so either those should be
eventually replaced or the macros moved into bitops.h.

Signed-off-by: Russell Currey <ruscur@russell.cc>
[mpe: Rewrite the assignment to 'he' to avoid ffs() etc.]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03 20:43:35 +10:00
Arnd Bergmann 34efabe418 powerpc: remove unused to_tm() helper
to_tm() is now completely unused, the only reference being in the
_dump_time() helper that is also unused. This removes both, leaving
the rest of the powerpc RTC code y2038 safe to as far as the hardware
supports.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03 20:43:34 +10:00
Arnd Bergmann 5bfd643583 powerpc: use time64_t in read_persistent_clock
Looking through the remaining users of the deprecated mktime()
function, I found the powerpc rtc handlers, which use it in
place of rtc_tm_to_time64().

To clean this up, I'm changing over the read_persistent_clock()
function to the read_persistent_clock64() variant, and change
all the platform specific handlers along with it.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03 20:43:33 +10:00
Nicholas Piggin 0cef77c779 powerpc/64s/radix: flush remote CPUs out of single-threaded mm_cpumask
When a single-threaded process has a non-local mm_cpumask, try to use
that point to flush the TLBs out of other CPUs in the cpumask.

An IPI is used for clearing remote CPUs for a few reasons:
- An IPI can end lazy TLB use of the mm, which is required to prevent
  TLB entries being created on the remote CPU. The alternative is to
  drop lazy TLB switching completely, which costs 7.5% in a context
  switch ping-pong test betwee a process and kernel idle thread.
- An IPI can have remote CPUs flush the entire PID, but the local CPU
  can flush a specific VA. tlbie would require over-flushing of the
  local CPU (where the process is running).
- A single threaded process that is migrated to a different CPU is
  likely to have a relatively small mm_cpumask, so IPI is reasonable.

No other thread can concurrently switch to this mm, because it must
have been given a reference to mm_users by the current thread before it
can use_mm. mm_users can be asynchronously incremented (by
mm_activate or mmget_not_zero), but those users must use remote mm
access and can't use_mm or access user address space. Existing code
makes the this assumption already, for example sparc64 has reset
mm_cpumask using this condition since the start of history, see
arch/sparc/kernel/smp_64.c.

This reduces tlbies for a kernel compile workload from 0.90M to 0.12M,
tlbiels are increased significantly due to the PID flushing for the
cleaning up remote CPUs, and increased local flushes (PID flushes take
128 tlbiels vs 1 tlbie).

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03 20:40:36 +10:00
Nicholas Piggin 85bcfaf69c powerpc/64s/radix: optimise pte_update
Implementing pte_update with pte_xchg (which uses cmpxchg) is
inefficient. A single larx/stcx. works fine, no need for the less
efficient cmpxchg sequence.

Then remove the memory barriers from the operation. There is a
requirement for TLB flushing to load mm_cpumask after the store
that reduces pte permissions, which is moved into the TLB flush
code.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03 20:40:36 +10:00
Nicholas Piggin f1cb8f9beb powerpc/64s/radix: avoid ptesync after set_pte and ptep_set_access_flags
The ISA suggests ptesync after setting a pte, to prevent a table walk
initiated by a subsequent access from missing that store and causing a
spurious fault. This is an architectual allowance that allows an
implementation's page table walker to be incoherent with the store
queue.

However there is no correctness problem in taking a spurious fault in
userspace -- the kernel copes with these at any time, so the updated
pte will be found eventually. Spurious kernel faults on vmap memory
must be avoided, so a ptesync is put into flush_cache_vmap.

On POWER9 so far I have not found a measurable window where this can
result in more minor faults, so as an optimisation, remove the costly
ptesync from pte updates. If an implementation benefits from ptesync,
it would be better to add it back in update_mmu_cache, so it's not
done for things like fork(2).

fork --fork --exec benchmark improved 5.2% (12400->13100).

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03 20:40:36 +10:00
Nicholas Piggin f569bd94ef powerpc/64s/radix: make ptep_get_and_clear_full non-atomic for the full case
This matches other architectures, when we know there will be no
further accesses to the address (e.g., for teardown), page table
entries can be cleared non-atomically.

The comments about NMMU are bogus: all MMU notifiers (including NMMU)
are released at this point, with their TLBs flushed. An NMMU access at
this point would be a bug.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03 20:40:35 +10:00
Nicholas Piggin 6d8278c414 powerpc/64s/radix: do not flush TLB on spurious fault
In the case of a spurious fault (which can happen due to a race with
another thread that changes the page table), the default Linux mm code
calls flush_tlb_page for that address. This is not required because
the pte will be re-fetched. Hash does not wire this up to a hardware
TLB flush for this reason. This patch avoids the flush for radix.

>From Power ISA v3.0B, p.1090:

    Setting a Reference or Change Bit or Upgrading Access Authority
    (PTE Subject to Atomic Hardware Updates)

    If the only change being made to a valid PTE that is subject to
    atomic hardware updates is to set the Refer- ence or Change bit to
    1 or to add access authorities, a simpler sequence suffices
    because the translation hardware will refetch the PTE if an access
    is attempted for which the only problems were reference and/or
    change bits needing to be set or insufficient access authority.

The nest MMU on POWER9 does not re-fetch the PTE after such an access
attempt before faulting, so address spaces with a coprocessor
attached will continue to flush in these cases.

This reduces tlbies for a kernel compile workload from 0.95M to 0.90M.

fork --fork --exec benchmark improved 0.5% (12300->12400).

Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03 20:40:35 +10:00
Aneesh Kumar K.V bd5050e38a powerpc/mm/radix: Change pte relax sequence to handle nest MMU hang
When relaxing access (read -> read_write update), pte needs to be marked invalid
to handle a nest MMU bug. We also need to do a tlb flush after the pte is
marked invalid before updating the pte with new access bits.

We also move tlb flush to platform specific __ptep_set_access_flags. This will
help us to gerid of unnecessary tlb flush on BOOK3S 64 later. We don't do that
in this patch. This also helps in avoiding multiple tlbies with coprocessor
attached.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03 20:40:34 +10:00
Aneesh Kumar K.V e4c1112c3f powerpc/mm: Change function prototype
In later patch, we use the vma and psize to do tlb flush. Do the prototype
update in separate patch to make the review easy.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03 20:40:34 +10:00
Aneesh Kumar K.V 044003b52a powerpc/mm/radix: Move function from radix.h to pgtable-radix.c
In later patch we will update them which require them to be moved
to pgtable-radix.c. Keeping the function in radix.h results in
compile warning as below.

./arch/powerpc/include/asm/book3s/64/radix.h: In function ‘radix__ptep_set_access_flags’:
./arch/powerpc/include/asm/book3s/64/radix.h:196:28: error: dereferencing pointer to incomplete type ‘struct vm_area_struct’
  struct mm_struct *mm = vma->vm_mm;
                            ^~
./arch/powerpc/include/asm/book3s/64/radix.h:204:6: error: implicit declaration of function ‘atomic_read’; did you mean ‘__atomic_load’? [-Werror=implicit-function-declaration]
      atomic_read(&mm->context.copros) > 0) {
      ^~~~~~~~~~~
      __atomic_load
./arch/powerpc/include/asm/book3s/64/radix.h:204:21: error: dereferencing pointer to incomplete type ‘struct mm_struct’
      atomic_read(&mm->context.copros) > 0) {

Instead of fixing header dependencies, we move the function to pgtable-radix.c
Also the function is now large to be a static inline . Doing the
move in separate patch helps in review.

No functional change in this patch. Only code movement.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03 20:40:34 +10:00
Aneesh Kumar K.V f069ff396d powerpc/mm/hugetlb: Update huge_ptep_set_access_flags to call __ptep_set_access_flags directly
In a later patch, we want to update __ptep_set_access_flags take page size
arg. This makes ptep_set_access_flags only work with mmu_virtual_psize.
To simplify the code make huge_ptep_set_access_flags directly call
__ptep_set_access_flags so that we can compute the hugetlb page size in
hugetlb function.

Now that ptep_set_access_flags won't be called for hugetlb remove
the is_vm_hugetlb_page() check and add the assert of pte lock
unconditionally.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03 20:40:33 +10:00
Alastair D'Silva 19df39581c ocxl: Rename pnv_ocxl_spa_remove_pe to clarify it's action
The function removes the process element from NPU cache.

Signed-off-by: Alastair D'Silva <alastair@d-silva.org>
Acked-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Acked-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03 20:40:32 +10:00
Alastair D'Silva 71cc64a85d powerpc: use task_pid_nr() for TID allocation
The current implementation of TID allocation, using a global IDR, may
result in an errant process starving the system of available TIDs.
Instead, use task_pid_nr(), as mentioned by the original author. The
scenario described which prevented it's use is not applicable, as
set_thread_tidr can only be called after the task struct has been
populated.

In the unlikely event that 2 threads share the TID and are waiting,
all potential outcomes have been determined safe.

Signed-off-by: Alastair D'Silva <alastair@d-silva.org>
Reviewed-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03 20:40:31 +10:00
Alastair D'Silva 819844285e powerpc: Add TIDR CPU feature for POWER9
This patch adds a CPU feature bit to show whether the CPU has
the TIDR register available, enabling as_notify/wait in userspace.

Signed-off-by: Alastair D'Silva <alastair@d-silva.org>
Reviewed-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03 20:40:31 +10:00
Nicholas Piggin ee03b9b447 powerpc/powernv: call OPAL_QUIESCE before OPAL_SIGNAL_SYSTEM_RESET
Although it is often possible to recover a CPU that was interrupted
from OPAL with a system reset NMI, it's undesirable to interrupt them
for a few reasons. Firstly because dump/debug code itself needs to
call firmware, so it could hang on a lock or possibly corrupt a
per-cpu data structure if it or another CPU was interrupted from
OPAL. Secondly, the kexec crash dump code will not return from
interrupt to unwind the OPAL call.

Call OPAL_QUIESCE with QUIESCE_HOLD before sending an NMI IPI to
another CPU, which wait for it to leave firmware (or time out) to
avoid this problem in normal conditions. Firmware bugs may still
result in a timeout and interrupting OPAL, but that is the best
option (stops the CPU, and possibly allows firmware to be debugged).

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03 20:40:30 +10:00
Nicholas Piggin e360cd37f0 powerpc/time: account broadcast timer event interrupts separately
These are not local timer interrupts but IPIs. It's good to be able
to see how timer offloading is behaving, so split these out into
their own category.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03 20:40:29 +10:00
Nicholas Piggin 3f984620f9 powerpc: generic clockevents broadcast receiver call tick_receive_broadcast
The broadcast tick recipient can call tick_receive_broadcast rather
than re-running the full timer interrupt.

It does not have to check for the next event time, because the sender
already determined the timer has expired. It does not have to test
irq_work_pending, because that's a direct decrementer interrupt and
does not go through the clock events subsystem. And it does not have
to read PURR because that was removed with the previous patch.

This results in no code size change, but both the decrementer and
broadcast path lengths are reduced.

Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03 20:40:27 +10:00
Nicholas Piggin 3d3a6021dd powerpc/pseries: lparcfg calculate PURR on demand
For SPLPAR, lparcfg provides a sum of PURR registers for all CPUs.
Currently this is done by reading PURR in context switch and timer
interrupt, and storing that into a per-CPU variable. These are summed
to provide the value.

This does not work with all timer schemes (e.g., NO_HZ_FULL), and it
is sub-optimal for performance because it reads the PURR register on
every context switch, although that's been difficult to distinguish
from noise in the contxt_switch microbenchmark.

This patch implements the sum by calling a function on each CPU, to
read and add PURR values of each CPU.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03 20:40:27 +10:00
Nicholas Piggin 36d632ea83 powerpc/64: remove start_tb and accum_tb from thread_struct
These fields are only written to.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03 20:40:26 +10:00
Nicholas Piggin 54071e4176 powerpc/64s: micro-optimise __hard_irq_enable() for mtmsrd L=1 support
Book3S minimum supported ISA version now requires mtmsrd L=1. This
instruction does not require bits other than RI and EE to be supplied,
so __hard_irq_enable() and __hard_irq_disable() does not have to read
the kernel_msr from paca.

Interrupt entry code already relies on L=1 support.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03 20:40:26 +10:00
Nicholas Piggin 9f4b61b277 powerpc/pseries: put cede MSR[EE] check under IRQ_SOFT_MASK_DEBUG
This check does not catch IRQ soft mask bugs, but this option is
slightly more suitable than TRACE_IRQFLAGS.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03 20:40:25 +10:00
Michael Ellerman 2135a6ec3e Merge branch 'topic/pkey' into next
This is a branch with a mixture of mm, x86 and powerpc commits all
relating to some minor cross-arch pkeys consolidation. The x86/mm
changes have been reviewed by Ingo & Dave Hansen and the tree has been
in linux-next for some weeks without issue.
2018-06-03 20:35:27 +10:00
Michael Ellerman f1079d3a3d Merge branch 'fixes' into next
We ended up with an ugly conflict between fixes and next in ftrace.h
involving multiple nested ifdefs, and the automatic resolution is
wrong. So merge fixes into next so we can fix it up.
2018-06-03 20:32:02 +10:00
Michael Ellerman 481c63acba Merge branch 'topic/ppc-kvm' into next
Merge in some commits we're sharing with the kvm-ppc tree.
2018-06-03 20:23:54 +10:00
Aneesh Kumar K.V 667416f385 powerpc/mm: Fix kernel crash on page table free
Fix the below crash on Book3E 64. pgtable_page_dtor expects struct
page *arg.

Also call the destructor on non book3s platforms correctly. This frees
up the split PTL locks correctly if we had allocated them before.

Call Trace:
  .kmem_cache_free+0x9c/0x44c (unreliable)
  .ptlock_free+0x1c/0x30
  .tlb_remove_table+0xdc/0x224
  .free_pgd_range+0x298/0x500
  .shift_arg_pages+0x10c/0x1e0
  .setup_arg_pages+0x200/0x25c
  .load_elf_binary+0x450/0x16c8
  .search_binary_handler.part.11+0x9c/0x248
  .do_execveat_common.isra.13+0x868/0xc18
  .run_init_process+0x34/0x4c
  .try_to_run_init_process+0x1c/0x68
  .kernel_init+0xdc/0x130
  .ret_from_kernel_thread+0x58/0x7c

Fixes: 702346768 ("powerpc/mm/nohash: Remove pte fragment dependency from nohash")
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-02 01:48:11 +10:00
Simon Guo 7284ca8a5e KVM: PPC: Book3S PR: Support TAR handling for PR KVM HTM
Currently guest kernel doesn't handle TAR facility unavailable and it
always runs with TAR bit on. PR KVM will lazily enable TAR. TAR is not
a frequent-use register and it is not included in SVCPU struct.

Due to the above, the checkpointed TAR val might be a bogus TAR val.
To solve this issue, we will make vcpu->arch.fscr tar bit consistent
with shadow_fscr when TM is enabled.

At the end of emulating treclaim., the correct TAR val need to be loaded
into the register if FSCR_TAR bit is on.

At the beginning of emulating trechkpt., TAR needs to be flushed so that
the right tar val can be copied into tar_tm.

Tested with:
tools/testing/selftests/powerpc/tm/tm-tar
tools/testing/selftests/powerpc/ptrace/ptrace-tm-tar (remove DSCR/PPR
related testing).

Signed-off-by: Simon Guo <wei.guo.simon@gmail.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-06-01 10:30:43 +10:00
Simon Guo e32c53d1cf KVM: PPC: Book3S PR: Add emulation for trechkpt.
This patch adds host emulation when guest PR KVM executes "trechkpt.",
which is a privileged instruction and will trap into host.

We firstly copy vcpu ongoing content into vcpu tm checkpoint
content, then perform kvmppc_restore_tm_pr() to do trechkpt.
with updated vcpu tm checkpoint values.

Signed-off-by: Simon Guo <wei.guo.simon@gmail.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-06-01 10:30:23 +10:00
Simon Guo 5706340a33 KVM: PPC: Book3S PR: Always fail transactions in guest privileged state
Currently the kernel doesn't use transaction memory.
And there is an issue for privileged state in the guest that:
tbegin/tsuspend/tresume/tabort TM instructions can impact MSR TM bits
without trapping into the PR host. So following code will lead to a
false mfmsr result:
	tbegin	<- MSR bits update to Transaction active.
	beq 	<- failover handler branch
	mfmsr	<- still read MSR bits from magic page with
		transaction inactive.

It is not an issue for non-privileged guest state since its mfmsr is
not patched with magic page and will always trap into the PR host.

This patch will always fail tbegin attempt for privileged state in the
guest, so that the above issue is prevented. It is benign since
currently (guest) kernel doesn't initiate a transaction.

Test case:
https://github.com/justdoitqd/publicFiles/blob/master/test_tbegin_pr.c

Signed-off-by: Simon Guo <wei.guo.simon@gmail.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-06-01 10:30:10 +10:00
Simon Guo 533082ae86 KVM: PPC: Book3S PR: Emulate mtspr/mfspr using active TM SPRs
The mfspr/mtspr on TM SPRs(TEXASR/TFIAR/TFHAR) are non-privileged
instructions and can be executed by PR KVM guest in problem state
without trapping into the host. We only emulate mtspr/mfspr
texasr/tfiar/tfhar in guest PR=0 state.

When we are emulating mtspr tm sprs in guest PR=0 state, the emulation
result needs to be visible to guest PR=1 state. That is, the actual TM
SPR val should be loaded into actual registers.

We already flush TM SPRs into vcpu when switching out of CPU, and load
TM SPRs when switching back.

This patch corrects mfspr()/mtspr() emulation for TM SPRs to make the
actual source/dest be the actual TM SPRs.

Signed-off-by: Simon Guo <wei.guo.simon@gmail.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-06-01 10:30:05 +10:00
Simon Guo 8d2e2fc5e0 KVM: PPC: Book3S PR: Add transaction memory save/restore skeleton
The transaction memory checkpoint area save/restore behavior is
triggered when VCPU qemu process is switching out/into CPU, i.e.
at kvmppc_core_vcpu_put_pr() and kvmppc_core_vcpu_load_pr().

MSR TM active state is determined by TS bits:
    active: 10(transactional) or 01 (suspended)
    inactive: 00 (non-transactional)
We don't "fake" TM functionality for guest. We "sync" guest virtual
MSR TM active state(10 or 01) with shadow MSR. That is to say,
we don't emulate a transactional guest with a TM inactive MSR.

TM SPR support(TFIAR/TFAR/TEXASR) has already been supported by
commit 9916d57e64 ("KVM: PPC: Book3S PR: Expose TM registers").
Math register support (FPR/VMX/VSX) will be done at subsequent
patch.

Whether TM context need to be saved/restored can be determined
by kvmppc_get_msr() TM active state:
	* TM active - save/restore TM context
	* TM inactive - no need to do so and only save/restore
TM SPRs.

Signed-off-by: Simon Guo <wei.guo.simon@gmail.com>
Suggested-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-06-01 10:29:55 +10:00
Simon Guo caa3be92be KVM: PPC: Book3S PR: Add C function wrapper for _kvmppc_save/restore_tm()
Currently __kvmppc_save/restore_tm() APIs can only be invoked from
assembly function. This patch adds C function wrappers for them so
that they can be safely called from C function.

Signed-off-by: Simon Guo <wei.guo.simon@gmail.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-06-01 10:29:17 +10:00
Paul Mackerras 43b812d9ab Merge remote-tracking branch 'remotes/powerpc/topic/ppc-kvm' into kvm-ppc-next
This merges in the ppc-kvm topic branch of the powerpc repository
to get some changes on which future patches will depend, in particular
some new exports and TEXASR bit definitions.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-05-31 09:27:10 +10:00
Josh Poimboeuf 8ce621e1d9 powerpc/modules: remove unused mod_arch_specific.toc field
The toc field in the mod_arch_specific struct isn't actually used
anywhere, so remove it.

Also the ftrace-specific fields are now common between 32-bit and
64-bit, so simplify the struct definition a bit by moving them out of
the __powerpc64__ #ifdef.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Reviewed-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-28 18:46:34 +10:00
Linus Torvalds ec30dcf7f4 KVM fixes for v4.17-rc7
PPC:
  - Close a hole which could possibly lead to the host timebase getting
    out of sync.
 
  - Three fixes relating to PTEs and TLB entries for radix guests.
 
  - Fix a bug which could lead to an interrupt never getting delivered
    to the guest, if it is pending for a guest vCPU when the vCPU gets
    offlined.
 
 s390:
  - Fix false negatives in VSIE validity check (Cc stable)
 
 x86:
  - Fix time drift of VMX preemption timer when a guest uses LAPIC timer
    in periodic mode (Cc stable)
 
  - Unconditionally expose CPUID.IA32_ARCH_CAPABILITIES to allow
    migration from hosts that don't need retpoline mitigation (Cc stable)
 
  - Fix guest crashes on reboot by properly coupling CR4.OSXSAVE and
    CPUID.OSXSAVE (Cc stable)
 
  - Report correct RIP after Hyper-V hypercall #UD (introduced in -rc6)
 -----BEGIN PGP SIGNATURE-----
 
 iQEcBAABCAAGBQJbCXxHAAoJEED/6hsPKofon5oIAKTwpbpBi0UKIyYcHQ2pwIoP
 +qITTZUGGhEaIfe+aDkzE4vxVIA2ywYCbaC2+OSy4gNVThnytRL8WuhLyV8WLmlC
 sDVSQ87RWaN8mW6hEJ95qXMS7FS0TsDJdytaw+c8OpODrsykw1XMSyV2rMLb0sMT
 SmfioO2kuDx5JQGyiAPKFFXKHjAnnkH+OtffNemAEHGoPpenJ4qLRuXvrjQU8XT6
 tVARIBZsutee5ITIsBKVDmI2n98mUoIe9na21M7N2QaJ98IF+qRz5CxZyL1CgvFk
 tHqG8PZ/bqhnmuIIR5Di919UmhamOC3MODsKUVeciBLDS6LHlhado+HEpj6B8mI=
 =ygB7
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM fixes from Radim Krčmář:
 "PPC:

   - Close a hole which could possibly lead to the host timebase getting
     out of sync.

   - Three fixes relating to PTEs and TLB entries for radix guests.

   - Fix a bug which could lead to an interrupt never getting delivered
     to the guest, if it is pending for a guest vCPU when the vCPU gets
     offlined.

  s390:

   - Fix false negatives in VSIE validity check (Cc stable)

  x86:

   - Fix time drift of VMX preemption timer when a guest uses LAPIC
     timer in periodic mode (Cc stable)

   - Unconditionally expose CPUID.IA32_ARCH_CAPABILITIES to allow
     migration from hosts that don't need retpoline mitigation (Cc
     stable)

   - Fix guest crashes on reboot by properly coupling CR4.OSXSAVE and
     CPUID.OSXSAVE (Cc stable)

   - Report correct RIP after Hyper-V hypercall #UD (introduced in
     -rc6)"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: x86: fix #UD address of failed Hyper-V hypercalls
  kvm: x86: IA32_ARCH_CAPABILITIES is always supported
  KVM: x86: Update cpuid properly when CR4.OSXAVE or CR4.PKE is changed
  x86/kvm: fix LAPIC timer drift when guest uses periodic mode
  KVM: s390: vsie: fix < 8k check for the itdba
  KVM: PPC: Book 3S HV: Do ptesync in radix guest exit path
  KVM: PPC: Book3S HV: XIVE: Resend re-routed interrupts on CPU priority change
  KVM: PPC: Book3S HV: Make radix clear pte when unmapping
  KVM: PPC: Book3S HV: Make radix use correct tlbie sequence in kvmppc_radix_tlbie_page
  KVM: PPC: Book3S HV: Snapshot timebase offset on guest entry
2018-05-26 10:46:57 -07:00
Mathieu Malaterre 3fc5ee9b28 powerpc: Add missing prototype
Add one missing prototype for function rh_dump_blk. Fix warning treated as
error in W=1:

  arch/powerpc/lib/rheap.c:740:6: error: no previous prototype for ‘rh_dump_blk’ [-Werror=missing-prototypes]

Suggested-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Mathieu Malaterre <malat@debian.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-25 12:04:43 +10:00
Mathieu Malaterre c3f0515ea1 powerpc/52xx: Add missing functions prototypes
The function prototypes were declared within a `#ifdef CONFIG_PPC_LITE5200`
block which would prevent them from being visible when compiling
`mpc52xx_pm.c`. Move the prototypes outside of the `#ifdef` block to fix
the following warnings treated as errors with W=1:

  arch/powerpc/platforms/52xx/mpc52xx_pm.c:58:5: error: no previous prototype for ‘mpc52xx_pm_prepare’ [-Werror=missing-prototypes]
  arch/powerpc/platforms/52xx/mpc52xx_pm.c:113:5: error: no previous prototype for ‘mpc52xx_pm_enter’ [-Werror=missing-prototypes]
  arch/powerpc/platforms/52xx/mpc52xx_pm.c:181:6: error: no previous prototype for ‘mpc52xx_pm_finish’ [-Werror=missing-prototypes]

Suggested-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Mathieu Malaterre <malat@debian.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-25 12:04:42 +10:00
Mathieu Malaterre f91d599899 powerpc/powermac: Move pmac_pfunc_base_install prototype to header file
The pmac_pfunc_base_install prototype was declared in powermac/smp.c since
function was used there, move it to pmac_pfunc.h header to be visible in
pfunc_base.c. Fix a warning treated as error with W=1:

  arch/powerpc/platforms/powermac/pfunc_base.c:330:12: error: no previous prototype for ‘pmac_pfunc_base_install’ [-Werror=missing-prototypes]

Suggested-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Mathieu Malaterre <malat@debian.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-25 12:04:41 +10:00
Mathieu Malaterre d8731527ac powerpc/sparse: Fix plain integer as NULL pointer warning
Trivial fix to remove the following sparse warnings:

  arch/powerpc/kernel/module_32.c:112:74: warning: Using plain integer as NULL pointer
  arch/powerpc/kernel/module_32.c:117:74: warning: Using plain integer as NULL pointer
  drivers/macintosh/via-pmu.c:1155:28: warning: Using plain integer as NULL pointer
  drivers/macintosh/via-pmu.c:1230:20: warning: Using plain integer as NULL pointer
  drivers/macintosh/via-pmu.c:1385:36: warning: Using plain integer as NULL pointer
  drivers/macintosh/via-pmu.c:1752:23: warning: Using plain integer as NULL pointer
  drivers/macintosh/via-pmu.c:2084:19: warning: Using plain integer as NULL pointer
  drivers/macintosh/via-pmu.c:2110:32: warning: Using plain integer as NULL pointer
  drivers/macintosh/via-pmu.c:2167:19: warning: Using plain integer as NULL pointer
  drivers/macintosh/via-pmu.c:2183:19: warning: Using plain integer as NULL pointer
  drivers/macintosh/via-pmu.c:277:20: warning: Using plain integer as NULL pointer
  arch/powerpc/platforms/powermac/setup.c:155:67: warning: Using plain integer as NULL pointer
  arch/powerpc/platforms/powermac/setup.c:247:27: warning: Using plain integer as NULL pointer
  arch/powerpc/platforms/powermac/setup.c:249:27: warning: Using plain integer as NULL pointer
  arch/powerpc/platforms/powermac/setup.c:252:37: warning: Using plain integer as NULL pointer
  arch/powerpc/mm/tlb_hash32.c:127:21: warning: Using plain integer as NULL pointer
  arch/powerpc/mm/tlb_hash32.c:148:21: warning: Using plain integer as NULL pointer
  arch/powerpc/mm/tlb_hash32.c:44:21: warning: Using plain integer as NULL pointer
  arch/powerpc/mm/tlb_hash32.c:57:21: warning: Using plain integer as NULL pointer
  arch/powerpc/mm/tlb_hash32.c:87:21: warning: Using plain integer as NULL pointer
  arch/powerpc/kernel/btext.c:160:31: warning: Using plain integer as NULL pointer
  arch/powerpc/kernel/btext.c:167:22: warning: Using plain integer as NULL pointer
  arch/powerpc/kernel/btext.c:274:21: warning: Using plain integer as NULL pointer
  arch/powerpc/kernel/btext.c:285:31: warning: Using plain integer as NULL pointer
  arch/powerpc/include/asm/hugetlb.h:204:16: warning: Using plain integer as NULL pointer
  arch/powerpc/mm/ppc_mmu_32.c:170:21: warning: Using plain integer as NULL pointer
  arch/powerpc/platforms/powermac/pci.c:1227:23: warning: Using plain integer as NULL pointer
  arch/powerpc/platforms/powermac/pci.c:65:24: warning: Using plain integer as NULL pointer

Also use `--fix` command line option from `script/checkpatch --strict` to
remove the following:

  CHECK: Comparison to NULL could be written "!dispDeviceBase"
  #72: FILE: arch/powerpc/kernel/btext.c:160:
  +	if (dispDeviceBase == NULL)

  CHECK: Comparison to NULL could be written "!vbase"
  #80: FILE: arch/powerpc/kernel/btext.c:167:
  +	if (vbase == NULL)

  CHECK: Comparison to NULL could be written "!base"
  #89: FILE: arch/powerpc/kernel/btext.c:274:
  +	if (base == NULL)

  CHECK: Comparison to NULL could be written "!dispDeviceBase"
  #98: FILE: arch/powerpc/kernel/btext.c:285:
  +	if (dispDeviceBase == NULL)

  CHECK: Comparison to NULL could be written "strstr"
  #117: FILE: arch/powerpc/kernel/module_32.c:117:
  +		if (strstr(secstrings + sechdrs[i].sh_name, ".debug") != NULL)

  CHECK: Comparison to NULL could be written "!Hash"
  #130: FILE: arch/powerpc/mm/ppc_mmu_32.c:170:
  +	if (Hash == NULL)

  CHECK: Comparison to NULL could be written "Hash"
  #143: FILE: arch/powerpc/mm/tlb_hash32.c:44:
  +	if (Hash != NULL) {

  CHECK: Comparison to NULL could be written "!Hash"
  #152: FILE: arch/powerpc/mm/tlb_hash32.c:57:
  +	if (Hash == NULL) {

  CHECK: Comparison to NULL could be written "!Hash"
  #161: FILE: arch/powerpc/mm/tlb_hash32.c:87:
  +	if (Hash == NULL) {

  CHECK: Comparison to NULL could be written "!Hash"
  #170: FILE: arch/powerpc/mm/tlb_hash32.c:127:
  +	if (Hash == NULL) {

  CHECK: Comparison to NULL could be written "!Hash"
  #179: FILE: arch/powerpc/mm/tlb_hash32.c:148:
  +	if (Hash == NULL) {

  ERROR: space required after that ';' (ctx:VxV)
  #192: FILE: arch/powerpc/platforms/powermac/pci.c:65:
  +	for (; node != NULL;node = node->sibling) {

  CHECK: Comparison to NULL could be written "node"
  #192: FILE: arch/powerpc/platforms/powermac/pci.c:65:
  +	for (; node != NULL;node = node->sibling) {

  CHECK: Comparison to NULL could be written "!region"
  #201: FILE: arch/powerpc/platforms/powermac/pci.c:1227:
  +	if (region == NULL)

  CHECK: Comparison to NULL could be written "of_get_property"
  #214: FILE: arch/powerpc/platforms/powermac/setup.c:155:
  +		if (of_get_property(np, "cache-unified", NULL) != NULL && dc) {

  CHECK: Comparison to NULL could be written "!np"
  #223: FILE: arch/powerpc/platforms/powermac/setup.c:247:
  +		if (np == NULL)

  CHECK: Comparison to NULL could be written "np"
  #226: FILE: arch/powerpc/platforms/powermac/setup.c:249:
  +		if (np != NULL) {

  CHECK: Comparison to NULL could be written "l2cr"
  #230: FILE: arch/powerpc/platforms/powermac/setup.c:252:
  +			if (l2cr != NULL) {

  CHECK: Comparison to NULL could be written "via"
  #243: FILE: drivers/macintosh/via-pmu.c:277:
  +	if (via != NULL)

  CHECK: Comparison to NULL could be written "current_req"
  #252: FILE: drivers/macintosh/via-pmu.c:1155:
  +	if (current_req != NULL) {

  CHECK: Comparison to NULL could be written "!req"
  #261: FILE: drivers/macintosh/via-pmu.c:1230:
  +	if (req == NULL || pmu_state != idle

  CHECK: Comparison to NULL could be written "!req"
  #270: FILE: drivers/macintosh/via-pmu.c:1385:
  +			if (req == NULL) {

  CHECK: Comparison to NULL could be written "!pp"
  #288: FILE: drivers/macintosh/via-pmu.c:2084:
  +	if (pp == NULL)

  CHECK: Comparison to NULL could be written "!pp"
  #297: FILE: drivers/macintosh/via-pmu.c:2110:
  +	if (count < 1 || pp == NULL)

  CHECK: Comparison to NULL could be written "!pp"
  #306: FILE: drivers/macintosh/via-pmu.c:2167:
  +	if (pp == NULL)

  CHECK: Comparison to NULL could be written "pp"
  #315: FILE: drivers/macintosh/via-pmu.c:2183:
  +	if (pp != NULL) {

Link: https://github.com/linuxppc/linux/issues/37
Signed-off-by: Mathieu Malaterre <malat@debian.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-25 12:04:38 +10:00
Mathieu Malaterre 7cf76a68f1 powerpc/altivec: Add missing prototypes for altivec
Some functions prototypes were missing for the non-altivec code. Add the
missing prototypes in a new header file, fix warnings treated as errors
with W=1:

  arch/powerpc/lib/xor_vmx_glue.c:18:6: error: no previous prototype for ‘xor_altivec_2’ [-Werror=missing-prototypes]
  arch/powerpc/lib/xor_vmx_glue.c:29:6: error: no previous prototype for ‘xor_altivec_3’ [-Werror=missing-prototypes]
  arch/powerpc/lib/xor_vmx_glue.c:40:6: error: no previous prototype for ‘xor_altivec_4’ [-Werror=missing-prototypes]
  arch/powerpc/lib/xor_vmx_glue.c:52:6: error: no previous prototype for ‘xor_altivec_5’ [-Werror=missing-prototypes]

The prototypes were already present in <asm/xor.h> but this header file is
meant to be included after <include/linux/raid/xor.h>. Trying to re-use
<asm/xor.h> directly would lead to warnings such as:

  arch/powerpc/include/asm/xor.h:39:15: error: variable ‘xor_block_altivec’ has initializer but incomplete type

Trying to re-use <asm/xor.h> after <include/linux/raid/xor.h> in
xor_vmx_glue.c would in turn trigger the following warnings:

  include/asm-generic/xor.h:688:34: error: ‘xor_block_32regs’ defined but not used [-Werror=unused-variable]

Signed-off-by: Mathieu Malaterre <malat@debian.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-25 12:04:38 +10:00
Mathieu Malaterre e70d8f5526 powerpc/xmon: Add __printf annotation to xmon_printf()
This allows the compiler to verify the format strings vs the types of
the arguments.

Update the other prototype declarations in asm/xmon.h.

Silence warnings (triggered at W=1) by adding relevant __printf
attribute. Move #define at bottom of the file to prevent conflict with
gcc attribute.

Solves the original warning:

  arch/powerpc/xmon/nonstdio.c:178:2: error: function might be
  possible candidate for ‘gnu_printf’ format attribute

In turn this uncovered many formatting errors in xmon.c, all fixed in
this patch.

Signed-off-by: Mathieu Malaterre <malat@debian.org>
[mpe: Always use px not p, fixup the 44x specific code, tweak change log]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-25 12:04:36 +10:00
Christophe Leroy 8a0b1120cb powerpc/mm: Use instruction symbolic names in store_updates_sp()
Use symbolic names defined in asm/ppc-opcode.h
instead of hardcoded values.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-25 00:08:24 +10:00
Simon Guo eacbb218fb powerpc: Export tm_enable()/tm_disable/tm_abort() APIs
This patch exports tm_enable()/tm_disable/tm_abort() APIs, which
will be used for PR KVM transactional memory logic.

Signed-off-by: Simon Guo <wei.guo.simon@gmail.com>
Reviewed-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-24 16:04:02 +10:00
Simon Guo ab3759b573 powerpc/reg: Add TEXASR related macros
This patches add some macros for CR0/TEXASR bits so that PR KVM TM
logic (tbegin./treclaim./tabort.) can make use of them later.

Signed-off-by: Simon Guo <wei.guo.simon@gmail.com>
Reviewed-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-24 16:03:36 +10:00
Simon Guo acc9eb9305 KVM: PPC: Reimplement LOAD_VMX/STORE_VMX instruction mmio emulation with analyse_instr() input
This patch reimplements LOAD_VMX/STORE_VMX MMIO emulation with
analyse_instr() input. When emulating the store, the VMX reg will need to
be flushed so that the right reg val can be retrieved before writing to
IO MEM.

This patch also adds support for lvebx/lvehx/lvewx/stvebx/stvehx/stvewx
MMIO emulation. To meet the requirement of handling different element
sizes, kvmppc_handle_load128_by2x64()/kvmppc_handle_store128_by2x64()
were replaced with kvmppc_handle_vmx_load()/kvmppc_handle_vmx_store().

The framework used is similar to VSX instruction MMIO emulation.

Suggested-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Simon Guo <wei.guo.simon@gmail.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-05-22 19:53:00 +10:00
Simon Guo da2a32b876 KVM: PPC: Expand mmio_vsx_copy_type to cover VMX load/store element types
VSX MMIO emulation uses mmio_vsx_copy_type to represent VSX emulated
element size/type, such as KVMPPC_VSX_COPY_DWORD_LOAD, etc. This
patch expands mmio_vsx_copy_type to cover VMX copy type, such as
KVMPPC_VMX_COPY_BYTE(stvebx/lvebx), etc. As a result,
mmio_vsx_copy_type is also renamed to mmio_copy_type.

It is a preparation for reimplementing VMX MMIO emulation.

Signed-off-by: Simon Guo <wei.guo.simon@gmail.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-05-22 19:52:55 +10:00
Simon Guo 2e6baa46b4 KVM: PPC: Add giveup_ext() hook to PPC KVM ops
Currently HV will save math regs(FP/VEC/VSX) when trap into host. But
PR KVM will only save math regs when qemu task switch out of CPU, or
when returning from qemu code.

To emulate FP/VEC/VSX mmio load, PR KVM need to make sure that math
regs were flushed firstly and then be able to update saved VCPU
FPR/VEC/VSX area reasonably.

This patch adds giveup_ext() field to KVM ops. Only PR KVM has non-NULL
giveup_ext() ops. kvmppc_complete_mmio_load() can invoke that hook
(when not NULL) to flush math regs accordingly, before updating saved
register vals.

Math regs flush is also necessary for STORE, which will be covered
in later patch within this patch series.

Signed-off-by: Simon Guo <wei.guo.simon@gmail.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-05-22 19:51:13 +10:00
Simon Guo 7092360399 KVM: PPC: Reimplement non-SIMD LOAD/STORE instruction mmio emulation with analyse_instr() input
This patch reimplements non-SIMD LOAD/STORE instruction MMIO emulation
with analyse_instr() input. It utilizes the BYTEREV/UPDATE/SIGNEXT
properties exported by analyse_instr() and invokes
kvmppc_handle_load(s)/kvmppc_handle_store() accordingly.

It also moves CACHEOP type handling into the skeleton.

instruction_type within kvm_ppc.h is renamed to avoid conflict with
sstep.h.

Suggested-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Simon Guo <wei.guo.simon@gmail.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-05-22 19:51:08 +10:00
Simon Guo 94dd7fa1c0 KVM: PPC: Add KVMPPC_VSX_COPY_WORD_LOAD_DUMP type support for mmio emulation
Some VSX instructions like lxvwsx will splat word into VSR. This patch
adds a new VSX copy type KVMPPC_VSX_COPY_WORD_LOAD_DUMP to support this.

Signed-off-by: Simon Guo <wei.guo.simon@gmail.com>
Reviewed-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-05-22 19:51:03 +10:00
Nicholas Piggin a048a07d7f powerpc/64s: Add support for a store forwarding barrier at kernel entry/exit
On some CPUs we can prevent a vulnerability related to store-to-load
forwarding by preventing store forwarding between privilege domains,
by inserting a barrier in kernel entry and exit paths.

This is known to be the case on at least Power7, Power8 and Power9
powerpc CPUs.

Barriers must be inserted generally before the first load after moving
to a higher privilege, and after the last store before moving to a
lower privilege, HV and PR privilege transitions must be protected.

Barriers are added as patch sections, with all kernel/hypervisor entry
points patched, and the exit points to lower privilge levels patched
similarly to the RFI flush patching.

Firmware advertisement is not implemented yet, so CPU flush types
are hard coded.

Thanks to Michal Suchánek for bug fixes and review.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michal Suchánek <msuchanek@suse.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-05-21 20:45:31 -07:00
Shilpasri G Bhat 5cdcb01e0a powernv: opal-sensor: Add support to read 64bit sensor values
This patch adds support to read 64-bit sensor values. This method is
used to read energy sensors and counters which are of type u64.

Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-21 14:48:02 +10:00
Thomas Gleixner b563ea676a Merge branch 'linus' into timers/2038
Merge upstream to pick up changes on which pending patches depend on.
2018-05-19 13:55:40 +02:00
Michael Ellerman 8056fe28d0 powerpc/io: Add __raw_writeq_be() __raw_rm_writeq_be()
Add byte-swapping versions of __raw_writeq() and __raw_rm_writeq().

This allows us to avoid sparse warnings caused by passing __be64 to
__raw_writeq(), which takes unsigned long:

  arch/powerpc/platforms/powernv/pci-ioda.c:1981:38:
  warning: incorrect type in argument 1 (different base types)
      expected unsigned long [unsigned] v
      got restricted __be64 [usertype] <noident>

It's also generally preferable to use a byte-swapping accessor rather
than doing it by hand in the code, which is more bug prone.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Samuel Mendoza-Jonas <sam@mendozajonas.com>
2018-05-18 21:59:56 +10:00
Nicholas Piggin b755745147 KVM: PPC: Book3S HV: Lockless tlbie for HPT hcalls
tlbies to an LPAR do not have to be serialised since POWER4/PPC970,
after which the MMU_FTR_LOCKLESS_TLBIE feature was introduced to
avoid tlbie locking.

Since commit c17b98cf60 ("KVM: PPC: Book3S HV: Remove code for
PPC970 processors"), KVM no longer supports processors that do not
have this feature, so the tlbie locking can be removed completely.
A sanity check for the feature is put in kvmppc_mmu_hv_init.

Testing was done on a POWER9 system in HPT mode, with a -smp 32 guest
in HPT mode. 32 instances of the powerpc fork benchmark from selftests
were run with --fork, and the results measured.

Without this patch, total throughput was about 13.5K/sec, and this is
the top of the host profile:

   74.52%  [k] do_tlbies
    2.95%  [k] kvmppc_book3s_hv_page_fault
    1.80%  [k] calc_checksum
    1.80%  [k] kvmppc_vcpu_run_hv
    1.49%  [k] kvmppc_run_core

After this patch, throughput was about 51K/sec, with this profile:

   21.28%  [k] do_tlbies
    5.26%  [k] kvmppc_run_core
    4.88%  [k] kvmppc_book3s_hv_page_fault
    3.30%  [k] _raw_spin_lock_irqsave
    3.25%  [k] gup_pgd_range

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-05-18 15:38:23 +10:00
Simon Guo 173c520a04 KVM: PPC: Move nip/ctr/lr/xer registers to pt_regs in kvm_vcpu_arch
This patch moves nip/ctr/lr/xer registers from scattered places in
kvm_vcpu_arch to pt_regs structure.

cr register is "unsigned long" in pt_regs and u32 in vcpu->arch.
It will need more consideration and may move in later patches.

Signed-off-by: Simon Guo <wei.guo.simon@gmail.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-05-18 15:38:23 +10:00
Simon Guo 1143a70665 KVM: PPC: Add pt_regs into kvm_vcpu_arch and move vcpu->arch.gpr[] into it
Current regs are scattered at kvm_vcpu_arch structure and it will
be more neat to organize them into pt_regs structure.

Also it will enable reimplementation of MMIO emulation code with
analyse_instr() later.

Signed-off-by: Simon Guo <wei.guo.simon@gmail.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-05-18 15:38:23 +10:00
Paul Mackerras 9c9e9cf40a Merge remote-tracking branch 'remotes/powerpc/topic/ppc-kvm' into kvm-ppc-next
This merges in the ppc-kvm topic branch of the powerpc repository
to get some changes on which future patches will depend, in particular
the definitions of various new TLB flushing functions.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-05-18 15:30:10 +10:00
Christophe Leroy a1f3ae3fe8 powerpc/32: Use stmw/lmw for registers save/restore in asm
arch/powerpc/Makefile activates -mmultiple on BE PPC32 configs
in order to use multiple word instructions in functions entry/exit.

The patch does the same for the asm parts, for consistency.

On processors like the 8xx on which insn fetching is pretty slow,
this speeds up registers save/restore.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
[mpe: PPC32 is BE only, so drop the endian checks]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-18 00:09:06 +10:00
Christophe Leroy 96f391cf40 Revert "powerpc/64: Fix checksum folding in csum_add()"
This reverts commit 6ad966d730.

That commit was pointless, because csum_add() sums two 32 bits
values, so the sum is 0x1fffffffe at the maximum.
And then when adding upper part (1) and lower part (0xfffffffe),
the result is 0xffffffff which doesn't carry.
Any lower value will not carry either.

And behind the fact that this commit is useless, it also kills the
whole purpose of having an arch specific inline csum_add()
because the resulting code gets even worse than what is obtained
with the generic implementation of csum_add()

0000000000000240 <.csum_add>:
 240:	38 00 ff ff 	li      r0,-1
 244:	7c 84 1a 14 	add     r4,r4,r3
 248:	78 00 00 20 	clrldi  r0,r0,32
 24c:	78 89 00 22 	rldicl  r9,r4,32,32
 250:	7c 80 00 38 	and     r0,r4,r0
 254:	7c 09 02 14 	add     r0,r9,r0
 258:	78 09 00 22 	rldicl  r9,r0,32,32
 25c:	7c 00 4a 14 	add     r0,r0,r9
 260:	78 03 00 20 	clrldi  r3,r0,32
 264:	4e 80 00 20 	blr

In comparison, the generic implementation of csum_add() gives:

0000000000000290 <.csum_add>:
 290:	7c 63 22 14 	add     r3,r3,r4
 294:	7f 83 20 40 	cmplw   cr7,r3,r4
 298:	7c 10 10 26 	mfocrf  r0,1
 29c:	54 00 ef fe 	rlwinm  r0,r0,29,31,31
 2a0:	7c 60 1a 14 	add     r3,r0,r3
 2a4:	78 63 00 20 	clrldi  r3,r3,32
 2a8:	4e 80 00 20 	blr

And the reverted implementation for PPC64 gives:

0000000000000240 <.csum_add>:
 240:	7c 84 1a 14 	add     r4,r4,r3
 244:	78 80 00 22 	rldicl  r0,r4,32,32
 248:	7c 80 22 14 	add     r4,r0,r4
 24c:	78 83 00 20 	clrldi  r3,r4,32
 250:	4e 80 00 20 	blr

Fixes: 6ad966d730 ("powerpc/64: Fix checksum folding in csum_add()")
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Acked-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-18 00:09:04 +10:00
Christophe Leroy 5279821a6f powerpc: get rid of PMD_PAGE_SIZE() and _PMD_SIZE
PMD_PAGE_SIZE() is nowhere used and _PMD_SIZE is only
used by PMD_PAGE_SIZE().

This patch removes them.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-18 00:09:03 +10:00
Paul Mackerras 7aa15842c1 KVM: PPC: Book3S HV: Set RWMR on POWER8 so PURR/SPURR count correctly
Although Linux doesn't use PURR and SPURR ((Scaled) Processor
Utilization of Resources Register), other OSes depend on them.
On POWER8 they count at a rate depending on whether the VCPU is
idle or running, the activity of the VCPU, and the value in the
RWMR (Region-Weighting Mode Register).  Hardware expects the
hypervisor to update the RWMR when a core is dispatched to reflect
the number of online VCPUs in the vcore.

This adds code to maintain a count in the vcore struct indicating
how many VCPUs are online.  In kvmppc_run_core we use that count
to set the RWMR register on POWER8.  If the core is split because
of a static or dynamic micro-threading mode, we use the value for
8 threads.  The RWMR value is not relevant when the host is
executing because Linux does not use the PURR or SPURR register,
so we don't bother saving and restoring the host value.

For the sake of old userspace which does not set the KVM_REG_PPC_ONLINE
register, we set online to 1 if it was 0 at the time of a KVM_RUN
ioctl.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-05-17 16:36:48 +10:00
Paul Mackerras a1f158262a KVM: PPC: Book3S HV: Add 'online' register to ONE_REG interface
This adds a new KVM_REG_PPC_ONLINE register which userspace can set
to 0 or 1 via the GET/SET_ONE_REG interface to indicate whether it
considers the VCPU to be offline (0), that is, not currently running,
or online (1).  This will be used in a later patch to configure the
register which controls PURR and SPURR accumulation.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-05-17 16:36:36 +10:00
Paul Mackerras 57b8daa70a KVM: PPC: Book3S HV: Snapshot timebase offset on guest entry
Currently, the HV KVM guest entry/exit code adds the timebase offset
from the vcore struct to the timebase on guest entry, and subtracts
it on guest exit.  Which is fine, except that it is possible for
userspace to change the offset using the SET_ONE_REG interface while
the vcore is running, as there is only one timebase offset per vcore
but potentially multiple VCPUs in the vcore.  If that were to happen,
KVM would subtract a different offset on guest exit from that which
it had added on guest entry, leading to the timebase being out of sync
between cores in the host, which then leads to bad things happening
such as hangs and spurious watchdog timeouts.

To fix this, we add a new field 'tb_offset_applied' to the vcore struct
which stores the offset that is currently applied to the timebase.
This value is set from the vcore tb_offset field on guest entry, and
is what is subtracted from the timebase on guest exit.  Since it is
zero when the timebase offset is not applied, we can simplify the
logic in kvmhv_start_timing and kvmhv_accumulate_time.

In addition, we had secondary threads reading the timebase while
running concurrently with code on the primary thread which would
eventually add or subtract the timebase offset from the timebase.
This occurred while saving or restoring the DEC register value on
the secondary threads.  Although no specific incorrect behaviour has
been observed, this is a race which should be fixed.  To fix it, we
move the DEC saving code to just before we call kvmhv_commence_exit,
and the DEC restoring code to after the point where we have waited
for the primary thread to switch the MMU context and add the timebase
offset.  That way we are sure that the timebase contains the guest
timebase value in both cases.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-05-17 15:16:45 +10:00
Nicholas Piggin 0078778a86 powerpc/mm/radix: implement LPID based TLB flushes to be used by KVM
Implement a local TLB flush for invalidating an LPID with variants for
process or partition scope. And a global TLB flush for invalidating
a partition scoped page of an LPID.

These will be used by KVM in subsequent patches.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-17 11:13:29 +10:00
Aneesh Kumar K.V 738f964555 powerpc/mm: Use page fragments for allocation page table at PMD level
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-15 22:29:12 +10:00
Aneesh Kumar K.V 8a6c697b99 powerpc/mm: Implement helpers for pagetable fragment support at PMD level
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-15 22:29:12 +10:00
Aneesh Kumar K.V 0c4d268029 powerpc/book3s64/mm: Simplify the rcu callback for page table free
Instead of encoding shift in the table address, use an enumerated index value.
This allow us to do different things in the callback for pte and pmd.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-15 22:29:11 +10:00
Aneesh Kumar K.V 1c7ec8a40a powerpc/mm/book3s64/4k: Switch 4k pagesize config to use pagetable fragment
4K config use one full page at level 4 of the pagetable. Add support for single
fragment allocation in pagetable fragment code and and use that for 4K config.
This makes both 4k and 64k use the same code path. Later we will switch pmd to
use the page table fragment code. This is done only for 64bit platforms which
is using page table fragment support.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-15 22:29:11 +10:00
Aneesh Kumar K.V 702346768c powerpc/mm/nohash: Remove pte fragment dependency from nohash
Now that we have removed 64K page size support, the RCU page table free can
be much simpler for nohash. Make a copy of the the rcu callback to pgalloc.h
header similar to nohash 32. We could possibly merge 32 and 64 bit there. But
that is for a later patch

We also move the book3s specific handler to pgtable_book3s64.c. This will be
updated in a later patch to handle split pmd ptlock.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-15 22:29:11 +10:00
Aneesh Kumar K.V 7820856a4f powerpc/mm/book3e/64: Remove unsupported 64Kpage size from 64bit booke
We have in Kconfig

config PPC_64K_PAGES
	bool "64k page size"
	depends on !PPC_FSL_BOOK3E && (44x || PPC_BOOK3S_64 || PPC_BOOK3E_64)
	select HAVE_ARCH_SOFT_DIRTY if PPC_BOOK3S_64

Only supported BOOK3E 64 bit platforms is FSL_BOOK3E. Remove the dead 64k page
support code from 64bit nohash.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-15 22:29:10 +10:00
Bryant G. Ly 0eca353e7a misc: IBM Virtual Management Channel Driver (VMC)
This driver is a logical device which provides an
interface between the hypervisor and a management
partition. This interface is like a message
passing interface. This management partition
is intended to provide an alternative to HMC-based
system management.

VMC enables the Management LPAR to provide basic
logical partition functions:
- Logical Partition Configuration
- Boot, start, and stop actions for individual
  partitions
- Display of partition status
- Management of virtual Ethernet
- Management of virtual Storage
- Basic system management

This driver is to be used for the POWER Virtual
Management Channel Virtual Adapter on the PowerPC
platform. It provides a character device which
allows for both request/response and async message
support through the /dev/ibmvmc node.

Signed-off-by: Bryant G. Ly <bryantly@linux.vnet.ibm.com>
Reviewed-by: Steven Royer <seroyer@linux.vnet.ibm.com>
Reviewed-by: Adam Reznechek <adreznec@linux.vnet.ibm.com>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Taylor Jakobson <tjakobs@us.ibm.com>
Tested-by: Brad Warrum <bwarrum@us.ibm.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-14 16:35:42 +02:00
Frederic Weisbecker 1321a5de1e softirq/powerpc: Switch to generic local_softirq_pending() implementation
Remove the ad-hoc implementation, the generic code now allows us not to
reinvent the wheel.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: James E.J. Bottomley <jejb@parisc-linux.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Rich Felker <dalias@libc.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Link: http://lkml.kernel.org/r/1525786706-22846-9-git-send-email-frederic@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-05-14 11:25:28 +02:00
Al Viro 4c1481ae60 powerpc/syscalls: timer_create can be handle by perfectly normal COMPAT_SYS_SPU
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-10 23:25:16 +10:00
Al Viro 28b9c34aa6 powerpc/syscalls: kill ppc32_select()
it had always been pointless - compat_sys_select() sign-extends
the first argument just fine on its own.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
[mpe: Use COMPAT_SPU_NEW() to keep systbl_chk.sh happy]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-10 23:25:15 +10:00
Al Viro 4c392e6591 powerpc/syscalls: switch rtas(2) to SYSCALL_DEFINE
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
[mpe: Update sys_ni.c for s/ppc_rtas/sys_rtas/]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-10 23:25:14 +10:00
Al Viro f3675644e1 powerpc/syscalls: signal_{32, 64} - switch to SYSCALL_DEFINE
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
[mpe: Fix sys_debug_setcontext() prototype to return long]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-10 23:25:13 +10:00
Michael Ellerman 8f2133cc0e powerpc/pseries: hcall_exit tracepoint retval should be signed
The hcall_exit() tracepoint has retval defined as unsigned long. That
leads to humours results like:

  bash-3686  [009] d..2   854.134094: hcall_entry: opcode=24
  bash-3686  [009] d..2   854.134095: hcall_exit: opcode=24 retval=18446744073709551609

It's normal for some hcalls to return negative values, displaying them
as unsigned isn't very helpful. So change it to signed.

  bash-3711  [001] d..2   471.691008: hcall_entry: opcode=24
  bash-3711  [001] d..2   471.691008: hcall_exit: opcode=24 retval=-7

Which can be more easily compared to H_NOT_FOUND in hvcall.h

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Acked-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Tested-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
2018-05-10 23:17:43 +10:00
Michael Ellerman d86a4ba02f powerpc/pkeys: Drop private VM_PKEY definitions
Now that we've updated the generic headers to support 5 PKEY bits for
powerpc we don't need our own #defines in arch code.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-10 19:46:10 +10:00
Michael Ellerman dbec10e58d mm/pkeys, powerpc, x86: Provide an empty vma_pkey() in linux/pkeys.h
Consolidate the pkey handling by providing a common empty definition
of vma_pkey() in pkeys.h when CONFIG_ARCH_HAS_PKEYS=n.

This also removes another entanglement of pkeys.h and
asm/mmu_context.h.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Ram Pai <linuxram@us.ibm.com>
Reviewed-by: Dave Hansen <dave.hansen@intel.com>
2018-05-09 11:50:41 +10:00
Ram Pai 5212213aa5 mm, powerpc, x86: define VM_PKEY_BITx bits if CONFIG_ARCH_HAS_PKEYS is enabled
VM_PKEY_BITx are defined only if CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS
is enabled. Powerpc also needs these bits. Hence lets define the
VM_PKEY_BITx bits for any architecture that enables
CONFIG_ARCH_HAS_PKEYS.

Reviewed-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Ram Pai <linuxram@us.ibm.com>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-09 00:35:33 +10:00
Michael Ellerman 6c0a8f6b5a powerpc/pseries: Fix CONFIG_NUMA=n build
The build is failing with CONFIG_NUMA=n and some compiler versions:

  arch/powerpc/platforms/pseries/hotplug-cpu.o: In function `dlpar_online_cpu':
  hotplug-cpu.c:(.text+0x12c): undefined reference to `timed_topology_update'
  arch/powerpc/platforms/pseries/hotplug-cpu.o: In function `dlpar_cpu_remove':
  hotplug-cpu.c:(.text+0x400): undefined reference to `timed_topology_update'

Fix it by moving the empty version of timed_topology_update() into the
existing #ifdef block, which has the right guard of SPLPAR && NUMA.

Fixes: cee5405da4 ("powerpc/hotplug: Improve responsiveness of hotplug change")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-08 14:59:56 +10:00
Naveen N. Rao edf6a2dfe3 powerpc/trace/syscalls: Update syscall name matching logic to account for ppc_ prefix
Some syscall entry functions on powerpc are prefixed with
ppc_/ppc32_/ppc64_ rather than the usual sys_/__se_sys prefix. fork(),
clone(), swapcontext() are some examples of syscalls with such entry
points. We need to match against these names when initializing ftrace
syscall tracing.

Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-07 21:29:44 +10:00
Naveen N. Rao 0b7758aaf6 powerpc/trace/syscalls: Update syscall name matching logic
On powerpc64 ABIv1, we are enabling syscall tracing for only ~20
syscalls. This is due to commit e145242ea0 ("syscalls/core,
syscalls/x86: Clean up syscall stub naming convention") which has
changed the syscall entry wrapper prefix from "SyS" to "__se_sys".

Update the logic for ABIv1 to not just skip the initial dot, but also
the "__se_sys" prefix.

Fixes: commit e145242ea0 ("syscalls/core, syscalls/x86: Clean up syscall stub naming convention")
Reported-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-07 21:29:30 +10:00
Michael Ellerman c4ec1f0353 powerpc/64: Remove unused paca->soft_enabled
In commit 4e26bc4a4e ("powerpc/64: Rename soft_enabled to
irq_soft_mask") we renamed paca->soft_enabled. But then in commit
8e0b634b13 ("powerpc/64s: Do not allocate lppaca if we are not
virtualized") we added it back. Oops. This happened because the two
patches were in flight at the same time and rebased vs each other
multiple times, and we missed it in review.

Fixes: 8e0b634b13 ("powerpc/64s: Do not allocate lppaca if we are not virtualized")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-07 21:22:26 +10:00
Christophe Leroy d5808ffaec powerpc/nohash: Use IS_ENABLED() to simplify __set_pte_at()
By using IS_ENABLED() we can simplify __set_pte_at() by removing
redundant *ptep = pte.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Reviewed-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-07 20:37:47 +10:00
Christophe Leroy 0d594aa75f powerpc/nohash: Remove _PAGE_BUSY
_PAGE_BUSY is always 0, remove it.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-07 20:37:46 +10:00
Christophe Leroy 45201c8794 powerpc/nohash: Remove hash related code from nohash headers.
When nohash and book3s header were split, some hash related stuff
remained in the nohash header. This patch removes them.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
[mpe: Duplicate pte_young() to avoid circular header dependency]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-07 20:37:46 +10:00
Christoph Hellwig 325ef1857f PCI: remove PCI_DMA_BUS_IS_PHYS
This was used by the ide, scsi and networking code in the past to
determine if they should bounce payloads.  Now that the dma mapping
always have to support dma to all physical memory (thanks to swiotlb
for non-iommu systems) there is no need to this crude hack any more.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Palmer Dabbelt <palmer@sifive.com> (for riscv)
Reviewed-by: Jens Axboe <axboe@kernel.dk>
2018-05-07 07:15:41 +02:00
Hari Bathini 8597538712 powerpc/fadump: Do not use hugepages when fadump is active
FADump capture kernel boots in restricted memory environment preserving
the context of previous kernel to save vmcore. Supporting hugepages in
such environment makes things unnecessarily complicated, as hugepages
need memory set aside for them. This means most of the capture kernel's
memory is used in supporting hugepages. In most cases, this results in
out-of-memory issues while booting FADump capture kernel. But hugepages
are not of much use in capture kernel whose only job is to save vmcore.
So, disabling hugepages support, when fadump is active, is a reliable
solution for the out of memory issues. Introducing a flag variable to
disable HugeTLB support when fadump is active.

Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com>
Reviewed-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-03 23:09:25 +10:00
Michael Ellerman 0c0c52306f powerpc: Only support DYNAMIC_FTRACE not static
We've had dynamic ftrace support for over 9 years since Steve first
wrote it, all the distros use dynamic, and static is basically
untested these days, so drop support for static ftrace.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-03 22:32:29 +10:00
Naveen N. Rao ae30cc05be powerpc64/ftrace: Implement support for ftrace_regs_caller()
With -mprofile-kernel, we always save the full register state in
ftrace_caller(). While this works, this is inefficient if we're not
interested in the register state, such as when we're using the function
tracer.

Rename the existing ftrace_caller() as ftrace_regs_caller() and provide
a simpler implementation for ftrace_caller() that is used when registers
are not required to be saved.

Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-03 22:32:29 +10:00
Naveen N. Rao acd55b1005 powerpc64/ftrace: Add helpers to hard disable ftrace
Add some helpers to enable/disable ftrace through paca->ftrace_enabled.

Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-03 22:32:26 +10:00
Naveen N. Rao c3e59d7784 powerpc64/ftrace: Rearrange #ifdef sections in ftrace.h
Re-arrange the last #ifdef section in preparation for a subsequent
change.

Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-03 22:32:25 +10:00
Naveen N. Rao ea678ac627 powerpc64/ftrace: Add a field in paca to disable ftrace in unsafe code paths
We have some C code that we call into from real mode where we cannot
take any exceptions. Though the C functions themselves are mostly safe,
if these functions are traced, there is a possibility that we may take
an exception. For instance, in certain conditions, the ftrace code uses
WARN(), which uses a 'trap' to do its job.

For such scenarios, introduce a new field in paca 'ftrace_enabled',
which is checked on ftrace entry before continuing. This field can then
be set to zero to disable/pause ftrace, and set to a non-zero value to
resume ftrace.

Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-03 22:32:25 +10:00
Thomas Gleixner 604a98f1df Merge branch 'timers/urgent' into timers/core
Pick up urgent fixes to apply dependent cleanup patch
2018-05-02 16:11:12 +02:00
Eric W. Biederman e821fa4245 signal/powerpc: Replace TRAP_FIXME with TRAP_UNK
Using an si_code of 0 that aliases with SI_USER is clearly the wrong
thing todo, and causes problems in interesting ways.

For use in unknown_exception the recently defined TRAP_UNK
semantically is a perfect fit.  For use in RunModeException it looks
like something more specific than TRAP_UNK could be used.  No one has
bothered to find a better fit than the broken si_code of 0 in all of
these years and I don't see an obvious better fit so TRAP_UNK is
switching RunModeException to return TRAP_UNK is clearly an
improvement.

Recent history suggests no actually cares about crazy corner
cases of the kernel behavior like this so I don't expect any
regressions from changing this.  However if something does
happen this change is easy to revert.

Though I wonder if SIGKILL might not be a better fit.

Cc: Paul Mackerras <paulus@samba.org>
Cc: Kumar Gala <kumar.gala@freescale.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: linuxppc-dev@lists.ozlabs.org
Fixes: 9bad068c24d7 ("[PATCH] ppc32: support for e500 and 85xx")
Fixes: 0ed70f6105ef ("PPC32: Provide proper siginfo information on various exceptions.")
History Tree: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2018-04-25 10:40:58 -05:00
Eric W. Biederman aeb1c0f6ff signal/powerpc: Replace FPE_FIXME with FPE_FLTUNK
Using an si_code of 0 that aliases with SI_USER is clearly the
wrong thing todo, and causes problems in interesting ways.

The newly defined FPE_FLTUNK semantically appears to fit the
bill so use it instead.

Cc: Paul Mackerras <paulus@samba.org>
Cc: Kumar Gala <kumar.gala@freescale.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc:  linuxppc-dev@lists.ozlabs.org
Fixes: 9bad068c24d7 ("[PATCH] ppc32: support for e500 and 85xx")
Fixes: 0ed70f6105ef ("PPC32: Provide proper siginfo information on various exceptions.")
History Tree: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2018-04-25 10:40:55 -05:00
Alistair Popple a1409adac7 powerpc/powernv/npu: Prevent overwriting of pnv_npu2_init_contex() callback parameters
There is a single npu context per set of callback parameters. Callers
should be prevented from overwriting existing callback values so
instead return an error if different parameters are passed.

Fixes: 1ab66d1fba ("powerpc/powernv: Introduce address translation services for Nvlink2")
Cc: stable@vger.kernel.org # v4.12+
Signed-off-by: Alistair Popple <alistair@popple.id.au>
Reviewed-by: Mark Hairgrove <mhairgrove@nvidia.com>
Tested-by: Mark Hairgrove <mhairgrove@nvidia.com>
Reviewed-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-24 09:46:57 +10:00
Arnd Bergmann d0b67de998 y2038: powerpc: Extend sysvipc data structures
powerpc, uses a nonstandard variation of the generic sysvipc
data structures, intended to have the padding moved around
so it can deal with big-endian 32-bit user space that has
64-bit time_t.

powerpc has the same definition as parisc and sparc, but now also
supports little-endian mode, which is now wrong because the
padding is made for big-endian user space.

This takes just take the same approach here that we have for
the asm-generic headers and adds separate 32-bit fields for the
upper halves of the timestamps, to let libc deal with the mess
in user space.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2018-04-20 16:20:13 +02:00
Deepa Dinamani 0d55303c51 compat: Move compat_timespec/ timeval to compat_time.h
All the current architecture specific defines for these
are the same. Refactor these common defines to a common
header file.

The new common linux/compat_time.h is also useful as it
will eventually be used to hold all the defines that
are needed for compat time types that support non y2038
safe types. New architectures need not have to define these
new types as they will only use new y2038 safe syscalls.
This file can be deleted after y2038 when we stop supporting
non y2038 safe syscalls.

The patch also requires an operation similar to:

git grep "asm/compat\.h" | cut -d ":" -f 1 |  xargs -n 1 sed -i -e "s%asm/compat.h%linux/compat.h%g"

Cc: acme@kernel.org
Cc: benh@kernel.crashing.org
Cc: borntraeger@de.ibm.com
Cc: catalin.marinas@arm.com
Cc: cmetcalf@mellanox.com
Cc: cohuck@redhat.com
Cc: davem@davemloft.net
Cc: deller@gmx.de
Cc: devel@driverdev.osuosl.org
Cc: gerald.schaefer@de.ibm.com
Cc: gregkh@linuxfoundation.org
Cc: heiko.carstens@de.ibm.com
Cc: hoeppner@linux.vnet.ibm.com
Cc: hpa@zytor.com
Cc: jejb@parisc-linux.org
Cc: jwi@linux.vnet.ibm.com
Cc: linux-kernel@vger.kernel.org
Cc: linux-mips@linux-mips.org
Cc: linux-parisc@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-s390@vger.kernel.org
Cc: mark.rutland@arm.com
Cc: mingo@redhat.com
Cc: mpe@ellerman.id.au
Cc: oberpar@linux.vnet.ibm.com
Cc: oprofile-list@lists.sf.net
Cc: paulus@samba.org
Cc: peterz@infradead.org
Cc: ralf@linux-mips.org
Cc: rostedt@goodmis.org
Cc: rric@kernel.org
Cc: schwidefsky@de.ibm.com
Cc: sebott@linux.vnet.ibm.com
Cc: sparclinux@vger.kernel.org
Cc: sth@linux.vnet.ibm.com
Cc: ubraun@linux.vnet.ibm.com
Cc: will.deacon@arm.com
Cc: x86@kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: James Hogan <jhogan@kernel.org>
Acked-by: Helge Deller <deller@gmx.de>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2018-04-19 13:29:54 +02:00
Linus Torvalds b1cb4f93b5 powerpc fixes for 4.17 #2
- Fix crashes when loading modules built with a different CONFIG_RELOCATABLE
    value by adding CONFIG_RELOCATABLE to vermagic.
 
  - Fix busy loops in the OPAL NVRAM driver if we get certain error conditions
    from firmware.
 
  - Remove tlbie trace points from KVM code that's called in real mode, because
    it causes crashes.
 
  - Fix checkstops caused by invalid tlbiel on Power9 Radix.
 
  - Ensure the set of CPU features we "know" are always enabled is actually the
    minimal set when we build with support for firmware supplied CPU features.
 
 Thanks to:
   Aneesh Kumar K.V, Anshuman Khandual, Nicholas Piggin.
 -----BEGIN PGP SIGNATURE-----
 
 iQIwBAABCAAaBQJa0oWBExxtcGVAZWxsZXJtYW4uaWQuYXUACgkQUevqPMjhpYAV
 EA//UQB7n33HlroMBL3841VswUrIgY36gSi9+QZPjxHiTYGVStI+u0FHq5hm8OMm
 1FkronD0sfbN7t8LJ9FS6vCoAlW15vMBj95pWJXAL7NuQneO7cM5JvRbowVKbNbq
 GESn4EkUj9bAXl7aTzX2yA7jruzMJIwE/2bgl1J6YkpByHx5x/fgzKTuoCRggQqY
 Xd656ZZyWR/uIuWhAjCPFdrPnekVvo7/Gy5Yo5kcR6LbX4MO0JFNOHpiT3wPGGv/
 LJedJtdpH1biMk7SrqLfWD7mpMsVJeMelpkftEbe8zUXf17wBd1rl4JkybLTDwZD
 HznZ0nbnh0j5Q/KRHwOh0sLDSisJF6pQHH/Bnc4Xqrsn4OERwEk40/Ym/O0kDsjH
 n2F3X5a5QLWoBWRsdV3BMyEzc0bgnb++tXeBWOV4jJBEOhoqxwimCU+3fmLiy95A
 urJYf8Sljwve9I5kJyXKpruopgeoJ1aumRwopE8YCG9lfBgzvU/90u6+uKqAdkOv
 kpZxS5FDT2lNIwbCP9WjEelAOGrtA6JQNWU0iPGwsVAvAxX/p1RwwbQeWVlWs3Ek
 UdVF8eGezClpLPa/FQFdDyXfGE6WIf1vK9YnG7k3rUwwkSqoYxKgVQ3o1Vetp0Lg
 fzPsDdDi2V2I3KDYNav8s6kohAIIS/a9XYk4BGvT/htRnUg=
 =0e7c
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-4.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:

 - Fix crashes when loading modules built with a different
   CONFIG_RELOCATABLE value by adding CONFIG_RELOCATABLE to vermagic.

 - Fix busy loops in the OPAL NVRAM driver if we get certain error
   conditions from firmware.

 - Remove tlbie trace points from KVM code that's called in real mode,
   because it causes crashes.

 - Fix checkstops caused by invalid tlbiel on Power9 Radix.

 - Ensure the set of CPU features we "know" are always enabled is
   actually the minimal set when we build with support for firmware
   supplied CPU features.

Thanks to: Aneesh Kumar K.V, Anshuman Khandual, Nicholas Piggin.

* tag 'powerpc-4.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/64s: Fix CPU_FTRS_ALWAYS vs DT CPU features
  powerpc/mm/radix: Fix checkstops caused by invalid tlbiel
  KVM: PPC: Book3S HV: trace_tlbie must not be called in realmode
  powerpc/8xx: Fix build with hugetlbfs enabled
  powerpc/powernv: Fix OPAL NVRAM driver OPAL_BUSY loops
  powerpc/powernv: define a standard delay for OPAL_BUSY type retry loops
  powerpc/fscr: Enable interrupts earlier before calling get_user()
  powerpc/64s: Fix section mismatch warnings from setup_rfi_flush()
  powerpc/modules: Fix crashes by adding CONFIG_RELOCATABLE to vermagic
2018-04-15 11:57:12 -07:00
AKASHI Takahiro 9ec4ecef0a kexec_file,x86,powerpc: factor out kexec_file_ops functions
As arch_kexec_kernel_image_{probe,load}(),
arch_kimage_file_post_load_cleanup() and arch_kexec_kernel_verify_sig()
are almost duplicated among architectures, they can be commonalized with
an architecture-defined kexec_file_ops array.  So let's factor them out.

Link: http://lkml.kernel.org/r/20180306102303.9063-3-takahiro.akashi@linaro.org
Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Acked-by: Dave Young <dyoung@redhat.com>
Tested-by: Dave Young <dyoung@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-13 17:10:27 -07:00
Michael Ellerman 81b654c273 powerpc/64s: Fix CPU_FTRS_ALWAYS vs DT CPU features
The cpu_has_feature() mechanism has an optimisation where at build
time we construct a mask of the CPU feature bits that will always be
true for the given .config, based on the platform/bitness/etc. that we
are building for.

That is incompatible with DT CPU features, where the set of CPU
features is dependent on feature flags that are given to us by
firmware.

The result is that some feature bits can not be *disabled* by DT CPU
features. Or more accurately, they can be disabled but they will still
appear in the ALWAYS mask, meaning cpu_has_feature() will always
return true for them.

In the past this hasn't really been a problem because on Book3S
64 (where we support DT CPU features), the set of ALWAYS bits has been
very small. That was because we always built for POWER4 and later,
meaning the set of common bits was small.

The only bit that could be cleared by DT CPU features that was also in
the ALWAYS mask was CPU_FTR_NODSISRALIGN, and that was only used in
the alignment handler to create a fake DSISR. That code was itself
deleted in 31bfdb036f ("powerpc: Use instruction emulation
infrastructure to handle alignment faults") (Sep 2017).

However the set of ALWAYS features changed with the recent commit
db5ae1c155 ("powerpc/64s: Refine feature sets for little endian
builds") which restricted the set of feature flags when building
little endian to Power7 or later. That caused the ALWAYS mask to
become much larger for little endian builds.

The result is that the following feature bits can currently not
be *disabled* by DT CPU features:

  CPU_FTR_REAL_LE, CPU_FTR_MMCRA, CPU_FTR_CTRL, CPU_FTR_SMT,
  CPU_FTR_PURR, CPU_FTR_SPURR, CPU_FTR_DSCR, CPU_FTR_PKEY,
  CPU_FTR_VMX_COPY, CPU_FTR_CFAR, CPU_FTR_HAS_PPR.

To fix it we need to mask the set of ALWAYS features with the base set
of DT CPU features, ie. the features that are always enabled by DT CPU
features. That way there are no bits in the ALWAYS mask that are not
also always set by DT CPU features.

Fixes: db5ae1c155 ("powerpc/64s: Refine feature sets for little endian builds")
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-13 23:51:44 +10:00
Nicholas Piggin 34dd25de9f powerpc/powernv: define a standard delay for OPAL_BUSY type retry loops
This is the start of an effort to tidy up and standardise all the
delays. Existing loops have a range of delay/sleep periods from 1ms
to 20ms, and some have no delay. They all loop forever except rtc,
which times out after 10 retries, and that uses 10ms delays. So use
10ms as our standard delay. The OPAL maintainer agrees 10ms is a
reasonable starting point.

The idea is to use the same recipe everywhere, once this is proven to
work then it will be documented as an OPAL API standard. Then both
firmware and OS can agree, and if a particular call needs something
else, then that can be documented with reasoning.

This is not the end-all of this effort, it's just a relatively easy
change that fixes some existing high latency delays. There should be
provision for standardising timeouts and/or interruptible loops where
possible, so non-fatal firmware errors don't cause hangs.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-11 00:05:23 +10:00
Linus Torvalds d8312a3f61 ARM:
- VHE optimizations
 - EL2 address space randomization
 - speculative execution mitigations ("variant 3a", aka execution past invalid
 privilege register access)
 - bugfixes and cleanups
 
 PPC:
 - improvements for the radix page fault handler for HV KVM on POWER9
 
 s390:
 - more kvm stat counters
 - virtio gpu plumbing
 - documentation
 - facilities improvements
 
 x86:
 - support for VMware magic I/O port and pseudo-PMCs
 - AMD pause loop exiting
 - support for AMD core performance extensions
 - support for synchronous register access
 - expose nVMX capabilities to userspace
 - support for Hyper-V signaling via eventfd
 - use Enlightened VMCS when running on Hyper-V
 - allow userspace to disable MWAIT/HLT/PAUSE vmexits
 - usual roundup of optimizations and nested virtualization bugfixes
 
 Generic:
 - API selftest infrastructure (though the only tests are for x86 as of now)
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJay19UAAoJEL/70l94x66DGKYIAIu9PTHAEwaX0et15fPW5y2x
 rrtS355lSAmMrPJ1nePRQ+rProD/1B0Kizj3/9O+B9OTKKRsorRYNa4CSu9neO2k
 N3rdE46M1wHAPwuJPcYvh3iBVXtgbMayk1EK5aVoSXaMXEHh+PWZextkl+F+G853
 kC27yDy30jj9pStwnEFSBszO9ua/URdKNKBATNx8WUP6d9U/dlfm5xv3Dc3WtKt2
 UMGmog2wh0i7ecXo7hRkMK4R7OYP3ZxAexq5aa9BOPuFp+ZdzC/MVpN+jsjq2J/M
 Zq6RNyA2HFyQeP0E9QgFsYS2BNOPeLZnT5Jg1z4jyiD32lAZ/iC51zwm4oNKcDM=
 =bPlD
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm updates from Paolo Bonzini:
 "ARM:
   - VHE optimizations

   - EL2 address space randomization

   - speculative execution mitigations ("variant 3a", aka execution past
     invalid privilege register access)

   - bugfixes and cleanups

  PPC:
   - improvements for the radix page fault handler for HV KVM on POWER9

  s390:
   - more kvm stat counters

   - virtio gpu plumbing

   - documentation

   - facilities improvements

  x86:
   - support for VMware magic I/O port and pseudo-PMCs

   - AMD pause loop exiting

   - support for AMD core performance extensions

   - support for synchronous register access

   - expose nVMX capabilities to userspace

   - support for Hyper-V signaling via eventfd

   - use Enlightened VMCS when running on Hyper-V

   - allow userspace to disable MWAIT/HLT/PAUSE vmexits

   - usual roundup of optimizations and nested virtualization bugfixes

  Generic:
   - API selftest infrastructure (though the only tests are for x86 as
     of now)"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (174 commits)
  kvm: x86: fix a prototype warning
  kvm: selftests: add sync_regs_test
  kvm: selftests: add API testing infrastructure
  kvm: x86: fix a compile warning
  KVM: X86: Add Force Emulation Prefix for "emulate the next instruction"
  KVM: X86: Introduce handle_ud()
  KVM: vmx: unify adjacent #ifdefs
  x86: kvm: hide the unused 'cpu' variable
  KVM: VMX: remove bogus WARN_ON in handle_ept_misconfig
  Revert "KVM: X86: Fix SMRAM accessing even if VM is shutdown"
  kvm: Add emulation for movups/movupd
  KVM: VMX: raise internal error for exception during invalid protected mode state
  KVM: nVMX: Optimization: Dont set KVM_REQ_EVENT when VMExit with nested_run_pending
  KVM: nVMX: Require immediate-exit when event reinjected to L2 and L1 event pending
  KVM: x86: Fix misleading comments on handling pending exceptions
  KVM: x86: Rename interrupt.pending to interrupt.injected
  KVM: VMX: No need to clear pending NMI/interrupt on inject realmode interrupt
  x86/kvm: use Enlightened VMCS when running on Hyper-V
  x86/hyper-v: detect nested features
  x86/hyper-v: define struct hv_enlightened_vmcs and clean field bits
  ...
2018-04-09 11:42:31 -07:00
Michael Ellerman 73aca179d7 powerpc/modules: Fix crashes by adding CONFIG_RELOCATABLE to vermagic
If you build the kernel with CONFIG_RELOCATABLE=n, then install the
modules, rebuild the kernel with CONFIG_RELOCATABLE=y and leave the
old modules installed, we crash something like:

  Unable to handle kernel paging request for data at address 0xd000000018d66cef
  Faulting instruction address: 0xc0000000021ddd08
  Oops: Kernel access of bad area, sig: 11 [#1]
  Modules linked in: x_tables autofs4
  CPU: 2 PID: 1 Comm: systemd Not tainted 4.16.0-rc6-gcc_ubuntu_le-g99fec39 #1
  ...
  NIP check_version.isra.22+0x118/0x170
  Call Trace:
    __ksymtab_xt_unregister_table+0x58/0xfffffffffffffcb8 [x_tables] (unreliable)
    resolve_symbol+0xb4/0x150
    load_module+0x10e8/0x29a0
    SyS_finit_module+0x110/0x140
    system_call+0x58/0x6c

This happens because since commit 71810db27c ("modversions: treat
symbol CRCs as 32 bit quantities"), a relocatable kernel encodes and
handles symbol CRCs differently from a non-relocatable kernel.

Although it's possible we could try and detect this situation and
handle it, it's much more robust to simply make the state of
CONFIG_RELOCATABLE part of the module vermagic.

Fixes: 71810db27c ("modversions: treat symbol CRCs as 32 bit quantities")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-09 13:21:02 +10:00