Commit Graph

2672 Commits

Author SHA1 Message Date
Ingo Molnar 1e5f8a3085 Linux 5.5-rc3
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAl4AEiYeHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGR3sH/ixrBBYUVyjRPOxS
 ce4iVoTqphGSoAzq/3FA1YZZOPQ/Ep0NXL4L2fTGxmoiqIiuy8JPp07/NKbHQjj1
 Rt6PGm6cw2pMJHaK9gRdlTH/6OyXkp06OkH1uHqKYrhPnpCWDnj+i2SHAX21Hr1y
 oBQh4/XKvoCMCV96J2zxRsLvw8OkQFE0ouWWfj6LbpXIsmWZ++s0OuaO1cVdP/oG
 j+j2Voi3B3vZNQtGgJa5W7YoZN5Qk4ZIj9bMPg7bmKRd3wNB228AiJH2w68JWD/I
 jCA+JcITilxC9ud96uJ6k7SMS2ufjQlnP0z6Lzd0El1yGtHYRcPOZBgfOoPU2Euf
 33WGSyI=
 =iEwx
 -----END PGP SIGNATURE-----

Merge tag 'v5.5-rc3' into sched/core, to pick up fixes

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-12-25 10:41:37 +01:00
Vasily Gorbik b4adfe5591 s390/ftrace: save traced function caller
A typical backtrace acquired from ftraced function currently looks like
the following (e.g. for "path_openat"):

arch_stack_walk+0x15c/0x2d8
stack_trace_save+0x50/0x68
stack_trace_call+0x15a/0x3b8
ftrace_graph_caller+0x0/0x1c
0x3e0007e3c98 <- ftraced function caller (should be do_filp_open+0x7c/0xe8)
do_open_execat+0x70/0x1b8
__do_execve_file.isra.0+0x7d8/0x860
__s390x_sys_execve+0x56/0x68
system_call+0xdc/0x2d8

Note random "0x3e0007e3c98" stack value as ftraced function caller. This
value causes either imprecise unwinder result or unwinding failure.
That "0x3e0007e3c98" comes from r14 of ftraced function stack frame, which
it haven't had a chance to initialize since the very first instruction
calls ftrace code ("ftrace_caller"). (ftraced function might never
save r14 as well). Nevertheless according to s390 ABI any function
is called with stack frame allocated for it and r14 contains return
address. "ftrace_caller" itself is called with "brasl %r0,ftrace_caller".
So, to fix this issue simply always save traced function caller onto
ftraced function stack frame.

Reported-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-12-18 23:29:26 +01:00
Vasily Gorbik eef06cbf67 s390/unwind: stop gracefully at user mode pt_regs in irq stack
Consider reaching user mode pt_regs at the bottom of irq stack graceful
unwinder termination. This is the case when irq/mcck/ext interrupt arrives
while in user mode.

Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-12-18 23:29:26 +01:00
Heiko Carstens 1b68ac8678 s390: remove last diag 0x44 caller
diag 0x44 is a voluntary undirected yield of a virtual CPU. This has
caused a lot of performance issues in the past.

There is only one caller left, and that one is only executed if diag
0x9c (directed yield) is not present. Given that all hypervisors
implement diag 0x9c anyway, remove the last diag 0x44 to avoid that
more callers will be added.

Worst case that could happen now, if diag 0x9c is not present, is that
a virtual CPU would loop a bit instead of giving its time slice up.

diag 0x44 statistics in debugfs are kept and will always be zero, so
that user space can tell that there are no calls.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-12-11 19:53:24 +01:00
Thomas Richter 0539ad0b22 s390/cpum_sf: Avoid SBD overflow condition in irq handler
The s390 CPU Measurement sampling facility has an overflow condition
which fires when all entries in a SBD are used.
The measurement alert interrupt is triggered and reads out all samples
in this SDB. It then tests the successor SDB, if this SBD is not full,
the interrupt handler does not read any samples at all from this SDB
The design waits for the hardware to fill this SBD and then trigger
another meassurement alert interrupt.

This scheme works nicely until
an perf_event_overflow() function call discards the sample due to
a too high sampling rate.
The interrupt handler has logic to read out a partially filled SDB
when the perf event overflow condition in linux common code is met.
This causes the CPUM sampling measurement hardware and the PMU
device driver to operate on the same SBD's trailer entry.
This should not happen.

This can be seen here using this trace:
   cpumsf_pmu_add: tear:0xb5286000
   hw_perf_event_update: sdbt 0xb5286000 full 1 over 0 flush_all:0
   hw_perf_event_update: sdbt 0xb5286008 full 0 over 0 flush_all:0
        above shows 1. interrupt
   hw_perf_event_update: sdbt 0xb5286008 full 1 over 0 flush_all:0
   hw_perf_event_update: sdbt 0xb5286008 full 0 over 0 flush_all:0
        above shows 2. interrupt
	... this goes on fine until...
   hw_perf_event_update: sdbt 0xb5286068 full 1 over 0 flush_all:0
   perf_push_sample1: overflow
      one or more samples read from the IRQ handler are rejected by
      perf_event_overflow() and the IRQ handler advances to the next SDB
      and modifies the trailer entry of a partially filled SDB.
   hw_perf_event_update: sdbt 0xb5286070 full 0 over 0 flush_all:1
      timestamp: 14:32:52.519953

Next time the IRQ handler is called for this SDB the trailer entry shows
an overflow count of 19 missed entries.
   hw_perf_event_update: sdbt 0xb5286070 full 1 over 19 flush_all:1
      timestamp: 14:32:52.970058

Remove access to a follow on SDB when event overflow happened.

Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-12-11 19:53:24 +01:00
Thomas Richter 39d4a501a9 s390/cpum_sf: Adjust sampling interval to avoid hitting sample limits
Function perf_event_ever_overflow() and perf_event_account_interrupt()
are called every time samples are processed by the interrupt handler.
However function perf_event_account_interrupt() has checks to avoid being
flooded with interrupts (more then 1000 samples are received per
task_tick).  Samples are then dropped and a PERF_RECORD_THROTTLED is
added to the perf data. The perf subsystem limit calculation is:

    maximum sample frequency := 100000 --> 1 samples per 10 us
    task_tick = 10ms = 10000us --> 1000 samples per task_tick

The work flow is

measurement_alert() uses SDBT head and each SBDT points to 511
 SDB pages, each with 126 sample entries. After processing 8 SBDs
 and for each valid sample calling:

     perf_event_overflow()
       perf_event_account_interrupts()

there is a considerable amount of samples being dropped, especially when
the sample frequency is very high and near the 100000 limit.

To avoid the high amount of samples being dropped near the end of a
task_tick time frame, increment the sampling interval in case of
dropped events. The CPU Measurement sampling facility on the s390
supports only intervals, specifiing how many CPU cycles have to be
executed before a sample is generated. Increase the interval when the
samples being generated hit the task_tick limit.

Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-12-11 19:53:23 +01:00
Thomas Gleixner fa68645305 sched/rt, s390: Use CONFIG_PREEMPTION
CONFIG_PREEMPTION is selected by CONFIG_PREEMPT and by CONFIG_PREEMPT_RT.
Both PREEMPT and PREEMPT_RT require the same functionality which today
depends on CONFIG_PREEMPT.

Switch the preemption and entry code over to use CONFIG_PREEMPTION. Add
PREEMPT_RT output to die().

[bigeasy: +Kconfig, dumpstack.c]

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: linux-s390@vger.kernel.org
Link: https://lore.kernel.org/r/20191015191821.11479-18-bigeasy@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-12-08 14:37:35 +01:00
Linus Torvalds 01d1dff646 s390 updates for the 5.5 merge window #2
- Make stack unwinder reliable and suitable for livepatching. Add unwinder
   testing module.
 
 - Fixes for CALL_ON_STACK helper used for stack switching.
 
 - Fix unwinding from bpf code.
 
 - Fix getcpu and remove compat support in vdso code.
 
 - Fix address space control registers initialization.
 
 - Save KASLR offset for early dumps.
 
 - Handle new FILTERED_BY_HYPERVISOR reply code in crypto code.
 
 - Minor perf code cleanup and potential memory leak fix.
 
 - Add couple of error messages for corner cases during PCI device
   creation.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEE3QHqV+H2a8xAv27vjYWKoQLXFBgFAl3mUEMACgkQjYWKoQLX
 FBgHBgf/Ui3sgKGozvIAwy2kQ3oPtCdsmnTKEhLdhYT0cKMWNkA/jc13vn37ZqSk
 vMhawMjgjHhn4CLSjxKRGCprYViXIgnF2XrCywTDsBoj87QwB6/dME1gXJRW+/Rm
 OPvO+8D+210Ow0Xip3xXSRIPNFsUINCQeCCEtQCOuhGMdQPC0VIKgYtgvk1TAo1E
 +DycHbZ0e+uEp6zvVSsoP9wrkXw/L9krTDnjHncQ7FULJAYnBhY+qaeNTek09QAT
 j3Ywh5/fYR11c62W6fjb1lQHLb75L0aeK7Q5r5WspxG5LwiR2ncYWOQ4BQPZoUXq
 GjdNvwRmvEkB3IbnpLp/ft7sqsPn2w==
 =CoqQ
 -----END PGP SIGNATURE-----

Merge tag 's390-5.5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux

Pull more s390 updates from Vasily Gorbik:

 - Make stack unwinder reliable and suitable for livepatching. Add
   unwinder testing module.

 - Fixes for CALL_ON_STACK helper used for stack switching.

 - Fix unwinding from bpf code.

 - Fix getcpu and remove compat support in vdso code.

 - Fix address space control registers initialization.

 - Save KASLR offset for early dumps.

 - Handle new FILTERED_BY_HYPERVISOR reply code in crypto code.

 - Minor perf code cleanup and potential memory leak fix.

 - Add couple of error messages for corner cases during PCI device
   creation.

* tag 's390-5.5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (33 commits)
  s390: remove compat vdso code
  s390/livepatch: Implement reliable stack tracing for the consistency model
  s390/unwind: add stack pointer alignment sanity checks
  s390/unwind: filter out unreliable bogus %r14
  s390/unwind: start unwinding from reliable state
  s390/test_unwind: add program check context tests
  s390/test_unwind: add irq context tests
  s390/test_unwind: print verbose unwinding results
  s390/test_unwind: add CALL_ON_STACK tests
  s390: fix register clobbering in CALL_ON_STACK
  s390/test_unwind: require that unwinding ended successfully
  s390/unwind: add a test for the internal API
  s390/unwind: always inline get_stack_pointer
  s390/pci: add error message on device number limit
  s390/pci: add error message for UID collision
  s390/cpum_sf: Check for SDBT and SDB consistency
  s390/cpum_sf: Use TEAR_REG macro consistantly
  s390/cpum_sf: Remove unnecessary check for pending SDBs
  s390/cpum_sf: Replace function name in debug statements
  s390/kaslr: store KASLR offset for early dumps
  ...
2019-12-03 12:50:00 -08:00
Heiko Carstens 2115fbf721 s390: remove compat vdso code
Remove compat vdso code, since there is hardly any compat user space
left. Still existing compat user space will have to use system calls
instead.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-12-01 12:48:49 +01:00
Linus Torvalds b94ae8ad9f seccomp updates for v5.5
- implement SECCOMP_USER_NOTIF_FLAG_CONTINUE (Christian Brauner)
 - fixes to selftests (Christian Brauner)
 - remove secure_computing() argument (Christian Brauner)
 -----BEGIN PGP SIGNATURE-----
 Comment: Kees Cook <kees@outflux.net>
 
 iQJKBAABCgA0FiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAl3dT/kWHGtlZXNjb29r
 QGNocm9taXVtLm9yZwAKCRCJcvTf3G3AJg7eD/9PFh0xAgk7swWIOnkv/Ckj6pqR
 lcnVaugsap2sp99P+QxVPoqKoBsHF/OZ96OqJcokljdWO77ElBMG4Xxgjho/mPPU
 Yzhsd9/Q0j4zYIe/Gy+4LxZ+wSudBxv7ls4l86fst1GWg880VkLk32/1N0BUjFAp
 uyBBaEuDoXcnkru8ojKH1xgp0Cd1KjyO1KEAQdkSt2GROo3nhROh9955Hrrxuanr
 0sjWLYe8E8P3hPugRI/3WRZu4VqdIn47pm+/UMPwGpC80kI+mSL1jtidszqC022w
 u0H5yoedEhZCan7uHWtEY1TXfwgktUKMZOzMP8LSoZ9cNPAFyKXsFqN7Jzf/1Edr
 9Zsc+9gc3lfBr6YYBSHUC4XYGzZ2fy0itK/yRTvZdUGO/XETrE61fR/wyVjQttRS
 OR1tAtmd9/3iZqe1jh1l3Rw4bJh1w/hS768sWpp8qAMunCGF5gQvFdqGFAxjIS5c
 Ddd0gjxK/NV72+iUzCSL0qUXcYjNYPT4cUapywBuQ4H1i4hl5EM3nGyCbLFbpqkp
 L2fzeAdRGSZIzZ35emTWhvSLZ36Ty64zEViNbAOP9o/+j6/SR5TjL1aNDkz69Eca
 GM1XiDeg4AoamtPR38+DzS+EnzBWfOD6ujsKNFgjAJbVIaa414Vql9utrq7fSvf2
 OIJjAD8PZKN93t1qaw==
 =igQG
 -----END PGP SIGNATURE-----

Merge tag 'seccomp-v5.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull seccomp updates from Kees Cook:
 "Mostly this is implementing the new flag SECCOMP_USER_NOTIF_FLAG_CONTINUE,
  but there are cleanups as well.

   - implement SECCOMP_USER_NOTIF_FLAG_CONTINUE (Christian Brauner)

   - fixes to selftests (Christian Brauner)

   - remove secure_computing() argument (Christian Brauner)"

* tag 'seccomp-v5.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  seccomp: rework define for SECCOMP_USER_NOTIF_FLAG_CONTINUE
  seccomp: fix SECCOMP_USER_NOTIF_FLAG_CONTINUE test
  seccomp: simplify secure_computing()
  seccomp: test SECCOMP_USER_NOTIF_FLAG_CONTINUE
  seccomp: add SECCOMP_USER_NOTIF_FLAG_CONTINUE
  seccomp: avoid overflow in implicit constant conversion
2019-11-30 17:23:16 -08:00
Miroslav Benes aa137a6d30 s390/livepatch: Implement reliable stack tracing for the consistency model
The livepatch consistency model requires reliable stack tracing
architecture support in order to work properly. In order to achieve
this, two main issues have to be solved. First, reliable and consistent
call chain backtracing has to be ensured. Second, the unwinder needs to
be able to detect stack corruptions and return errors.

The "zSeries ELF Application Binary Interface Supplement" says:

  "The stack pointer points to the first word of the lowest allocated
  stack frame. If the "back chain" is implemented this word will point to
  the previously allocated stack frame (towards higher addresses), except
  for the first stack frame, which shall have a back chain of zero (NULL).
  The stack shall grow downwards, in other words towards lower addresses."

"back chain" is optional. GCC option -mbackchain enables it. Quoting
Martin Schwidefsky [1]:

  "The compiler is called with the -mbackchain option, all normal C
  function will store the backchain in the function prologue. All
  functions written in assembler code should do the same, if you find one
  that does not we should fix that. The end result is that a task that
  *voluntarily* called schedule() should have a proper backchain at all
  times.

  Dependent on the use case this may or may not be enough. Asynchronous
  interrupts may stop the CPU at the beginning of a function, if kernel
  preemption is enabled we can end up with a broken backchain.  The
  production kernels for IBM Z are all compiled *without* kernel
  preemption. So yes, we might get away without the objtool support.

  On a side-note, we do have a line item to implement the ORC unwinder for
  the kernel, that includes the objtool support. Once we have that we can
  drop the -mbackchain option for the kernel build. That gives us a nice
  little performance benefit. I hope that the change from backchain to the
  ORC unwinder will not be too hard to implement in the livepatch tools."

Since -mbackchain is enabled by default when the kernel is compiled, the
call chain backtracing should be currently ensured and objtool should
not be necessary for livepatch purposes.

Regarding the second issue, stack corruptions and non-reliable states
have to be recognized by the unwinder. Mainly it means to detect
preemption or page faults, the end of the task stack must be reached,
return addresses must be valid text addresses and hacks like function
graph tracing and kretprobes must be properly detected.

Unwinding a running task's stack is not a problem, because there is a
livepatch requirement that every checked task is blocked, except for the
current task. Due to that, the implementation can be much simpler
compared to the existing non-reliable infrastructure. We can consider a
task's kernel/thread stack only and skip the other stacks.

[1] 20180912121106.31ffa97c@mschwideX1 [not archived on lore.kernel.org]

Link: https://lkml.kernel.org/r/20191106095601.29986-5-mbenes@suse.cz
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Tested-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-11-30 10:52:48 +01:00
Miroslav Benes be2d11b2a1 s390/unwind: add stack pointer alignment sanity checks
ABI requires SP to be aligned 8 bytes, report unwinding error otherwise.

Link: https://lkml.kernel.org/r/20191106095601.29986-5-mbenes@suse.cz
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Tested-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-11-30 10:52:48 +01:00
Vasily Gorbik bf018ee644 s390/unwind: filter out unreliable bogus %r14
Currently unwinder unconditionally returns %r14 from the first frame
pointed by %r15 from pt_regs. A task could be interrupted when a function
already allocated this frame (if it needs it) for its callees or to
store local variables. In that case this frame would contain random
values from stack or values stored there by a callee. As we are only
interested in %r14 to get potential return address, skip bogus return
addresses which doesn't belong to kernel text.

This helps to avoid duplicating filtering logic in unwider users, most
of which use unwind_get_return_address() and would choke on bogus 0
address returned by it otherwise.

Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-11-30 10:52:48 +01:00
Vasily Gorbik 222ee9087a s390/unwind: start unwinding from reliable state
A comment in arch/s390/include/asm/unwind.h says:
> If 'first_frame' is not zero unwind_start skips unwind frames until it
> reaches the specified stack pointer.
> The end of the unwinding is indicated with unwind_done, this can be true
> right after unwind_start, e.g. with first_frame!=0 that can not be found.
> unwind_next_frame skips to the next frame.
> Once the unwind is completed unwind_error() can be used to check if there
> has been a situation where the unwinder could not correctly understand
> the tasks call chain.

With this change backchain unwinder now comply with behaviour
described. As well as matches orc unwinder implementation.  Now unwinder
starts from reliable state, i.e. __unwind_start own stack frame is
taken or stack frame generated by __switch_to (ksp) - both known to be
valid. In case of pt_regs %r15 is better match for pt_regs psw, than
sometimes random "sp" caller passed.

Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-11-30 10:52:48 +01:00
Vasily Gorbik 0610154650 s390/test_unwind: print verbose unwinding results
Add stack name, sp and reliable information into test unwinding
results. Also consider ip outside of kernel text as failure if the
state is reported reliable.

Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-11-30 10:52:48 +01:00
Thomas Richter 247f265fa5 s390/cpum_sf: Check for SDBT and SDB consistency
Each SBDT is located at a 4KB page and contains 512 entries.
Each entry of a SDBT points to a SDB, a 4KB page containing
sampled data. The last entry is a link to another SDBT page.

When an event is created the function sequence executed is:

  __hw_perf_event_init()
  +--> allocate_buffers()
       +--> realloc_sampling_buffers()
	    +---> alloc_sample_data_block()

Both functions realloc_sampling_buffers() and
alloc_sample_data_block() allocate pages and the allocation
can fail. This is handled correctly and all allocated
pages are freed and error -ENOMEM is returned to the
top calling function. Finally the event is not created.

Once the event has been created, the amount of initially
allocated SDBT and SDB can be too low. This is detected
during measurement interrupt handling, where the amount
of lost samples is calculated. If the number of lost samples
is too high considering sampling frequency and already allocated
SBDs, the number of SDBs is enlarged during the next execution
of cpumsf_pmu_enable().

If more SBDs need to be allocated, functions

       realloc_sampling_buffers()
       +---> alloc-sample_data_block()

are called to allocate more pages. Page allocation may fail
and the returned error is ignored. A SDBT and SDB setup
already exists.

However the modified SDBTs and SDBs might end up in a situation
where the first entry of an SDBT does not point to an SDB,
but another SDBT, basicly an SBDT without payload.
This can not be handled by the interrupt handler, where an SDBT
must have at least one entry pointing to an SBD.

Add a check to avoid SDBTs with out payload (SDBs) when enlarging
the buffer setup.

Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-11-30 10:52:46 +01:00
Thomas Richter 7dd6b199df s390/cpum_sf: Use TEAR_REG macro consistantly
The macro TEAR_REG() saves the last used SDBT address
in the perf_hw_event structure. This is also done
by function hw_reset_registers() which is a one-liner
and simply uses macro TEAR_REG(). Remove function
hw_reset_registers(), which is only used one time and use
macro TEAR_REG() instead. This macro is used throughout
the code anyway.

Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-11-30 10:52:46 +01:00
Thomas Richter c17a7c6ee8 s390/cpum_sf: Remove unnecessary check for pending SDBs
In interrupt handling the function extend_sampling_buffer()
is called after checking for a possibly extension.
This check is not necessary as the called function itself
performs this check again.

Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-11-30 10:52:46 +01:00
Thomas Richter 532da3de70 s390/cpum_sf: Replace function name in debug statements
Replace hard coded function names in debug statements
by the "%s ...", __func__ construct suggested by checkpatch.pl
script.  Use consistent debug print format of the form variable
blank value. Also add leading 0x for all hex values.
Print allocated page addresses consistantly as hex numbers
with leading 0x.

Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-11-30 10:52:46 +01:00
Gerald Schaefer a9f2f6865d s390/kaslr: store KASLR offset for early dumps
The KASLR offset is added to vmcoreinfo in arch_crash_save_vmcoreinfo(),
so that it can be found by crash when processing kernel dumps.

However, arch_crash_save_vmcoreinfo() is called during a subsys_initcall,
so if the kernel crashes before that, we have no vmcoreinfo and no KASLR
offset.

Fix this by storing the KASLR offset in the lowcore, where the vmcore_info
pointer will be stored, and where it can be found by crash. In order to
make it distinguishable from a real vmcore_info pointer, mark it as uneven
(KASLR offset itself is aligned to THREAD_SIZE).

When arch_crash_save_vmcoreinfo() stores the real vmcore_info pointer in
the lowcore, it overwrites the KASLR offset. At that point, the KASLR
offset is not yet added to vmcoreinfo, so we also need to move the
mem_assign_absolute() behind the vmcoreinfo_append_str().

Fixes: b2d24b97b2 ("s390/kernel: add support for kernel address space layout randomization (KASLR)")
Cc: <stable@vger.kernel.org> # v5.2+
Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-11-30 10:52:45 +01:00
Vasily Gorbik e76e69611e s390/unwind: stop gracefully at task pt_regs
Consider reaching task pt_regs graceful unwinder termination. Task
pt_regs itself never contains a valid state to which a task might return
within the kernel context (user task pt_regs is a special case). Since
we already avoid printing user task pt_regs and in most cases we don't
even bother filling task pt_regs psw and r15 with something reasonable
simply skip task pt_regs altogether. With this change unwind_error() now
accurately represent whether unwinder reached task pt_regs successfully
or failed along the way.

Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-11-30 10:52:45 +01:00
Vasily Gorbik cb7948e8c3 s390/head64: correct init_task stack setup
Add missing allocation of pt_regs at the bottom of the stack. This
makes it consistent with other stack setup cases and also what stack
unwinder expects.

Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-11-30 10:52:45 +01:00
Vasily Gorbik 97806dfb6f s390/unwind: make reuse_sp default when unwinding pt_regs
Currently unwinder yields 2 entries when pt_regs are met:
sp="address of pt_regs itself" ip=pt_regs->psw
sp=pt_regs->gprs[15] ip="r14 from stack frame pointed by pt_regs->gprs[15]"

And neither of those 2 states (combination of sp and ip) ever happened.

reuse_sp has been introduced by commit a1d863ac3e ("s390/unwind: fix
mixing regs and sp"). reuse_sp=true makes unwinder keen to produce the
following result, when pt_regs are given (as an arg to unwind_start):
sp=pt_regs->gprs[15] ip=pt_regs->psw
sp=pt_regs->gprs[15] ip="r14 from stack frame pointed by pt_regs->gprs[15]"

The first state is an actual state in which a task was when pt_regs were
collected. The second state is marked unreliable and is for debugging
purposes to cover the case when a task has been interrupted in between
stack frame allocation and writing back_chain - in this case r14 might
show an actual caller.

Make unwinder behaviour enabled via reuse_sp=true default and drop the
special case handling.

Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-11-30 10:52:45 +01:00
Vasily Gorbik 67f5593419 s390/unwind: report an error if pt_regs are not on stack
If unwinder is looking at pt_regs which is not on stack then something
went wrong and an error has to be reported rather than successful
unwinding termination.

Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-11-30 10:52:45 +01:00
Vasily Gorbik 7bcaad1f9f s390: avoid misusing CALL_ON_STACK for task stack setup
CALL_ON_STACK is intended to be used for temporary stack switching with
potential return to the caller.

When CALL_ON_STACK is misused to switch from nodat stack to task stack
back_chain information would later lead stack unwinder from task stack into
(per cpu) nodat stack which is reused for other purposes. This would
yield confusing unwinding result or errors.

To avoid that introduce CALL_ON_STACK_NORETURN to be used instead. It
makes sure that back_chain is zeroed and unwinder finishes gracefully
ending up at task pt_regs.

Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-11-30 10:52:45 +01:00
Vasily Gorbik 103b4cca60 s390/unwind: unify task is current checks
Avoid mixture of task == NULL and task == current meaning the same
thing and simply always initialize task with current in unwind_start.

Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-11-30 10:52:45 +01:00
Vasily Gorbik 7f28dad395 s390: disable preemption when switching to nodat stack with CALL_ON_STACK
Make sure preemption is disabled when temporary switching to nodat
stack with CALL_ON_STACK helper, because nodat stack is per cpu.

Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-11-30 10:52:45 +01:00
Heiko Carstens 5a5525b048 s390/vdso: fix getcpu
getcpu reads the required values for cpu and node with two
instructions. This might lead to an inconsistent result if user space
gets preempted and migrated to a different CPU between the two
instructions.

Fix this by using just a single instruction to read both values at
once.

This is currently rather a theoretical bug, since there is no real
NUMA support available (except for NUMA emulation).

Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-11-30 10:52:44 +01:00
Heiko Carstens a2308c11ec s390/smp,vdso: fix ASCE handling
When a secondary CPU is brought up it must initialize its control
registers. CPU A which triggers that a secondary CPU B is brought up
stores its control register contents into the lowcore of new CPU B,
which then loads these values on startup.

This is problematic in various ways: the control register which
contains the home space ASCE will correctly contain the kernel ASCE;
however control registers for primary and secondary ASCEs are
initialized with whatever values were present in CPU A.

Typically:
- the primary ASCE will contain the user process ASCE of the process
  that triggered onlining of CPU B.
- the secondary ASCE will contain the percpu VDSO ASCE of CPU A.

Due to lazy ASCE handling we may also end up with other combinations.

When then CPU B switches to a different process (!= idle) it will
fixup the primary ASCE. However the problem is that the (wrong) ASCE
from CPU A was loaded into control register 1: as soon as an ASCE is
attached (aka loaded) a CPU is free to generate TLB entries using that
address space.
Even though it is very unlikey that CPU B will actually generate such
entries, this could result in TLB entries of the address space of the
process that ran on CPU A. These entries shouldn't exist at all and
could cause problems later on.

Furthermore the secondary ASCE of CPU B will not be updated correctly.
This means that processes may see wrong results or even crash if they
access VDSO data on CPU B. The correct VDSO ASCE will eventually be
loaded on return to user space as soon as the kernel executed a call
to strnlen_user or an atomic futex operation on CPU B.

Fix both issues by intializing the to be loaded control register
contents with the correct ASCEs and also enforce (re-)loading of the
ASCEs upon first context switch and return to user space.

Fixes: 0aaba41b58 ("s390: remove all code using the access register mode")
Cc: stable@vger.kernel.org # v4.15+
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-11-30 10:52:44 +01:00
Linus Torvalds 77a05940ee Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:
 "The biggest changes in this cycle were:

   - Make kcpustat vtime aware (Frederic Weisbecker)

   - Rework the CFS load_balance() logic (Vincent Guittot)

   - Misc cleanups, smaller enhancements, fixes.

  The load-balancing rework is the most intrusive change: it replaces
  the old heuristics that have become less meaningful after the
  introduction of the PELT metrics, with a grounds-up load-balancing
  algorithm.

  As such it's not really an iterative series, but replaces the old
  load-balancing logic with the new one. We hope there are no
  performance regressions left - but statistically it's highly probable
  that there *is* going to be some workload that is hurting from these
  chnages. If so then we'd prefer to have a look at that workload and
  fix its scheduling, instead of reverting the changes"

* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (46 commits)
  rackmeter: Use vtime aware kcpustat accessor
  leds: Use all-in-one vtime aware kcpustat accessor
  cpufreq: Use vtime aware kcpustat accessors for user time
  procfs: Use all-in-one vtime aware kcpustat accessor
  sched/vtime: Bring up complete kcpustat accessor
  sched/cputime: Support other fields on kcpustat_field()
  sched/cpufreq: Move the cfs_rq_util_change() call to cpufreq_update_util()
  sched/fair: Add comments for group_type and balancing at SD_NUMA level
  sched/fair: Fix rework of find_idlest_group()
  sched/uclamp: Fix overzealous type replacement
  sched/Kconfig: Fix spelling mistake in user-visible help text
  sched/core: Further clarify sched_class::set_next_task()
  sched/fair: Use mul_u32_u32()
  sched/core: Simplify sched_class::pick_next_task()
  sched/core: Optimize pick_next_task()
  sched/core: Make pick_next_task_idle() more consistent
  sched/fair: Better document newidle_balance()
  leds: Use vtime aware kcpustat accessor to fetch CPUTIME_SYSTEM
  cpufreq: Use vtime aware kcpustat accessor to fetch CPUTIME_SYSTEM
  procfs: Use vtime aware kcpustat accessor to fetch CPUTIME_SYSTEM
  ...
2019-11-26 15:23:14 -08:00
Linus Torvalds 1d87200446 Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 asm updates from Ingo Molnar:
 "The main changes in this cycle were:

   - Cross-arch changes to move the linker sections for NOTES and
     EXCEPTION_TABLE into the RO_DATA area, where they belong on most
     architectures. (Kees Cook)

   - Switch the x86 linker fill byte from x90 (NOP) to 0xcc (INT3), to
     trap jumps into the middle of those padding areas instead of
     sliding execution. (Kees Cook)

   - A thorough cleanup of symbol definitions within x86 assembler code.
     The rather randomly named macros got streamlined around a
     (hopefully) straightforward naming scheme:

        SYM_START(name, linkage, align...)
        SYM_END(name, sym_type)

        SYM_FUNC_START(name)
        SYM_FUNC_END(name)

        SYM_CODE_START(name)
        SYM_CODE_END(name)

        SYM_DATA_START(name)
        SYM_DATA_END(name)

     etc - with about three times of these basic primitives with some
     label, local symbol or attribute variant, expressed via postfixes.

     No change in functionality intended. (Jiri Slaby)

   - Misc other changes, cleanups and smaller fixes"

* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (67 commits)
  x86/entry/64: Remove pointless jump in paranoid_exit
  x86/entry/32: Remove unused resume_userspace label
  x86/build/vdso: Remove meaningless CFLAGS_REMOVE_*.o
  m68k: Convert missed RODATA to RO_DATA
  x86/vmlinux: Use INT3 instead of NOP for linker fill bytes
  x86/mm: Report actual image regions in /proc/iomem
  x86/mm: Report which part of kernel image is freed
  x86/mm: Remove redundant address-of operators on addresses
  xtensa: Move EXCEPTION_TABLE to RO_DATA segment
  powerpc: Move EXCEPTION_TABLE to RO_DATA segment
  parisc: Move EXCEPTION_TABLE to RO_DATA segment
  microblaze: Move EXCEPTION_TABLE to RO_DATA segment
  ia64: Move EXCEPTION_TABLE to RO_DATA segment
  h8300: Move EXCEPTION_TABLE to RO_DATA segment
  c6x: Move EXCEPTION_TABLE to RO_DATA segment
  arm64: Move EXCEPTION_TABLE to RO_DATA segment
  alpha: Move EXCEPTION_TABLE to RO_DATA segment
  x86/vmlinux: Move EXCEPTION_TABLE to RO_DATA segment
  x86/vmlinux: Actually use _etext for the end of the text segment
  vmlinux.lds.h: Allow EXCEPTION_TABLE to live in RO_DATA
  ...
2019-11-26 10:42:40 -08:00
Linus Torvalds ea1f56fa16 s390 updates for the 5.5 merge window
- Adjust PMU device drivers registration to avoid WARN_ON and few other
   perf improvements.
 
 - Enhance tracing in vfio-ccw.
 
 - Few stack unwinder fixes and improvements, convert get_wchan custom
   stack unwinding to generic api usage.
 
 - Fixes for mm helpers issues uncovered with tests validating architecture
   page table helpers.
 
 - Fix noexec bit handling when hardware doesn't support it.
 
 - Fix memleak and unsigned value compared with zero bugs in crypto
   code. Minor code simplification.
 
 - Fix crash during kdump with kasan enabled kernel.
 
 - Switch bug and alternatives from asm to asm_inline to improve inlining
   decisions.
 
 - Use 'depends on cc-option' for MARCH and TUNE options in Kconfig,
   add z13s and z14 ZR1 to TUNE descriptions.
 
 - Minor head64.S simplification.
 
 - Fix physical to logical CPU map for SMT.
 
 - Several cleanups in qdio code.
 
 - Other minor cleanups and fixes all over the code.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEE3QHqV+H2a8xAv27vjYWKoQLXFBgFAl3ahGYACgkQjYWKoQLX
 FBguwAgAig+FNos8zkd7Sr2wg4DPL2IlYERVP40fOLXfGuVUOnMLg8OTO6yDWDpH
 5+cKAQS1wWgyvlfjWRUJ6anXLBAsgKRD1nyFIZTpn/wArGk/duCbnl/VFriDgrST
 8KTQDJpZ9w9nXtQ7lA2QWaw5U2WG8I2T2JuQJCdLXze7RXi0bDVe8e6131NMaJ42
 LLxqOqm8d8XDnd8oDVP04LT5IfhuI2cILoGBP/GyI2fqQk9Ems6M2gxuISq1COmy
 WORDLfwWyCLeF7gWKKjxf8Vo1HYcyoFvdXnxWiHb0TDZesQZJr/LLELTP03fbCW9
 U4jbXncnnPA7kT4tlC95jT5M69yK5w==
 =+FxG
 -----END PGP SIGNATURE-----

Merge tag 's390-5.5-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux

Pull s390 updates from Vasily Gorbik:

 - Adjust PMU device drivers registration to avoid WARN_ON and few other
   perf improvements.

 - Enhance tracing in vfio-ccw.

 - Few stack unwinder fixes and improvements, convert get_wchan custom
   stack unwinding to generic api usage.

 - Fixes for mm helpers issues uncovered with tests validating
   architecture page table helpers.

 - Fix noexec bit handling when hardware doesn't support it.

 - Fix memleak and unsigned value compared with zero bugs in crypto
   code. Minor code simplification.

 - Fix crash during kdump with kasan enabled kernel.

 - Switch bug and alternatives from asm to asm_inline to improve
   inlining decisions.

 - Use 'depends on cc-option' for MARCH and TUNE options in Kconfig, add
   z13s and z14 ZR1 to TUNE descriptions.

 - Minor head64.S simplification.

 - Fix physical to logical CPU map for SMT.

 - Several cleanups in qdio code.

 - Other minor cleanups and fixes all over the code.

* tag 's390-5.5-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (41 commits)
  s390/cpumf: Adjust registration of s390 PMU device drivers
  s390/smp: fix physical to logical CPU map for SMT
  s390/early: move access registers setup in C code
  s390/head64: remove unnecessary vdso_per_cpu_data setup
  s390/early: move control registers setup in C code
  s390/kasan: support memcpy_real with TRACE_IRQFLAGS
  s390/crypto: Fix unsigned variable compared with zero
  s390/pkey: use memdup_user() to simplify code
  s390/pkey: fix memory leak within _copy_apqns_from_user()
  s390/disassembler: don't hide instruction addresses
  s390/cpum_sf: Assign error value to err variable
  s390/cpum_sf: Replace function name in debug statements
  s390/cpum_sf: Use consistant debug print format for sampling
  s390/unwind: drop unnecessary code around calling ftrace_graph_ret_addr()
  s390: add error handling to perf_callchain_kernel
  s390: always inline current_stack_pointer()
  s390/mm: add mm_pxd_folded() checks to pxd_free()
  s390/mm: properly clear _PAGE_NOEXEC bit when it is not supported
  s390/mm: simplify page table helpers for large entries
  s390/mm: make pmd/pud_bad() report large entries as bad
  ...
2019-11-25 17:23:53 -08:00
Thomas Richter 6a82e23f45 s390/cpumf: Adjust registration of s390 PMU device drivers
Linux-next commit titled "perf/core: Optimize perf_init_event()"
changed the semantics of PMU device driver registration.
It was done to speed up the lookup/handling of PMU device driver
specific events. It also enforces that only one PMU device
driver will be registered of type PERF_EVENT_RAW.

This change added these line in function perf_pmu_register():

  ...
  +       ret = idr_alloc(&pmu_idr, pmu, max, 0, GFP_KERNEL);
  +       if (ret < 0)
                goto free_pdc;
  +
  +       WARN_ON(type >= 0 && ret != type);

The warn_on generates a message. We have 3 PMU device drivers,
each registered as type PERF_TYPE_RAW.
The cf_diag device driver (arch/s390/kernel/perf_cpumf_cf_diag.c)
always hits the WARN_ON because it is the second PMU device driver
(after sampling device driver arch/s390/kernel/perf_cpumf_sf.c)
which is registered as type 4 (PERF_TYPE_RAW).
So when the sampling device driver is registered, ret has value 4.
When cf_diag device driver is registered with type 4,
ret has value of 5 and WARN_ON fires.

Adjust the PMU device drivers for s390 to support the new
semantics required by perf_pmu_register().

Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-11-20 17:16:01 +01:00
Heiko Carstens 72a81ad9d6 s390/smp: fix physical to logical CPU map for SMT
If an SMT capable system is not IPL'ed from the first CPU the setup of
the physical to logical CPU mapping is broken: the IPL core gets CPU
number 0, but then the next core gets CPU number 1. Correct would be
that all SMT threads of CPU 0 get the subsequent logical CPU numbers.

This is important since a lot of code (like e.g. the CPU topology
code) assumes that CPU maps are setup like this. If the mapping is
broken the system will not IPL due to broken topology masks:

[    1.716341] BUG: arch topology broken
[    1.716342]      the SMT domain not a subset of the MC domain
[    1.716343] BUG: arch topology broken
[    1.716344]      the MC domain not a subset of the BOOK domain

This scenario can usually not happen since LPARs are always IPL'ed
from CPU 0 and also re-IPL is intiated from CPU 0. However older
kernels did initiate re-IPL on an arbitrary CPU. If therefore a re-IPL
from an old kernel into a new kernel is initiated this may lead to
crash.

Fix this by setting up the physical to logical CPU mapping correctly.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-11-20 12:58:13 +01:00
Vasily Gorbik c231359421 s390/early: move access registers setup in C code
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-11-20 12:58:13 +01:00
Vasily Gorbik b8ce1fa489 s390/head64: remove unnecessary vdso_per_cpu_data setup
vdso_per_cpu_data lowcore value is only needed for fully functional
exception handlers, which are activated in setup_lowcore_dat_off. The
same function does init vdso_per_cpu_data via vdso_alloc_boot_cpu.

Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-11-20 12:58:12 +01:00
Vasily Gorbik c02ee6a16a s390/early: move control registers setup in C code
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-11-20 12:58:12 +01:00
Ilya Leoshkevich 544f1d62e3 s390/disassembler: don't hide instruction addresses
Due to kptr_restrict, JITted BPF code is now displayed like this:

000000000b6ed1b2: ebdff0800024  stmg    %r13,%r15,128(%r15)
000000004cde2ba0: 41d0f040      la      %r13,64(%r15)
00000000fbad41b0: a7fbffa0      aghi    %r15,-96

Leaking kernel addresses to dmesg is not a concern in this case, because
this happens only when JIT debugging is explicitly activated, which only
root can do.

Use %px in this particular instance, and also to print an instruction
address in show_code and PCREL (e.g. brasl) arguments in print_insn.
While at present functionally equivalent to %016lx, %px is recommended
by Documentation/core-api/printk-formats.rst for such cases.

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Reviewed-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-11-12 11:24:10 +01:00
Thomas Richter 72fbcd057f s390/cpum_sf: Assign error value to err variable
When starting the CPU Measurement sampling facility using
qsi() function, this function may return an error value.
This error value is referenced in the else part of the
if statement to dump its value in a debug statement.
Right now this value is always zero because it has not been
assigned a value.

Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-11-12 11:24:10 +01:00
Thomas Richter c18388340c s390/cpum_sf: Replace function name in debug statements
Replace hard coded function names in debug statements
by the "%s ...", __func__ construct suggested by checkpatch.pl
script.

Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-11-12 11:24:10 +01:00
Thomas Richter d98b5d0728 s390/cpum_sf: Use consistant debug print format for sampling
Use consistant debug print format of the form variable
blank value. Also add leading 0x for all hex values.

Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-11-12 11:24:10 +01:00
Ingo Molnar 6d5a763c30 Linux 5.4-rc7
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAl3IqJQeHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGOiUH+gOEDwid5OODaFAd
 CggXugdFIlBZefKqGVNW5sjgX8pxFWHXuEMC8iNb6QXtQZdFrI6LFf9hhUDmzQtm
 6y1LPxxEiTZjObMEsBNylb7tyzgujFHcAlp0Zro3w/HLCqmYTSP3FF46i2u6KZfL
 XhkpM4X7R7qxlfpdhlfESv/ElRGocZe6SwXfC7pcPo5flFcmkdu9ijqhNd/6CZ/h
 Nf9rTsD/wEDVUelFbgVN+LJzlaB0tsyc4Zbof07n8OsFZjhdEOop8gfM/kTBLcyY
 6bh66SfDScdsNnC/l8csbPjSZRx+i+nQs67DyhGNnsSAFgHBZdC4Tb/2mDCwhCLR
 dUvuYZc=
 =1N6F
 -----END PGP SIGNATURE-----

Merge tag 'v5.4-rc7' into sched/core, to pick up fixes

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-11-11 08:34:59 +01:00
Miroslav Benes c2f2093e14 s390/unwind: drop unnecessary code around calling ftrace_graph_ret_addr()
The current code around calling ftrace_graph_ret_addr() is ifdeffed and
also tests if ftrace redirection is present on stack.
ftrace_graph_ret_addr() however performs the test internally and there
is a version for !CONFIG_FUNCTION_GRAPH_TRACER as well. The unnecessary
code can thus be dropped.

Link: http://lkml.kernel.org/r/20191029143904.24051-2-mbenes@suse.cz
Signed-off-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-11-05 12:30:15 +01:00
Kees Cook c9174047b4 vmlinux.lds.h: Replace RW_DATA_SECTION with RW_DATA
Rename RW_DATA_SECTION to RW_DATA. (Calling this a "section" is a lie,
since it's multiple sections and section flags cannot be applied to
the macro.)

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> # s390
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> # m68k
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: linux-alpha@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-c6x-dev@linux-c6x.org
Cc: linux-ia64@vger.kernel.org
Cc: linux-s390@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Rick Edgecombe <rick.p.edgecombe@intel.com>
Cc: Segher Boessenkool <segher@kernel.crashing.org>
Cc: Will Deacon <will@kernel.org>
Cc: x86-ml <x86@kernel.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Link: https://lkml.kernel.org/r/20191029211351.13243-14-keescook@chromium.org
2019-11-04 15:57:41 +01:00
Kees Cook 93240b3279 vmlinux.lds.h: Replace RO_DATA_SECTION with RO_DATA
Finish renaming RO_DATA_SECTION to RO_DATA. (Calling this a "section"
is a lie, since it's multiple sections and section flags cannot be
applied to the macro.)

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> # s390
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> # m68k
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: linux-alpha@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-c6x-dev@linux-c6x.org
Cc: linux-ia64@vger.kernel.org
Cc: linux-s390@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Rick Edgecombe <rick.p.edgecombe@intel.com>
Cc: Segher Boessenkool <segher@kernel.crashing.org>
Cc: Will Deacon <will@kernel.org>
Cc: x86-ml <x86@kernel.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Link: https://lkml.kernel.org/r/20191029211351.13243-13-keescook@chromium.org
2019-11-04 15:56:16 +01:00
Kees Cook eaf937075c vmlinux.lds.h: Move NOTES into RO_DATA
The .notes section should be non-executable read-only data. As such,
move it to the RO_DATA macro instead of being per-architecture defined.

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> # s390
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: linux-alpha@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-c6x-dev@linux-c6x.org
Cc: linux-ia64@vger.kernel.org
Cc: linux-s390@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Rick Edgecombe <rick.p.edgecombe@intel.com>
Cc: Segher Boessenkool <segher@kernel.crashing.org>
Cc: Will Deacon <will@kernel.org>
Cc: x86-ml <x86@kernel.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Link: https://lkml.kernel.org/r/20191029211351.13243-11-keescook@chromium.org
2019-11-04 15:34:41 +01:00
Kees Cook fbe6a8e618 vmlinux.lds.h: Move Program Header restoration into NOTES macro
In preparation for moving NOTES into RO_DATA, make the Program Header
assignment restoration be part of the NOTES macro itself.

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> # s390
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: linux-alpha@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-c6x-dev@linux-c6x.org
Cc: linux-ia64@vger.kernel.org
Cc: linux-s390@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Rick Edgecombe <rick.p.edgecombe@intel.com>
Cc: Segher Boessenkool <segher@kernel.crashing.org>
Cc: Will Deacon <will@kernel.org>
Cc: x86-ml <x86@kernel.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Link: https://lkml.kernel.org/r/20191029211351.13243-10-keescook@chromium.org
2019-11-04 15:34:39 +01:00
Kees Cook 441110a547 vmlinux.lds.h: Provide EMIT_PT_NOTE to indicate export of .notes
In preparation for moving NOTES into RO_DATA, provide a mechanism for
architectures that want to emit a PT_NOTE Program Header to do so.

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> # s390
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: linux-alpha@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-c6x-dev@linux-c6x.org
Cc: linux-ia64@vger.kernel.org
Cc: linux-s390@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Rick Edgecombe <rick.p.edgecombe@intel.com>
Cc: Segher Boessenkool <segher@kernel.crashing.org>
Cc: Will Deacon <will@kernel.org>
Cc: x86-ml <x86@kernel.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Link: https://lkml.kernel.org/r/20191029211351.13243-9-keescook@chromium.org
2019-11-04 15:34:38 +01:00
Kees Cook 6434efbd9a s390: Move RO_DATA into "text" PT_LOAD Program Header
In preparation for moving NOTES into RO_DATA, move RO_DATA back into the
"text" PT_LOAD Program Header, as done with other architectures. The
"data" PT_LOAD now starts with the writable data section.

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: linux-alpha@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-c6x-dev@linux-c6x.org
Cc: linux-ia64@vger.kernel.org
Cc: linux-s390@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Rick Edgecombe <rick.p.edgecombe@intel.com>
Cc: Segher Boessenkool <segher@kernel.crashing.org>
Cc: Will Deacon <will@kernel.org>
Cc: x86-ml <x86@kernel.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Link: https://lkml.kernel.org/r/20191029211351.13243-7-keescook@chromium.org
2019-11-04 15:34:32 +01:00
Heiko Carstens 3d7efa4edd s390/idle: fix cpu idle time calculation
The idle time reported in /proc/stat sometimes incorrectly contains
huge values on s390. This is caused by a bug in arch_cpu_idle_time().

The kernel tries to figure out when a different cpu entered idle by
accessing its per-cpu data structure. There is an ordering problem: if
the remote cpu has an idle_enter value which is not zero, and an
idle_exit value which is zero, it is assumed it is idle since
"now". The "now" timestamp however is taken before the idle_enter
value is read.

Which in turn means that "now" can be smaller than idle_enter of the
remote cpu. Unconditionally subtracting idle_enter from "now" can thus
lead to a negative value (aka large unsigned value).

Fix this by moving the get_tod_clock() invocation out of the
loop. While at it also make the code a bit more readable.

A similar bug also exists for show_idle_time(). Fix this is as well.

Cc: <stable@vger.kernel.org>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-10-31 17:26:48 +01:00
Ilya Leoshkevich a1d863ac3e s390/unwind: fix mixing regs and sp
unwind_for_each_frame stops after the first frame if regs->gprs[15] <=
sp.

The reason is that in case regs are specified, the first frame should be
regs->psw.addr and the second frame should be sp->gprs[8]. However,
currently the second frame is regs->gprs[15], which confuses
outside_of_stack().

Fix by introducing a flag to distinguish this special case from
unwinding the interrupt handler, for which the current behavior is
appropriate.

Fixes: 78c98f9074 ("s390/unwind: introduce stack unwind API")
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Cc: stable@vger.kernel.org # v5.2+
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-10-31 17:26:48 +01:00
Ilya Leoshkevich effb83ccc8 s390: add error handling to perf_callchain_kernel
perf_callchain_kernel stops neither when it encounters a garbage
address, nor when it runs out of space. Fix both issues using x86
version as an inspiration.

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-10-31 17:20:54 +01:00
Vasily Gorbik 6756dd9b89 s390/process: avoid custom stack unwinding in get_wchan
Currently get_wchan uses custom stack unwinding implementation which
relies on back_chain presence. Replace it with more abstract stack
unwinding api usage.

Suggested-by: Ilya Leoshkevich <iii@linux.ibm.com>
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-10-31 17:20:53 +01:00
Vasily Gorbik d3baaeb5ae s390: avoid double handling of "noexec" option
"noexec" option is already parsed during startup and its value is
exposed via noexec_disabled variable. Simply reuse that value during
machine facilities detection.

Suggested-by: Ilya Leoshkevich <iii@linux.ibm.com>
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-10-31 17:20:52 +01:00
Heiko Carstens f653e29bc2 s390/time: remove monotonic_clock()
Remove unused monotonic_clock() function.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-10-31 17:20:52 +01:00
Gerald Schaefer ac49303d9e s390/kaslr: add support for R_390_GLOB_DAT relocation type
Commit "bpf: Process in-kernel BTF" in linux-next introduced an undefined
__weak symbol, which results in an R_390_GLOB_DAT relocation type. That
is not yet handled by the KASLR relocation code, and the kernel stops with
the message "Unknown relocation type".

Add code to detect and handle R_390_GLOB_DAT relocation types and undefined
symbols.

Fixes: 805bc0bc23 ("s390/kernel: build a relocatable kernel")
Cc: <stable@vger.kernel.org> # v5.2+
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-10-22 17:55:51 +02:00
Christian Brauner fefad9ef58 seccomp: simplify secure_computing()
Afaict, the struct seccomp_data argument to secure_computing() is unused
by all current callers. So let's remove it.
The argument was added in [1]. It was added because having the arch
supply the syscall arguments used to be faster than having it done by
secure_computing() (cf. Andy's comment in [2]). This is not true anymore
though.

/* References */
[1]: 2f275de5d1 ("seccomp: Add a seccomp_data parameter secure_computing()")
[2]: https://lore.kernel.org/r/CALCETrU_fs_At-hTpr231kpaAd0z7xJN4ku-DvzhRU6cvcJA_w@mail.gmail.com

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Drewry <wad@chromium.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-parisc@vger.kernel.org
Cc: linux-s390@vger.kernel.org
Cc: linux-um@lists.infradead.org
Cc: x86@kernel.org
Acked-by: Borislav Petkov <bp@suse.de>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: https://lore.kernel.org/r/20190924064420.6353-1-christian.brauner@ubuntu.com
Signed-off-by: Kees Cook <keescook@chromium.org>
2019-10-10 14:55:24 -07:00
Frederic Weisbecker f83eeb1a01 sched/cputime: Rename vtime_account_system() to vtime_account_kernel()
vtime_account_system() decides if we need to account the time to the
system (__vtime_account_system()) or to the guest (vtime_account_guest()).

So this function is a misnomer as we are on a higher level than
"system". All we know when we call that function is that we are
accounting kernel cputime. Whether it belongs to guest or system time
is a lower level detail.

Rename this function to vtime_account_kernel(). This will clarify things
and avoid too many underscored vtime_account_system() versions.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wanpeng Li <wanpengli@tencent.com>
Cc: Yauheni Kaliuta <yauheni.kaliuta@redhat.com>
Link: https://lkml.kernel.org/r/20191003161745.28464-2-frederic@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-10-09 12:39:25 +02:00
Thomas Richter e14e59c125 s390/cpumf: Fix indentation in sampling device driver
Fix indentation in the s390 CPU Measuement Facility
sampling device dirver.

Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-10-01 09:41:36 +02:00
Thomas Richter 932bfc5aae s390/cpumsf: Check for CPU Measurement sampling
s390 IBM z15 introduces a check if the CPU Mesurement Facility
sampling is temporarily unavailable. If this is the case return -EBUSY
and abort the setup of CPU Measuement facility sampling.

Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-10-01 09:41:36 +02:00
Thomas Richter 8513458280 s390/cpumf: Use consistant debug print format
Use consistant debug print format of the form variable
blank value.

Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-10-01 09:41:36 +02:00
Linus Torvalds aefcf2f4b5 Merge branch 'next-lockdown' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
Pull kernel lockdown mode from James Morris:
 "This is the latest iteration of the kernel lockdown patchset, from
  Matthew Garrett, David Howells and others.

  From the original description:

    This patchset introduces an optional kernel lockdown feature,
    intended to strengthen the boundary between UID 0 and the kernel.
    When enabled, various pieces of kernel functionality are restricted.
    Applications that rely on low-level access to either hardware or the
    kernel may cease working as a result - therefore this should not be
    enabled without appropriate evaluation beforehand.

    The majority of mainstream distributions have been carrying variants
    of this patchset for many years now, so there's value in providing a
    doesn't meet every distribution requirement, but gets us much closer
    to not requiring external patches.

  There are two major changes since this was last proposed for mainline:

   - Separating lockdown from EFI secure boot. Background discussion is
     covered here: https://lwn.net/Articles/751061/

   -  Implementation as an LSM, with a default stackable lockdown LSM
      module. This allows the lockdown feature to be policy-driven,
      rather than encoding an implicit policy within the mechanism.

  The new locked_down LSM hook is provided to allow LSMs to make a
  policy decision around whether kernel functionality that would allow
  tampering with or examining the runtime state of the kernel should be
  permitted.

  The included lockdown LSM provides an implementation with a simple
  policy intended for general purpose use. This policy provides a coarse
  level of granularity, controllable via the kernel command line:

    lockdown={integrity|confidentiality}

  Enable the kernel lockdown feature. If set to integrity, kernel features
  that allow userland to modify the running kernel are disabled. If set to
  confidentiality, kernel features that allow userland to extract
  confidential information from the kernel are also disabled.

  This may also be controlled via /sys/kernel/security/lockdown and
  overriden by kernel configuration.

  New or existing LSMs may implement finer-grained controls of the
  lockdown features. Refer to the lockdown_reason documentation in
  include/linux/security.h for details.

  The lockdown feature has had signficant design feedback and review
  across many subsystems. This code has been in linux-next for some
  weeks, with a few fixes applied along the way.

  Stephen Rothwell noted that commit 9d1f8be5cf ("bpf: Restrict bpf
  when kernel lockdown is in confidentiality mode") is missing a
  Signed-off-by from its author. Matthew responded that he is providing
  this under category (c) of the DCO"

* 'next-lockdown' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (31 commits)
  kexec: Fix file verification on S390
  security: constify some arrays in lockdown LSM
  lockdown: Print current->comm in restriction messages
  efi: Restrict efivar_ssdt_load when the kernel is locked down
  tracefs: Restrict tracefs when the kernel is locked down
  debugfs: Restrict debugfs when the kernel is locked down
  kexec: Allow kexec_file() with appropriate IMA policy when locked down
  lockdown: Lock down perf when in confidentiality mode
  bpf: Restrict bpf when kernel lockdown is in confidentiality mode
  lockdown: Lock down tracing and perf kprobes when in confidentiality mode
  lockdown: Lock down /proc/kcore
  x86/mmiotrace: Lock down the testmmiotrace module
  lockdown: Lock down module params that specify hardware parameters (eg. ioport)
  lockdown: Lock down TIOCSSERIAL
  lockdown: Prohibit PCMCIA CIS storage when the kernel is locked down
  acpi: Disable ACPI table override if the kernel is locked down
  acpi: Ignore acpi_rsdp kernel param when the kernel has been locked down
  ACPI: Limit access to custom_method when the kernel is locked down
  x86/msr: Restrict MSR access when the kernel is locked down
  x86: Lock down IO port access when the kernel is locked down
  ...
2019-09-28 08:14:15 -07:00
Linus Torvalds f1f2f614d5 Merge branch 'next-integrity' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity
Pull integrity updates from Mimi Zohar:
 "The major feature in this time is IMA support for measuring and
  appraising appended file signatures. In addition are a couple of bug
  fixes and code cleanup to use struct_size().

  In addition to the PE/COFF and IMA xattr signatures, the kexec kernel
  image may be signed with an appended signature, using the same
  scripts/sign-file tool that is used to sign kernel modules.

  Similarly, the initramfs may contain an appended signature.

  This contained a lot of refactoring of the existing appended signature
  verification code, so that IMA could retain the existing framework of
  calculating the file hash once, storing it in the IMA measurement list
  and extending the TPM, verifying the file's integrity based on a file
  hash or signature (eg. xattrs), and adding an audit record containing
  the file hash, all based on policy. (The IMA support for appended
  signatures patch set was posted and reviewed 11 times.)

  The support for appended signature paves the way for adding other
  signature verification methods, such as fs-verity, based on a single
  system-wide policy. The file hash used for verifying the signature and
  the signature, itself, can be included in the IMA measurement list"

* 'next-integrity' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity:
  ima: ima_api: Use struct_size() in kzalloc()
  ima: use struct_size() in kzalloc()
  sefltest/ima: support appended signatures (modsig)
  ima: Fix use after free in ima_read_modsig()
  MODSIGN: make new include file self contained
  ima: fix freeing ongoing ahash_request
  ima: always return negative code for error
  ima: Store the measurement again when appraising a modsig
  ima: Define ima-modsig template
  ima: Collect modsig
  ima: Implement support for module-style appended signatures
  ima: Factor xattr_verify() out of ima_appraise_measurement()
  ima: Add modsig appraise_type option for module-style appended signatures
  integrity: Select CONFIG_KEYS instead of depending on it
  PKCS#7: Introduce pkcs7_get_digest()
  PKCS#7: Refactor verify_pkcs7_signature()
  MODSIGN: Export module signature definitions
  ima: initialize the "template" field with the default template
2019-09-27 19:37:27 -07:00
Vasily Gorbik f3122a79a1 s390/topology: avoid firing events before kobjs are created
arch_update_cpu_topology is first called from:
kernel_init_freeable->sched_init_smp->sched_init_domains

even before cpus has been registered in:
kernel_init_freeable->do_one_initcall->s390_smp_init

Do not trigger kobject_uevent change events until cpu devices are
actually created. Fixes the following kasan findings:

BUG: KASAN: global-out-of-bounds in kobject_uevent_env+0xb40/0xee0
Read of size 8 at addr 0000000000000020 by task swapper/0/1

BUG: KASAN: global-out-of-bounds in kobject_uevent_env+0xb36/0xee0
Read of size 8 at addr 0000000000000018 by task swapper/0/1

CPU: 0 PID: 1 Comm: swapper/0 Tainted: G    B
Hardware name: IBM 3906 M04 704 (LPAR)
Call Trace:
([<0000000143c6db7e>] show_stack+0x14e/0x1a8)
 [<0000000145956498>] dump_stack+0x1d0/0x218
 [<000000014429fb4c>] print_address_description+0x64/0x380
 [<000000014429f630>] __kasan_report+0x138/0x168
 [<0000000145960b96>] kobject_uevent_env+0xb36/0xee0
 [<0000000143c7c47c>] arch_update_cpu_topology+0x104/0x108
 [<0000000143df9e22>] sched_init_domains+0x62/0xe8
 [<000000014644c94a>] sched_init_smp+0x3a/0xc0
 [<0000000146433a20>] kernel_init_freeable+0x558/0x958
 [<000000014599002a>] kernel_init+0x22/0x160
 [<00000001459a71d4>] ret_from_fork+0x28/0x30
 [<00000001459a71dc>] kernel_thread_starter+0x0/0x10

Cc: stable@vger.kernel.org
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-09-23 23:27:52 +02:00
Thomas Richter 2cb549a821 s390/cpum_sf: Support ioctl PERF_EVENT_IOC_PERIOD
A perf_event can be set up to deliver overflow notifications
via SIGIO signal.  The setup of the event is:

 1. create event with perf_event_open()
 2. assign it a signal for I/O notification with fcntl()
 3. Install signal handler and consume samples

The initial setup of perf_event_open() determines the
period/frequency time span needed to elapse before each signal
is delivered to the user process.

While the event is active, system call
ioctl(.., PERF_EVENT_IOC_PERIOD, value) can be used the change
the frequency/period time span of the active event.
The remaining signal handler invocations honour the new value.

This does not work on s390. In fact the time span does not change
regardless of ioctl's third argument 'value'. The call succeeds
but the time span does not change.

Support this behavior and make it common with other platforms.
This is achieved by changing the interval value of the sampling
control block accordingly and feed this new value every time
the event is enabled using pmu_event_enable().

Before this change the interval value was set only once at
pmu_event_add() and never changed.

Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-09-19 12:56:07 +02:00
Linus Torvalds d590284419 s390 updates for the 5.4 merge window
- Add support for IBM z15 machines.
 
 - Add SHA3 and CCA AES cipher key support in zcrypt and pkey refactoring.
 
 - Move to arch_stack_walk infrastructure for the stack unwinder.
 
 - Various kasan fixes and improvements.
 
 - Various command line parsing fixes.
 
 - Improve decompressor phase debuggability.
 
 - Lift no bss usage restriction for the early code.
 
 - Use refcount_t for reference counters for couple of places in
   mm code.
 
 - Logging improvements and return code fix in vfio-ccw code.
 
 - Couple of zpci fixes and minor refactoring.
 
 - Remove some outdated documentation.
 
 - Fix secure boot detection.
 
 - Other various minor code clean ups.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEE3QHqV+H2a8xAv27vjYWKoQLXFBgFAl1/pRoACgkQjYWKoQLX
 FBjxLQf/Y1nlmoc8URLqaqfNTczIvUzdfXuahI7L75RoIIiqHtcHBrVwauSr7Lma
 XVRzK/+6q0UPISrOIZEEtQKsMMM7rGuUv/+XTyrOB/Tsc31kN2EIRXltfXI/lkb8
 BZdgch4Xs2rOD7y6TvqpYJsXYXsnLMWwCk8V+48V/pok4sEgMDgh0bTQRHPHYmZ6
 1cv8ZQ0AeuVxC6ChM30LhajGRPkYd8RQ82K7fU7jxT0Tjzu66SyrW3pTwA5empBD
 RI2yBZJ8EXwJyTCpvN8NKiBgihDs9oUZl61Dyq3j64Mb1OuNUhxXA/8jmtnGn0ok
 O9vtImCWzExhjSMkvotuhHEC05nEEQ==
 =LCgE
 -----END PGP SIGNATURE-----

Merge tag 's390-5.4-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux

Pull s390 updates from Vasily Gorbik:

 - Add support for IBM z15 machines.

 - Add SHA3 and CCA AES cipher key support in zcrypt and pkey
   refactoring.

 - Move to arch_stack_walk infrastructure for the stack unwinder.

 - Various kasan fixes and improvements.

 - Various command line parsing fixes.

 - Improve decompressor phase debuggability.

 - Lift no bss usage restriction for the early code.

 - Use refcount_t for reference counters for couple of places in mm
   code.

 - Logging improvements and return code fix in vfio-ccw code.

 - Couple of zpci fixes and minor refactoring.

 - Remove some outdated documentation.

 - Fix secure boot detection.

 - Other various minor code clean ups.

* tag 's390-5.4-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (48 commits)
  s390: remove pointless drivers-y in drivers/s390/Makefile
  s390/cpum_sf: Fix line length and format string
  s390/pci: fix MSI message data
  s390: add support for IBM z15 machines
  s390/crypto: Support for SHA3 via CPACF (MSA6)
  s390/startup: add pgm check info printing
  s390/crypto: xts-aes-s390 fix extra run-time crypto self tests finding
  vfio-ccw: fix error return code in vfio_ccw_sch_init()
  s390: vfio-ap: fix warning reset not completed
  s390/base: remove unused s390_base_mcck_handler
  s390/sclp: Fix bit checked for has_sipl
  s390/zcrypt: fix wrong handling of cca cipher keygenflags
  s390/kasan: add kdump support
  s390/setup: avoid using strncmp with hardcoded length
  s390/sclp: avoid using strncmp with hardcoded length
  s390/module: avoid using strncmp with hardcoded length
  s390/pci: avoid using strncmp with hardcoded length
  s390/kaslr: reserve memory for kasan usage
  s390/mem_detect: provide single get_mem_detect_end
  s390/cmma: reuse kstrtobool for option value parsing
  ...
2019-09-17 14:04:43 -07:00
Thomas Richter 03e9e42f79 s390/cpum_sf: Fix line length and format string
Rewrite some lines to match line length and replace
format string 0x%x to %#x.  Add and remove blank line.

Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Reviewed-by: Hendrik Brueckner <brueckner@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-09-16 13:21:51 +02:00
Martin Schwidefsky a0e2251132 s390: add support for IBM z15 machines
Add detection for machine types 0x8562 and 8x8561 and set the ELF platform
name to z15. Add the miscellaneous-instruction-extension 3 facility to
the list of facilities for z15.

And allow to generate code that only runs on a z15 machine.

Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2019-09-13 12:19:14 +02:00
Matthew Garrett 45893a0abe kexec: Fix file verification on S390
I accidentally typoed this #ifdef, so verification would always be
disabled.

Signed-off-by: Matthew Garrett <mjg59@google.com>
Reported-by: Philipp Rudo <prudo@linux.ibm.com>
Signed-off-by: James Morris <jmorris@namei.org>
2019-09-10 13:27:51 +01:00
Vasily Gorbik 512222789c s390/base: remove unused s390_base_mcck_handler
s390_base_mcck_handler was used during system reset if diag308 set was
not available. But after commit d485235b00 ("s390: assume diag308 set
always works") is a dead code and could be removed.

Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-09-03 13:53:56 +02:00
Vasily Gorbik d0b319843b s390/setup: avoid using strncmp with hardcoded length
Replace strncmp usage in console mode setup code with simple strcmp.
Replace strncmp which is used for prefix comparison with str_has_prefix.

Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-08-29 15:34:58 +02:00
Vasily Gorbik 54fb07d030 s390/sclp: avoid using strncmp with hardcoded length
"earlyprintk" option documentation does not clearly state which
platform supports which additional values (e.g. ",keep"). Preserve old
option behaviour and reuse str_has_prefix instead of strncmp for prefix
testing.

Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-08-29 15:34:58 +02:00
Vasily Gorbik b29cd7c4c4 s390/module: avoid using strncmp with hardcoded length
Reuse str_has_prefix instead of strncmp with hardcoded length to
make the intent of a comparison more obvious.

Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-08-29 15:34:57 +02:00
Vasily Gorbik 3d64436453 s390/vdso: reuse kstrtobool for option value parsing
"vdso" option setup already recognises integer and textual values. Yet
kstrtobool is a more common way to parse boolean values, reuse it to
unify option value parsing behavior and simplify code a bit.

While at it, __setup value parsing callbacks are expected to return
1 when an option is recognized, and returning any other value won't
trigger any error message currently, so simply return 1.

Also don't change default vdso_enabled value of 1 when "vdso" option
value is invalid.

Reviewed-by: Philipp Rudo <prudo@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-08-26 12:51:17 +02:00
Vasily Gorbik e991e5bb11 s390/stacktrace: use common arch_stack_walk infrastructure
Use common arch_stack_walk infrastructure to avoid duplicated code and
avoid taking care of the stack storage and filtering.

Common code also uses try_get_task_stack/put_task_stack when needed which
have been missing in our code, which also solves potential problem for us.

Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-08-21 12:58:53 +02:00
Vasily Gorbik 2c7fa8a11c s390/kasan: avoid report in get_wchan
Reading other running task's stack can be a dangerous endeavor. Kasan
stack memory access instrumentation includes special prologue and epilogue
to mark/remove red zones in shadow memory between stack variables. For
that reason there is always a race between a task reading value in other
task's stack and that other task returning from a function and entering
another one generating different red zones pattern.

To avoid kasan reports simply perform uninstrumented memory reads.

Acked-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-08-21 12:58:53 +02:00
Vasily Gorbik 8769f610fe s390/process: avoid potential reading of freed stack
With THREAD_INFO_IN_TASK (which is selected on s390) task's stack usage
is refcounted and should always be protected by get/put when touching
other task's stack to avoid race conditions with task's destruction code.

Fixes: d5c352cdd0 ("s390: move thread_info into task_struct")
Cc: stable@vger.kernel.org # v4.10+
Acked-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-08-21 12:58:52 +02:00
Vasily Gorbik 2e83e0eb85 s390: clean .bss before running uncompressed kernel
Clean uncompressed kernel .bss section in the startup code before
the uncompressed kernel is executed. At this point of time initrd and
certificates have been already rescued. Uncompressed kernel .bss size
is known from vmlinux_info. It is also taken into consideration during
uncompressed kernel positioning by kaslr (so it is safe to clean it).

With that uncompressed kernel is starting with .bss section zeroed and
no .bss section usage restrictions apply. Which makes chkbss checks for
uncompressed kernel objects obsolete and they can be removed.

early_nobss.c is also not needed anymore. Parts of it which are still
relevant are moved to early.c. Kasan initialization code is now called
directly from head64 (early.c is instrumented and should not be
executed before kasan shadow memory is set up).

Reviewed-by: Philipp Rudo <prudo@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-08-21 12:58:52 +02:00
Vasily Gorbik 59793c5ab9 s390: move vmalloc option parsing to startup code
Few other crucial memory setup options are already handled in
the startup code. Those values are needed by kaslr and kasan
implementations. "vmalloc" is the last piece required for future
improvements such as early decision on kernel page levels depth required
for actual memory setup, as well as vmalloc memory area access monitoring
in kasan.

Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-08-21 12:41:43 +02:00
Jiri Bohac 99d5cadfde kexec_file: split KEXEC_VERIFY_SIG into KEXEC_SIG and KEXEC_SIG_FORCE
This is a preparatory patch for kexec_file_load() lockdown.  A locked down
kernel needs to prevent unsigned kernel images from being loaded with
kexec_file_load().  Currently, the only way to force the signature
verification is compiling with KEXEC_VERIFY_SIG.  This prevents loading
usigned images even when the kernel is not locked down at runtime.

This patch splits KEXEC_VERIFY_SIG into KEXEC_SIG and KEXEC_SIG_FORCE.
Analogous to the MODULE_SIG and MODULE_SIG_FORCE for modules, KEXEC_SIG
turns on the signature verification but allows unsigned images to be
loaded.  KEXEC_SIG_FORCE disallows images without a valid signature.

Signed-off-by: Jiri Bohac <jbohac@suse.cz>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Matthew Garrett <mjg59@google.com>
cc: kexec@lists.infradead.org
Signed-off-by: James Morris <jmorris@namei.org>
2019-08-19 21:54:15 -07:00
Heiko Carstens 404861e15b s390/vdso: map vdso also for statically linked binaries
s390 does not map the vdso for statically linked binaries, assuming
that this doesn't make sense. See commit fc5243d98a ("[S390]
arch_setup_additional_pages arguments").

However with glibc commit d665367f596d ("linux: Enable vDSO for static
linking as default (BZ#19767)") and commit 5e855c895401 ("s390: Enable
VDSO for static linking") the vdso is also used for statically linked
binaries - if the kernel would make it available.

Therefore map the vdso always, just like all other architectures.

Reported-by: Stefan Liebler <stli@linux.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-08-09 11:03:28 +02:00
Vasily Gorbik 24350fdadb s390: put _stext and _etext into .text section
Perf relies on _etext and _stext symbols being one of 't', 'T', 'v' or
'V'. Put them into .text section to guarantee that.

Also moves padding to page boundary inside .text which has an effect that
.text section is now padded with nops rather than 0's, which apparently
has been the initial intention for specifying 0x0700 fill expression.

Reported-by: Thomas Richter <tmricht@linux.ibm.com>
Tested-by: Thomas Richter <tmricht@linux.ibm.com>
Suggested-by: Andreas Krebbel <krebbel@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-08-06 13:58:35 +02:00
Vasily Gorbik b9f23b7376 s390/head64: cleanup unused labels
Cleanup labels in head64 some of which are not being used since git
recorded history.

Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-08-06 13:58:35 +02:00
Vasily Gorbik fd0c7435d7 s390/unwind: remove stack recursion warning
Remove pointless stack recursion on stack type ... warning, which
only confuses people. There is no way to make backchain unwinder 100%
reliable. When a task is interrupted in-between stack frame allocation
and backchain write instructions new stack frame backchain pointer is
left uninitialized (there are also sometimes additional instruction
in-between stack frame allocation and backchain write instructions due
to gcc shrink-wrapping). In attempt to unwind such stack the unwinder
would still try to use that invalid backchain value and perform all kind
of sanity checks on it to make sure we are not pointed out of stack. In
some cases that invalid backchain value would be 0 and we would falsely
treat next stackframe as pt_regs and again gprs[15] in those pt_regs
might happen to point at some address within the task's stack.

Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-08-06 13:58:35 +02:00
Vasily Gorbik 218ddd5acf s390/setup: adjust start_code of init_mm to _text
After some investigation it doesn't look like init_mm fields
start_code/end_code are used anywhere besides potentially in dump_mm for
debugging purposes. Originally the value of 0 for start_code reflected
the presence of lowcore and early boot code. But with kaslr in place
start_code/end_code range should not span over unoccupied by the code
segment memory. So, adjust init_mm start_code to point at the beginning
of the code segment like other architectures do it.

Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-08-06 13:58:34 +02:00
Vasily Gorbik a287a49e67 s390/protvirt: avoid memory sharing for diag 308 set/store
This reverts commit db9492cef4 ("s390/protvirt: add memory sharing for
diag 308 set/store") which due to ultravisor implementation change is
not needed after all.

Fixes: db9492cef4 ("s390/protvirt: add memory sharing for diag 308 set/store")
Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-08-06 13:58:34 +02:00
Thiago Jung Bauermann c8424e776b MODSIGN: Export module signature definitions
IMA will use the module_signature format for append signatures, so export
the relevant definitions and factor out the code which verifies that the
appended signature trailer is valid.

Also, create a CONFIG_MODULE_SIG_FORMAT option so that IMA can select it
and be able to use mod_check_sig() without having to depend on either
CONFIG_MODULE_SIG or CONFIG_MODULES.

s390 duplicated the definition of struct module_signature so now they can
use the new <linux/module_signature.h> header instead.

Signed-off-by: Thiago Jung Bauermann <bauerman@linux.ibm.com>
Acked-by: Jessica Yu <jeyu@kernel.org>
Reviewed-by: Philipp Rudo <prudo@linux.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
2019-08-05 18:39:56 -04:00
Vasily Gorbik 1877011a35 s390/kexec: add missing include to machine_kexec_reloc.c
Include <asm/kexec.h> into machine_kexec_reloc.c to expose
arch_kexec_do_relocs declaration and avoid the following sparse warnings:
arch/s390/kernel/machine_kexec_reloc.c:4:5: warning: symbol 'arch_kexec_do_relocs' was not declared. Should it be static?
arch/s390/boot/../kernel/machine_kexec_reloc.c:4:5: warning: symbol 'arch_kexec_do_relocs' was not declared. Should it be static?

Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-07-29 18:05:03 +02:00
Vasily Gorbik 06f9895fda s390/perf: make cf_diag_csd static
Since there is really no reason for cf_diag_csd per cpu variable to be
globally visible make it static to avoid the following sparse warning:
arch/s390/kernel/perf_cpum_cf_diag.c:37:1: warning: symbol 'cf_diag_csd' was not declared. Should it be static?

Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-07-29 18:05:02 +02:00
Vasily Gorbik 5518aed82d s390: wire up clone3 system call
Tested (64-bit and compat mode) using program from
http://lkml.kernel.org/r/20190604212930.jaaztvkent32b7d3@brauner.io
with the following:
       return syscall(__NR_clone, flags, 0, pidfd, 0, 0);
changed to:
       return syscall(__NR_clone, 0, flags, pidfd, 0, 0);
due to CLONE_BACKWARDS2.

Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2019-07-23 10:45:53 +02:00
Matteo Croce eec4844fae proc/sysctl: add shared variables for range check
In the sysctl code the proc_dointvec_minmax() function is often used to
validate the user supplied value between an allowed range.  This
function uses the extra1 and extra2 members from struct ctl_table as
minimum and maximum allowed value.

On sysctl handler declaration, in every source file there are some
readonly variables containing just an integer which address is assigned
to the extra1 and extra2 members, so the sysctl range is enforced.

The special values 0, 1 and INT_MAX are very often used as range
boundary, leading duplication of variables like zero=0, one=1,
int_max=INT_MAX in different source files:

    $ git grep -E '\.extra[12].*&(zero|one|int_max)' |wc -l
    248

Add a const int array containing the most commonly used values, some
macros to refer more easily to the correct array member, and use them
instead of creating a local one for every object file.

This is the bloat-o-meter output comparing the old and new binary
compiled with the default Fedora config:

    # scripts/bloat-o-meter -d vmlinux.o.old vmlinux.o
    add/remove: 2/2 grow/shrink: 0/2 up/down: 24/-188 (-164)
    Data                                         old     new   delta
    sysctl_vals                                    -      12     +12
    __kstrtab_sysctl_vals                          -      12     +12
    max                                           14      10      -4
    int_max                                       16       -     -16
    one                                           68       -     -68
    zero                                         128      28    -100
    Total: Before=20583249, After=20583085, chg -0.00%

[mcroce@redhat.com: tipc: remove two unused variables]
  Link: http://lkml.kernel.org/r/20190530091952.4108-1-mcroce@redhat.com
[akpm@linux-foundation.org: fix net/ipv6/sysctl_net_ipv6.c]
[arnd@arndb.de: proc/sysctl: make firmware loader table conditional]
  Link: http://lkml.kernel.org/r/20190617130014.1713870-1-arnd@arndb.de
[akpm@linux-foundation.org: fix fs/eventpoll.c]
Link: http://lkml.kernel.org/r/20190430180111.10688-1-mcroce@redhat.com
Signed-off-by: Matteo Croce <mcroce@redhat.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Aaron Tomlin <atomlin@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-18 17:08:07 -07:00
Christian Brauner 1a271a68e0
arch: mark syscall number 435 reserved for clone3
A while ago Arnd made it possible to give new system calls the same
syscall number on all architectures (except alpha). To not break this
nice new feature let's mark 435 for clone3 as reserved on all
architectures that do not yet implement it.
Even if an architecture does not plan to implement it this ensures that
new system calls coming after clone3 will have the same number on all
architectures.

Signed-off-by: Christian Brauner <christian@brauner.io>
Cc: linux-arch@vger.kernel.org
Cc: linux-alpha@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-ia64@vger.kernel.org
Cc: linux-m68k@lists.linux-m68k.org
Cc: linux-mips@vger.kernel.org
Cc: linux-parisc@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-s390@vger.kernel.org
Cc: linux-sh@vger.kernel.org
Cc: sparclinux@vger.kernel.org
Link: https://lore.kernel.org/r/20190714192205.27190-2-christian@brauner.io
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Christian Brauner <christian@brauner.io>
2019-07-15 00:39:33 +02:00
Linus Torvalds aabfea8dc9 s390 updates for the 5.3 merge window #2
- Fix integer overflow during stack frame unwind with invalid backchain.
 
  - Cleanup unused symbol export in zcrypt code.
 
  - Fix MIO addressing control activation in PCI code and expose its
    usage via sysfs.
 
  - Fix kernel image signature verification report presence detection.
 
  - Fix irq registration in vfio-ap code.
 
  - Add CPU measurement counters for newer machines.
 
  - Add base DASD thin provisioning support and code cleanups.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEE3QHqV+H2a8xAv27vjYWKoQLXFBgFAl0oirgACgkQjYWKoQLX
 FBiL7gf+MOToP48a3h+lLcIrbH48B2+OR3W+kIID5qR0GtDoPU2gd2HtSwrn9frs
 jgh5ZLwGgrnSU/MqFXpCwDfD7x0mSWL/HunlSck1zf6h22LmuYjcjntWzHTS7csv
 gTFKNSQX4AdgZAdTEeqC+Axem2ygtpELhe35NPT8HBQD52twe1XkniDl8gu7/zj2
 WdPAGsm30wlodIUFt+di4IQAxSQGkDl9nKN+IIREPjdq/kFKP3pFKzXf5k0FduVk
 v+VOb+KENzeJoqPqRD/GS+vdVm52gqRbKrJVm+p5NlltGDPoa0m3u4Cjy2ZVA2/t
 +BKv4aGHseb6x89XTPIaf1dTLDNf6w==
 =GxOr
 -----END PGP SIGNATURE-----

Merge tag 's390-5.3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux

Pull more s390 updates from Vasily Gorbik:

 - Fix integer overflow during stack frame unwind with invalid
   backchain.

 - Cleanup unused symbol export in zcrypt code.

 - Fix MIO addressing control activation in PCI code and expose its
   usage via sysfs.

 - Fix kernel image signature verification report presence detection.

 - Fix irq registration in vfio-ap code.

 - Add CPU measurement counters for newer machines.

 - Add base DASD thin provisioning support and code cleanups.

* tag 's390-5.3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (21 commits)
  s390/unwind: avoid int overflow in outside_of_stack
  s390/zcrypt: remove the exporting of ap_query_configuration
  s390/pci: add mio_enabled attribute
  s390: fix setting of mio addressing control
  s390/ipl: Fix detection of has_secure attribute
  s390: vfio-ap: fix irq registration
  s390/cpumf: Add extended counter set definitions for model 8561 and 8562
  s390/dasd: Handle out-of-space constraint
  s390/dasd: Add discard support for ESE volumes
  s390/dasd: Use ALIGN_DOWN macro
  s390/dasd: Make dasd_setup_queue() a discipline function
  s390/dasd: Add new ioctl to release space
  s390/dasd: Add dasd_sleep_on_queue_interruptible()
  s390/dasd: Add missing intensity definition
  s390/dasd: Fix whitespace
  s390/dasd: Add dynamic formatting support for ESE volumes
  s390/dasd: Recognise data for ESE volumes
  s390/dasd: Put sub-order definitions in a separate section
  s390/dasd: Make layout analysis ESE compatible
  s390/dasd: Remove old defines and function
  ...
2019-07-12 15:39:22 -07:00
Vasily Gorbik 9a15919041 s390/unwind: avoid int overflow in outside_of_stack
When current task is interrupted in-between stack frame allocation
and backchain write instructions new stack frame backchain pointer
is left uninitialized. That invalid backchain value is passed into
outside_of_stack for sanity check. Make sure int overflow does not happen
by subtracting stack_frame size from the stack "end" rather than adding
it to "random" backchain value.

Fixes: 41b0474c1b1c ("s390/unwind: introduce stack unwind API")
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-07-11 20:40:02 +02:00
Sebastian Ott 9964f396f1 s390: fix setting of mio addressing control
Move enablement of mio addressing control from detect_machine_facilities
to pci_base_init. detect_machine_facilities runs so early that the
static branches have not been toggled yet, thus mio addressing control
was always off. In pci_base_init we have to use the SMP aware
ctl_set_bit though.

Fixes: 833b441ec0 ("s390: enable processes for mio instructions")
Signed-off-by: Sebastian Ott <sebott@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-07-11 20:40:02 +02:00
Philipp Rudo 1b2be2071a s390/ipl: Fix detection of has_secure attribute
Use the correct bit for detection of the machine capability associated
with the has_secure attribute. It is expected that the underlying
platform (including hypervisors) unsets the bit when they don't provide
secure ipl for their guests.

Fixes: c9896acc78 ("s390/ipl: Provide has_secure sysfs attribute")
Cc: stable@vger.kernel.org # 5.2
Signed-off-by: Philipp Rudo <prudo@linux.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-07-11 20:40:02 +02:00
Thomas Richter 820bace734 s390/cpumf: Add extended counter set definitions for model 8561 and 8562
Add the extended counter set definitions for s390 machine types
8561 and  8262. They are identical with machine types 3906 and
3907.

Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-07-11 20:40:01 +02:00
Linus Torvalds 5450e8a316 pidfd-updates-v5.3
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCXSMhUgAKCRCRxhvAZXjc
 okkiAQC3Hlg/O2JoIb4PqgEvBkpHSdVxyuWagn0ksjACW9ANKQEAl5OadMhvOq16
 UHGhKlpE/M8HflknIffoEGlIAWHrdwU=
 =7kP5
 -----END PGP SIGNATURE-----

Merge tag 'pidfd-updates-v5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux

Pull pidfd updates from Christian Brauner:
 "This adds two main features.

   - First, it adds polling support for pidfds. This allows process
     managers to know when a (non-parent) process dies in a race-free
     way.

     The notification mechanism used follows the same logic that is
     currently used when the parent of a task is notified of a child's
     death. With this patchset it is possible to put pidfds in an
     {e}poll loop and get reliable notifications for process (i.e.
     thread-group) exit.

   - The second feature compliments the first one by making it possible
     to retrieve pollable pidfds for processes that were not created
     using CLONE_PIDFD.

     A lot of processes get created with traditional PID-based calls
     such as fork() or clone() (without CLONE_PIDFD). For these
     processes a caller can currently not create a pollable pidfd. This
     is a problem for Android's low memory killer (LMK) and service
     managers such as systemd.

  Both patchsets are accompanied by selftests.

  It's perhaps worth noting that the work done so far and the work done
  in this branch for pidfd_open() and polling support do already see
  some adoption:

   - Android is in the process of backporting this work to all their LTS
     kernels [1]

   - Service managers make use of pidfd_send_signal but will need to
     wait until we enable waiting on pidfds for full adoption.

   - And projects I maintain make use of both pidfd_send_signal and
     CLONE_PIDFD [2] and will use polling support and pidfd_open() too"

[1] https://android-review.googlesource.com/q/topic:%22pidfd+polling+support+4.9+backport%22
    https://android-review.googlesource.com/q/topic:%22pidfd+polling+support+4.14+backport%22
    https://android-review.googlesource.com/q/topic:%22pidfd+polling+support+4.19+backport%22

[2] aab6e3eb73/src/lxc/start.c (L1753)

* tag 'pidfd-updates-v5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
  tests: add pidfd_open() tests
  arch: wire-up pidfd_open()
  pid: add pidfd_open()
  pidfd: add polling selftests
  pidfd: add polling support
2019-07-10 22:17:21 -07:00
Linus Torvalds cf2d213e49 Power management updates for 5.3-rc1
- Improve the handling of shared ACPI power resources in the PCI
    bus type layer (Mika Westerberg).
 
  - Make the PCI layer take link delays required by the PCIe spec
    into account as appropriate and avoid polling devices in D3cold
    for PME (Mika Westerberg).
 
  - Fix some corner case issues in ACPI device power management and
    in the PCI bus type layer, optimiza and clean up the handling of
    runtime-suspended PCI devices during system-wide transitions to
    sleep states (Rafael Wysocki).
 
  - Rework hibernation handling in the ACPI core and the PCI bus type
    to resume runtime-suspended devices before hibernation (which
    allows some functional problems to be avoided) and fix some ACPI
    power management issues related to hiberation (Rafael Wysocki).
 
  - Extend the operating performance points (OPP) framework to support
    a wider range of devices (Rajendra Nayak, Stehpen Boyd).
 
  - Fix issues related to genpd_virt_devs and issues with platforms
    using the set_opp() callback in the OPP framework (Viresh Kumar,
    Dmitry Osipenko).
 
  - Add new cpufreq driver for Raspberry Pi (Nicolas Saenz Julienne).
 
  - Add new cpufreq driver for imx8m and imx7d chips (Leonard Crestez).
 
  - Fix and clean up the pcc-cpufreq, brcmstb-avs-cpufreq, s5pv210,
    and armada-37xx cpufreq drivers (David Arcari, Florian Fainelli,
    Paweł Chmiel, YueHaibing).
 
  - Clean up and fix the cpufreq core (Viresh Kumar, Daniel Lezcano).
 
  - Fix minor issue in the ACPI system sleep support code and export
    one function from it (Lenny Szubowicz, Dexuan Cui).
 
  - Clean up assorted pieces of PM code and documentation (Kefeng Wang,
    Andy Shevchenko, Bart Van Assche, Greg Kroah-Hartman, Fuqian Huang,
    Geert Uytterhoeven, Mathieu Malaterre, Rafael Wysocki).
 
  - Update the pm-graph utility to v5.4 (Todd Brandt).
 
  - Fix and clean up the cpupower utility (Abhishek Goel, Nick Black).
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAl0jK18SHHJqd0Byand5
 c29ja2kubmV0AAoJEILEb/54YlRxgEAP/RbPe71Y9ufKu64L3EtgV6mS9iuLEhux
 /Ad9laLNeM1b0oceT3QxGk7xQCacnZcBlcaqXVWI4NRsn4RBZp1cYZngpgJ9DP6E
 ONr8hzyzDOMVReba3XJIF8H+WoTKjywMYtFutjdx6dRe2ZJLutqnuZ0JbH1YSSK7
 IxOt0mJVALf2M4Zz7F17d+n3yGE/4xAPBVbj/rBRcTEsGYlR/Hoxs7iF6EBau7fy
 R5drUH6XSrWk8adc+z7l3BTGqMMYj9deRSfAWB3wpM4YK7Fv7msX/amBoGINkdn6
 xP/ZcrHvhKKzE89MS8OUGP4rGVwq+7tu6mktnYL/tpKgutJqqx5LVvrLsGDSWr+W
 /aJExN8Eb4Jh98C6vog3XUJoqBxkVGbU8qoCBU3jlFsaznFEWjW9IKhBHs5CIaqz
 MXte6AsJ8lvFzxILjvx0m2206wNpRJRXYLX3a/BSBxa4OgOESjIpBTmbPfOwbxwj
 8z9hIVlDTTDtnF6BEyDQr1fjPi3Mxl7ibGnoqRrJm36VKBy9VZNNwG/0Y2oSvm6k
 Es8CiTWA3A/46dCZxGr18/9Vbfxn1Yvg9QZ1lCE5Fqij+0F2yRApbHZgUro+1rji
 6J8OWC5r5JccdKuGHh4RH/asMFhD0cAR/gUsRzS4dz/wz2jVIN1FstdD1aKN5GBy
 d0lchx/AKR5H
 =aBN3
 -----END PGP SIGNATURE-----

Merge tag 'pm-5.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management updates from Rafael Wysocki:
 "These update PCI and ACPI power management (improved handling of ACPI
  power resources and PCIe link delays, fixes related to corner cases,
  hibernation handling rework), fix and extend the operating performance
  points (OPP) framework, add new cpufreq drivers for Raspberry Pi and
  imx8m chips, update some other cpufreq drivers, clean up assorted
  pieces of PM code and documentation and update tools.

  Specifics:

   - Improve the handling of shared ACPI power resources in the PCI bus
     type layer (Mika Westerberg).

   - Make the PCI layer take link delays required by the PCIe spec into
     account as appropriate and avoid polling devices in D3cold for PME
     (Mika Westerberg).

   - Fix some corner case issues in ACPI device power management and in
     the PCI bus type layer, optimiza and clean up the handling of
     runtime-suspended PCI devices during system-wide transitions to
     sleep states (Rafael Wysocki).

   - Rework hibernation handling in the ACPI core and the PCI bus type
     to resume runtime-suspended devices before hibernation (which
     allows some functional problems to be avoided) and fix some ACPI
     power management issues related to hiberation (Rafael Wysocki).

   - Extend the operating performance points (OPP) framework to support
     a wider range of devices (Rajendra Nayak, Stehpen Boyd).

   - Fix issues related to genpd_virt_devs and issues with platforms
     using the set_opp() callback in the OPP framework (Viresh Kumar,
     Dmitry Osipenko).

   - Add new cpufreq driver for Raspberry Pi (Nicolas Saenz Julienne).

   - Add new cpufreq driver for imx8m and imx7d chips (Leonard Crestez).

   - Fix and clean up the pcc-cpufreq, brcmstb-avs-cpufreq, s5pv210, and
     armada-37xx cpufreq drivers (David Arcari, Florian Fainelli, Paweł
     Chmiel, YueHaibing).

   - Clean up and fix the cpufreq core (Viresh Kumar, Daniel Lezcano).

   - Fix minor issue in the ACPI system sleep support code and export
     one function from it (Lenny Szubowicz, Dexuan Cui).

   - Clean up assorted pieces of PM code and documentation (Kefeng Wang,
     Andy Shevchenko, Bart Van Assche, Greg Kroah-Hartman, Fuqian Huang,
     Geert Uytterhoeven, Mathieu Malaterre, Rafael Wysocki).

   - Update the pm-graph utility to v5.4 (Todd Brandt).

   - Fix and clean up the cpupower utility (Abhishek Goel, Nick Black)"

* tag 'pm-5.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (57 commits)
  ACPI: PM: Make acpi_sleep_state_supported() non-static
  PM: sleep: Drop dev_pm_skip_next_resume_phases()
  ACPI: PM: Unexport acpi_device_get_power()
  Documentation: ABI: power: Add missing newline at end of file
  ACPI: PM: Drop unused function and function header
  ACPI: PM: Introduce "poweroff" callbacks for ACPI PM domain and LPSS
  ACPI: PM: Simplify and fix PM domain hibernation callbacks
  PCI: PM: Simplify bus-level hibernation callbacks
  PM: ACPI/PCI: Resume all devices during hibernation
  cpufreq: Avoid calling cpufreq_verify_current_freq() from handle_update()
  cpufreq: Consolidate cpufreq_update_current_freq() and __cpufreq_get()
  kernel: power: swap: use kzalloc() instead of kmalloc() followed by memset()
  cpufreq: Don't skip frequency validation for has_target() drivers
  PCI: PM/ACPI: Refresh all stale power state data in pci_pm_complete()
  PCI / ACPI: Add _PR0 dependent devices
  ACPI / PM: Introduce concept of a _PR0 dependent device
  PCI / ACPI: Use cached ACPI device state to get PCI device power state
  ACPI: PM: Allow transitions to D0 to occur in special cases
  ACPI: PM: Avoid evaluating _PS3 on transitions from D3hot to D3cold
  cpufreq: Use has_target() instead of !setpolicy
  ...
2019-07-09 10:05:22 -07:00
Linus Torvalds 5ad18b2e60 Merge branch 'siginfo-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull force_sig() argument change from Eric Biederman:
 "A source of error over the years has been that force_sig has taken a
  task parameter when it is only safe to use force_sig with the current
  task.

  The force_sig function is built for delivering synchronous signals
  such as SIGSEGV where the userspace application caused a synchronous
  fault (such as a page fault) and the kernel responded with a signal.

  Because the name force_sig does not make this clear, and because the
  force_sig takes a task parameter the function force_sig has been
  abused for sending other kinds of signals over the years. Slowly those
  have been fixed when the oopses have been tracked down.

  This set of changes fixes the remaining abusers of force_sig and
  carefully rips out the task parameter from force_sig and friends
  making this kind of error almost impossible in the future"

* 'siginfo-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (27 commits)
  signal/x86: Move tsk inside of CONFIG_MEMORY_FAILURE in do_sigbus
  signal: Remove the signal number and task parameters from force_sig_info
  signal: Factor force_sig_info_to_task out of force_sig_info
  signal: Generate the siginfo in force_sig
  signal: Move the computation of force into send_signal and correct it.
  signal: Properly set TRACE_SIGNAL_LOSE_INFO in __send_signal
  signal: Remove the task parameter from force_sig_fault
  signal: Use force_sig_fault_to_task for the two calls that don't deliver to current
  signal: Explicitly call force_sig_fault on current
  signal/unicore32: Remove tsk parameter from __do_user_fault
  signal/arm: Remove tsk parameter from __do_user_fault
  signal/arm: Remove tsk parameter from ptrace_break
  signal/nds32: Remove tsk parameter from send_sigtrap
  signal/riscv: Remove tsk parameter from do_trap
  signal/sh: Remove tsk parameter from force_sig_info_fault
  signal/um: Remove task parameter from send_sigtrap
  signal/x86: Remove task parameter from send_sigtrap
  signal: Remove task parameter from force_sig_mceerr
  signal: Remove task parameter from force_sig
  signal: Remove task parameter from force_sigsegv
  ...
2019-07-08 21:48:15 -07:00