Commit Graph

1439 Commits

Author SHA1 Message Date
Kees Cook d1fd836dcf mm: split ET_DYN ASLR from mmap ASLR
This fixes the "offset2lib" weakness in ASLR for arm, arm64, mips,
powerpc, and x86.  The problem is that if there is a leak of ASLR from
the executable (ET_DYN), it means a leak of shared library offset as
well (mmap), and vice versa.  Further details and a PoC of this attack
is available here:

  http://cybersecurity.upv.es/attacks/offset2lib/offset2lib.html

With this patch, a PIE linked executable (ET_DYN) has its own ASLR
region:

  $ ./show_mmaps_pie
  54859ccd6000-54859ccd7000 r-xp  ...  /tmp/show_mmaps_pie
  54859ced6000-54859ced7000 r--p  ...  /tmp/show_mmaps_pie
  54859ced7000-54859ced8000 rw-p  ...  /tmp/show_mmaps_pie
  7f75be764000-7f75be91f000 r-xp  ...  /lib/x86_64-linux-gnu/libc.so.6
  7f75be91f000-7f75beb1f000 ---p  ...  /lib/x86_64-linux-gnu/libc.so.6
  7f75beb1f000-7f75beb23000 r--p  ...  /lib/x86_64-linux-gnu/libc.so.6
  7f75beb23000-7f75beb25000 rw-p  ...  /lib/x86_64-linux-gnu/libc.so.6
  7f75beb25000-7f75beb2a000 rw-p  ...
  7f75beb2a000-7f75beb4d000 r-xp  ...  /lib64/ld-linux-x86-64.so.2
  7f75bed45000-7f75bed46000 rw-p  ...
  7f75bed46000-7f75bed47000 r-xp  ...
  7f75bed47000-7f75bed4c000 rw-p  ...
  7f75bed4c000-7f75bed4d000 r--p  ...  /lib64/ld-linux-x86-64.so.2
  7f75bed4d000-7f75bed4e000 rw-p  ...  /lib64/ld-linux-x86-64.so.2
  7f75bed4e000-7f75bed4f000 rw-p  ...
  7fffb3741000-7fffb3762000 rw-p  ...  [stack]
  7fffb377b000-7fffb377d000 r--p  ...  [vvar]
  7fffb377d000-7fffb377f000 r-xp  ...  [vdso]

The change is to add a call the newly created arch_mmap_rnd() into the
ELF loader for handling ET_DYN ASLR in a separate region from mmap ASLR,
as was already done on s390.  Removes CONFIG_BINFMT_ELF_RANDOMIZE_PIE,
which is no longer needed.

Signed-off-by: Kees Cook <keescook@chromium.org>
Reported-by: Hector Marco-Gisbert <hecmargi@upv.es>
Cc: Russell King <linux@arm.linux.org.uk>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: "David A. Long" <dave.long@linaro.org>
Cc: Andrey Ryabinin <a.ryabinin@samsung.com>
Cc: Arun Chandran <achandran@mvista.com>
Cc: Yann Droneaud <ydroneaud@opteya.com>
Cc: Min-Hua Chen <orca.chen@gmail.com>
Cc: Paul Burton <paul.burton@imgtec.com>
Cc: Alex Smith <alex@alex-smith.me.uk>
Cc: Markos Chandras <markos.chandras@imgtec.com>
Cc: Vineeth Vijayan <vvijayan@mvista.com>
Cc: Jeff Bailey <jeffbailey@google.com>
Cc: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Cc: Ben Hutchings <ben@decadent.org.uk>
Cc: Behan Webster <behanw@converseincode.com>
Cc: Ismael Ripoll <iripoll@upv.es>
Cc: Jan-Simon Mller <dl9pf@gmx.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-14 16:49:05 -07:00
Kees Cook c6f5b001e6 s390: redefine randomize_et_dyn for ELF_ET_DYN_BASE
In preparation for moving ET_DYN randomization into the ELF loader (which
requires a static ELF_ET_DYN_BASE), this redefines s390's existing ET_DYN
randomization in a call to arch_mmap_rnd(). This refactoring results in
the same ET_DYN randomization on s390.

Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-14 16:49:05 -07:00
Linus Torvalds 9497d7380b Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching
Pull livepatching updates from Jiri Kosina:
 "These are mostly smaller things that got accumulated during the
  development cycle.  The unified solution is still being worked on and
  is not mature enough for 4.1 yet.

   - s390 livepatching support, from Jiri Slaby (has Ack from s390
     maintainers)

   - error handling simplification, from Josh Poimboeuf

   - two minor code cleanups from Josh Poimboeuf and Miroslav Benes"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching:
  livepatch: add support on s390
  livepatch: remove unnecessary call to klp_find_object_module()
  livepatch: simplify disable error path
  livepatch: remove extern specifier from header files
2015-04-14 10:15:34 -07:00
Linus Torvalds cc76ee75a9 Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull core locking changes from Ingo Molnar:
 "Main changes:

   - jump label asm preparatory work for PowerPC (Anton Blanchard)

   - rwsem optimizations and cleanups (Davidlohr Bueso)

   - mutex optimizations and cleanups (Jason Low)

   - futex fix (Oleg Nesterov)

   - remove broken atomicity checks from {READ,WRITE}_ONCE() (Peter
     Zijlstra)"

* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  powerpc, jump_label: Include linux/jump_label.h to get HAVE_JUMP_LABEL define
  jump_label: Allow jump labels to be used in assembly
  jump_label: Allow asm/jump_label.h to be included in assembly
  locking/mutex: Further simplify mutex_spin_on_owner()
  locking: Remove atomicy checks from {READ,WRITE}_ONCE
  locking/rtmutex: Rename argument in the rt_mutex_adjust_prio_chain() documentation as well
  locking/rwsem: Fix lock optimistic spinning when owner is not running
  locking: Remove ACCESS_ONCE() usage
  locking/rwsem: Check for active lock before bailing on spinning
  locking/rwsem: Avoid deceiving lock spinners
  locking/rwsem: Set lock ownership ASAP
  locking/rwsem: Document barrier need when waking tasks
  locking/futex: Check PF_KTHREAD rather than !p->mm to filter out kthreads
  locking/mutex: Refactor mutex_spin_on_owner()
  locking/mutex: In mutex_spin_on_owner(), return true when owner changes
2015-04-13 10:27:28 -07:00
Linus Torvalds 9003601310 The most interesting bit here is irqfd/ioeventfd support for ARM and ARM64.
ARM/ARM64: fixes for live migration, irqfd and ioeventfd support (enabling
 vhost, too), page aging
 
 s390: interrupt handling rework, allowing to inject all local interrupts
 via new ioctl and to get/set the full local irq state for migration
 and introspection.  New ioctls to access memory by virtual address,
 and to get/set the guest storage keys.  SIMD support.
 
 MIPS: FPU and MIPS SIMD Architecture (MSA) support.  Includes some patches
 from Ralf Baechle's MIPS tree.
 
 x86: bugfixes (notably for pvclock, the others are small) and cleanups.
 Another small latency improvement for the TSC deadline timer.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJVJ9vmAAoJEL/70l94x66DoMEH/R3rh8IMf4jTiWRkcqohOMPX
 k1+NaSY/lCKayaSgggJ2hcQenMbQoXEOdslvaA/H0oC+VfJGK+lmU6E63eMyyhjQ
 Y+Px6L85NENIzDzaVu/TIWWuhil5PvIRr3VO8cvntExRoCjuekTUmNdOgCvN2ObW
 wswN2qRdPIeEj2kkulbnye+9IV4G0Ne9bvsmUdOdfSSdi6ZcV43JcvrpOZT++mKj
 RrKB+3gTMZYGJXMMLBwMkdl8mK1ozriD+q0mbomT04LUyGlPwYLl4pVRDBqyksD7
 KsSSybaK2E4i5R80WEljgDMkNqrCgNfg6VZe4n9Y+CfAAOToNnkMJaFEi+yuqbs=
 =yu2b
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM updates from Paolo Bonzini:
 "First batch of KVM changes for 4.1

  The most interesting bit here is irqfd/ioeventfd support for ARM and
  ARM64.

  Summary:

  ARM/ARM64:
     fixes for live migration, irqfd and ioeventfd support (enabling
     vhost, too), page aging

  s390:
     interrupt handling rework, allowing to inject all local interrupts
     via new ioctl and to get/set the full local irq state for migration
     and introspection.  New ioctls to access memory by virtual address,
     and to get/set the guest storage keys.  SIMD support.

  MIPS:
     FPU and MIPS SIMD Architecture (MSA) support.  Includes some
     patches from Ralf Baechle's MIPS tree.

  x86:
     bugfixes (notably for pvclock, the others are small) and cleanups.
     Another small latency improvement for the TSC deadline timer"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (146 commits)
  KVM: use slowpath for cross page cached accesses
  kvm: mmu: lazy collapse small sptes into large sptes
  KVM: x86: Clear CR2 on VCPU reset
  KVM: x86: DR0-DR3 are not clear on reset
  KVM: x86: BSP in MSR_IA32_APICBASE is writable
  KVM: x86: simplify kvm_apic_map
  KVM: x86: avoid logical_map when it is invalid
  KVM: x86: fix mixed APIC mode broadcast
  KVM: x86: use MDA for interrupt matching
  kvm/ppc/mpic: drop unused IRQ_testbit
  KVM: nVMX: remove unnecessary double caching of MAXPHYADDR
  KVM: nVMX: checks for address bits beyond MAXPHYADDR on VM-entry
  KVM: x86: cache maxphyaddr CPUID leaf in struct kvm_vcpu
  KVM: vmx: pass error code with internal error #2
  x86: vdso: fix pvclock races with task migration
  KVM: remove kvm_read_hva and kvm_read_hva_atomic
  KVM: x86: optimize delivery of TSC deadline timer interrupt
  KVM: x86: extract blocking logic from __vcpu_run
  kvm: x86: fix x86 eflags fixed bit
  KVM: s390: migrate vcpu interrupt state
  ...
2015-04-13 09:47:01 -07:00
Richard Weinberger 6a32591a4a s390: Remove signal translation and exec_domain
As execution domain support is gone we can remove
signal translation from the signal code and remove
exec_domain from thread_info.

Signed-off-by: Richard Weinberger <richard@nod.at>
2015-04-12 20:58:25 +02:00
Anton Blanchard 55dd0df781 jump_label: Allow asm/jump_label.h to be included in assembly
Wrap asm/jump_label.h for all archs with #ifndef __ASSEMBLY__.
Since these are kernel only headers, we don't need #ifdef
__KERNEL__ so can simplify things a bit.

If an architecture wants to use jump labels in assembly, it
will still need to define a macro to create the __jump_table
entries (see ARCH_STATIC_BRANCH in the powerpc asm/jump_label.h
for an example).

Signed-off-by: Anton Blanchard <anton@samba.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: benh@kernel.crashing.org
Cc: catalin.marinas@arm.com
Cc: davem@davemloft.net
Cc: heiko.carstens@de.ibm.com
Cc: jbaron@akamai.com
Cc: linux@arm.linux.org.uk
Cc: linuxppc-dev@lists.ozlabs.org
Cc: liuj97@gmail.com
Cc: mgorman@suse.de
Cc: mmarek@suse.cz
Cc: mpe@ellerman.id.au
Cc: paulus@samba.org
Cc: ralf@linux-mips.org
Cc: rostedt@goodmis.org
Cc: schwidefsky@de.ibm.com
Cc: will.deacon@arm.com
Link: http://lkml.kernel.org/r/1428551492-21977-1-git-send-email-anton@samba.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-09 09:40:23 +02:00
Paolo Bonzini 7f22b45d66 Features and fixes for 4.1 (kvm/next)
1. Assorted changes
 1.1 allow more feature bits for the guest
 1.2 Store breaking event address on program interrupts
 
 2. Interrupt handling rework
 2.1 Fix copy_to_user while holding a spinlock (cc stable)
 2.2 Rework floating interrupts to follow the priorities
 2.3 Allow to inject all local interrupts via new ioctl
 2.4 allow to get/set the full local irq state, e.g. for migration
     and introspection
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.14 (GNU/Linux)
 
 iQIcBAABAgAGBQJVGvEUAAoJEBF7vIC1phx82tEP/3KrwsDRs+buiBqyv9k+qCFV
 v+R94gReBB5ggfbGfUYgBJMR2/4XQ+0jcZ55jfBCC4osOq6Juw/8HIj2nSgbQHmz
 F9Go0n8IqJ3DnqPTc0KYdFZ7kqDvMV5ME3XJrFiAHv1TUL9H/KpZArkcVIwD2NOo
 w01AVrCDY4bTajYqKShzGFymQl1K5vTGGvgxhh4kAHct4Nt5N5HFmyROm0RrsFZx
 Sycx4t177O7zhCN2tv5Zy8iWaEvzHAESoXkhZ2cJ6t+FXii2Eov5IgyyfYRXBfbm
 YACyvlFD087UdFGTt85ggPVS/S/5hn9xXmVHuIimHeyZU7CXCN5vYPcn+ZyksYr5
 uA8+/2OPAgcaeDa2f7nCjl8jmcLR3hkQ0n/urA+pPYAZANJoFDfiGOr/kVk6aKff
 JTGSFUjNK891/IGEsdrSk2p64U5xMd8LFa3Il++kZT91gc2nrZOHNz5FGlXlkLdJ
 sADeNFWhoprEt/2P4aX6W2j26L8G874XkldDSjrS41U8L55+IiEm09r8oAWgfc5A
 pryeDaN4nSjFC+HOtlPkcVkAcsswiI6nHIm3+/XFetCq+v4pnVKFMHWsTeEjiQgQ
 H5aV9mfEKTJaCPrAJMsj8ZsKq0usG+BeRNqpIvxPAQB8fyl3jw9iu+RHeY1xWsTg
 BRHB/+CGYIxDu4XdRexv
 =Rrx5
 -----END PGP SIGNATURE-----

Merge tag 'kvm-s390-next-20150331' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD

Features and fixes for 4.1 (kvm/next)

1. Assorted changes
1.1 allow more feature bits for the guest
1.2 Store breaking event address on program interrupts

2. Interrupt handling rework
2.1 Fix copy_to_user while holding a spinlock (cc stable)
2.2 Rework floating interrupts to follow the priorities
2.3 Allow to inject all local interrupts via new ioctl
2.4 allow to get/set the full local irq state, e.g. for migration
    and introspection
2015-04-07 18:10:03 +02:00
David S. Miller 9f0d34bc34 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/usb/asix_common.c
	drivers/net/usb/sr9800.c
	drivers/net/usb/usbnet.c
	include/linux/usb/usbnet.h
	net/ipv4/tcp_ipv4.c
	net/ipv6/tcp_ipv6.c

The TCP conflicts were overlapping changes.  In 'net' we added a
READ_ONCE() to the socket cached RX route read, whilst in 'net-next'
Eric Dumazet touched the surrounding code dealing with how mini
sockets are handled.

With USB, it's a case of the same bug fix first going into net-next
and then I cherry picked it back into net.

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-02 16:16:53 -04:00
Jens Freimann 6d3da24141 KVM: s390: deliver floating interrupts in order of priority
This patch makes interrupt handling compliant to the z/Architecture
Principles of Operation with regard to interrupt priorities.

Add a bitmap for pending floating interrupts. Each bit relates to a
interrupt type and its list. A turned on bit indicates that a list
contains items (interrupts) which need to be delivered.  When delivering
interrupts on a cpu we can merge the existing bitmap for cpu-local
interrupts and floating interrupts and have a single mechanism for
delivery.
Currently we have one list for all kinds of floating interrupts and a
corresponding spin lock. This patch adds a separate list per
interrupt type. An exception to this are service signal and machine check
interrupts, as there can be only one pending interrupt at a time.

Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
2015-03-31 21:07:27 +02:00
Heiko Carstens 1a327ffd3d s390/syscalls: simplify syscall_get_arch()
Given that sizeof(long) is now always 8, we can simplify syscall_get_arch()
a bit.
Just another piece I didn't find when removing 31 bit support.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-03-30 13:26:07 +02:00
Jiri Slaby 21042d43b3 livepatch: add support on s390
This is a trivial port from kGraft. Module relocations are not supported yet.

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2015-03-27 15:05:32 +01:00
Sebastian Ott 358e048d67 s390: fix /proc/interrupts output
The irqclass_sub_desc array and enum interruption_class are out of sync
thus /proc/interrupts is broken. Remove IRQIO_CLW.

Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-25 11:41:41 -04:00
Heiko Carstens 2a5e91c3ac s390: add missing arch_release_task_struct() declaration
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-03-25 11:49:46 +01:00
Heiko Carstens 8a5d8473dd s390/maccess: remove potentially broken probe_kernel_write()
Remove the s390 architecture implementation of probe_kernel_write() and
instead use a new function s390_kernel_write() to modify kernel text and
data everywhere.

The s390 implementation of probe_kernel_write() was potentially broken
since it modified memory in a read-modify-write fashion, which read four
bytes, modified the requested bytes within those four bytes and wrote
the result back.
If two cpus would modify the same four byte area at different locations
within that area, this could lead to corruption.
Right now the only places which called probe_kernel_write() did run within
stop_machine_run. Therefore the scenario can't happen right now, however
that might change at any time.

To fix this rename probe_kernel_write() to s390_kernel_write() which can
have special semantics, like only call it while running within stop_machine().

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-03-25 11:49:43 +01:00
Heiko Carstens 26f15caaf9 s390/cmpxchg: simplify cmpxchg_double
Since sizeof(long) == 4 is always false now, simplify cmpxchg_double a bit.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-03-25 11:49:36 +01:00
Heiko Carstens 5a79859ae0 s390: remove 31 bit support
Remove the 31 bit support in order to reduce maintenance cost and
effectively remove dead code. Since a couple of years there is no
distribution left that comes with a 31 bit kernel.

The 31 bit kernel also has been broken since more than a year before
anybody noticed. In addition I added a removal warning to the kernel
shown at ipl for 5 minutes: a960062e58 ("s390: add 31 bit warning
message") which let everybody know about the plan to remove 31 bit
code. We didn't get any response.

Given that the last 31 bit only machine was introduced in 1999 let's
remove the code.
Anybody with 31 bit user space code can still use the compat mode.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-03-25 11:49:33 +01:00
Marcelo Tosatti bbf4aef89d KVM: s390: Features and fixes for 4.1 (kvm/next)
1. Fixes
 2. Implement access register mode in KVM
 3. Provide a userspace post handler for the STSI instruction
 4. Provide an interface for compliant memory accesses
 5. Provide an interface for getting/setting the guest storage key
 6. Fixup for the vector facility patches: do not announce the
    vector facility in the guest for old QEMUs.
 
 1-5 were initially shown as RFC in
 
 http://www.spinics.net/lists/kvm/msg114720.html
 
 some small review changes
 - added some ACKs
 - have the AR mode patches first
 - get rid of unnecessary AR_INVAL define
 - typos and language
 
 6. two new patches
 The two new patches fixup the vector support patches that were
 introduced in the last pull request for QEMU versions that dont
 know about vector support and guests that do. (We announce the
 facility bit, but dont enable the facility so vector aware guests
 will crash on vector instructions).
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.14 (GNU/Linux)
 
 iQIcBAABAgAGBQJVCV7NAAoJEBF7vIC1phx8ymcP/RovYmBGd7e6jBZLx4fooc97
 DFuMkEdNT3bkA0x/L+SgYcExFkoAUX5KPK74mHOTmmBZHopX1AMDKQyEDacjNWBb
 9CJdPffJWKKjFtC7KwrkgnDKBrOsmNLdWsLtl8aEIAxxKznvLXYsrvMrBYqdRkUh
 nJsjaQueKP8AbzSLsG9N3Yilps2988VMo+wArfw0jVCCO+sWNZnYYsMXwRgyQsPb
 K5+0Co/cw1wfnzy1hUWqpRWs26JLIcewLnMx9Ycoaap1V0A59t8J/9xPDHHODX3d
 2wXlJsiNmoJj/kqakT4xlNTS0q7Tn2iLlJ1NNUADScR3zP7twXs17H2g8hGSVRiM
 xBd0s671m4eSZB5Bk3LSf0PPLbOmTnCB1qYpXhd56an3MGbYIzRtcmU9LvbY3SzH
 yxsVUGww1uvYN6A5RABDjqrbmnl9eQ4HruNUnA/fHLS6sDtYbmi7ln4bV3eE1sPa
 0r7lPYKWdvyN0FO3Rb9Qwnjhd3F/uaLTWjptdLXapRLO0fD8adiYqzWmbXMZrgcD
 BU1CNejIIYP/GZDOQanJoVQJWd9akUW/s6QiDMJRc87KaL13/cCWoicPi1j62ygj
 gueYjz1KfKh6hoVvviTl2NgyXke2qs+bIKDrRV6VfdIgfeSN50lUxdJm61RdDA9d
 IPrxUJDy8YdzH7rcMPXP
 =GqOj
 -----END PGP SIGNATURE-----

Merge tag 'kvm-s390-next-20150318' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into queue

KVM: s390: Features and fixes for 4.1 (kvm/next)

1. Fixes
2. Implement access register mode in KVM
3. Provide a userspace post handler for the STSI instruction
4. Provide an interface for compliant memory accesses
5. Provide an interface for getting/setting the guest storage key
6. Fixup for the vector facility patches: do not announce the
   vector facility in the guest for old QEMUs.

1-5 were initially shown as RFC in

http://www.spinics.net/lists/kvm/msg114720.html

some small review changes
- added some ACKs
- have the AR mode patches first
- get rid of unnecessary AR_INVAL define
- typos and language

6. two new patches
The two new patches fixup the vector support patches that were
introduced in the last pull request for QEMU versions that dont
know about vector support and guests that do. (We announce the
facility bit, but dont enable the facility so vector aware guests
will crash on vector instructions).
2015-03-23 20:32:02 -03:00
Michael Mueller 18280d8b4b KVM: s390: represent SIMD cap in kvm facility
The patch represents capability KVM_CAP_S390_VECTOR_REGISTERS by means
of the SIMD facility bit. This allows to a) disable the use of SIMD when
used in conjunction with a not-SIMD-aware QEMU, b) to enable SIMD when
used with a SIMD-aware version of QEMU and c) finally by means of a QEMU
version using the future cpu model ioctls.

Signed-off-by: Michael Mueller <mimu@linux.vnet.ibm.com>
Reviewed-by: Eric Farman <farman@linux.vnet.ibm.com>
Tested-by: Eric Farman <farman@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-03-17 16:33:14 +01:00
Ekaterina Tumanova e44fc8c9da KVM: s390: introduce post handlers for STSI
The Store System Information (STSI) instruction currently collects all
information it relays to the caller in the kernel. Some information,
however, is only available in user space. An example of this is the
guest name: The kernel always sets "KVMGuest", but user space knows the
actual guest name.

This patch introduces a new exit, KVM_EXIT_S390_STSI, guarded by a
capability that can be enabled by user space if it wants to be able to
insert such data. User space will be provided with the target buffer
and the requested STSI function code.

Reviewed-by: Eric Farman <farman@linux.vnet.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Ekaterina Tumanova <tumanova@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-03-17 16:26:51 +01:00
Martin Schwidefsky e143fa93c2 s390/mm: limit STACK_RND_MASK for compat tasks
For compat tasks the mmap randomization does not use the maximum
randomization value from mmap_rnd_mask but the fixed value of 0x7ff.
This needs to be respected in the definition of STACK_RND_MASK as
well.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-03-13 12:15:45 +01:00
Marcelo Tosatti 2b25385761 KVM: s390: Features and Fixes for 4.1 (kvm/next)
1. Several Fixes and enhancements
 ---------------------------------
 - These 3 patches have cc stable:
 b75f4c9 KVM: s390: Zero out current VMDB of STSI before including level3 data.
 261520d KVM: s390: fix handling of write errors in the tpi handler
 15462e3 KVM: s390: reinjection of irqs can fail in the tpi handler
 
 2. SIMD support the kernel part (introduced with z13)
 -----------------------------------------------------
 - two KVM-generic changes in kvm.h:
 1. New capability that can be enabled: KVM_CAP_S390_VECTOR_REGISTERS
 2. increased padding size for sync regs in struct kvm_run to clarify that
    sync regs can be larger than 1k. This is fine as this is the last
    element in the structure.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.14 (GNU/Linux)
 
 iQIcBAABAgAGBQJU+ab+AAoJEBF7vIC1phx8jMMQAJ1lXYJUbnOSJ7GD1aqQe4b3
 pGW9rjQtZ1sZ6tH45sQeHzHg9tbD5ZHvSwMRuW2itLZXtusUXYajkR3TdHiR2bcR
 karOqEducjPR6UCLdyfCAKkssZAkPyTXMnj3PRr9vpISvw53W5ZKxulIMtqSbMP0
 Cl5eP1x+qz24csDV7u/S0k4Kccol43XfmVqVtZm2yVVt1qXIM1qqbUoJBTGECLHC
 Ux9BbtTYC4lZl8Q/m9hyCD6YV7flhSKPKN6VbyAnRIM1LQiLCEAmppWJIbTbTI+3
 m+fM52ue/QlkogGCQ7Mg9+lJkwBKfNCwneSPds4/M41Atwqz5m66L/D1rIlfD4Bh
 HbnMB9FaRXIeyDAgdG/pemZQeOsnDHiJBHphQ0Q5TPl1lLbWjHFweG8uQ9vGMizf
 Diy5Rk16MBVbgjYBf8Jy7ZXHisBN+gbOkzXD5j28+j3OGdzhVilFQIJu2oqv5wzW
 r30ogxmdGH7ljh6bWJKj7YXZx0C3oDF/8HRkf4Fevh064c66S0jTLd8wb4YqGSy5
 3qfNqovnvaSjPyKLVajmSOeFIVwtSnylFTqmTOxsC5o786BWgU0Mh6fhKGyUQlcs
 PMmn5B5VrLLb8YZtX0XOWkeA45KXXeT19qb2WaB/VZSSgb4fG7W/oqqQ6LrZl0sc
 pEBYwbsFvkFG+04aIYN3
 =bzDF
 -----END PGP SIGNATURE-----

Merge tag 'kvm-s390-next-20150306' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into queue

KVM: s390: Features and Fixes for 4.1 (kvm/next)

1. Several Fixes and enhancements
---------------------------------
- These 3 patches have cc stable:
b75f4c9 KVM: s390: Zero out current VMDB of STSI before including level3 data.
261520d KVM: s390: fix handling of write errors in the tpi handler
15462e3 KVM: s390: reinjection of irqs can fail in the tpi handler

2. SIMD support the kernel part (introduced with z13)
-----------------------------------------------------
- two KVM-generic changes in kvm.h:
1. New capability that can be enabled: KVM_CAP_S390_VECTOR_REGISTERS
2. increased padding size for sync regs in struct kvm_run to clarify that
   sync regs can be larger than 1k. This is fine as this is the last
   element in the structure.
2015-03-12 22:09:35 -03:00
Linus Torvalds affb8172de Merge git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm/s390 bugfixes from Marcelo Tosatti.

* git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: s390: non-LPAR case obsolete during facilities mask init
  KVM: s390: include guest facilities in kvm facility test
  KVM: s390: fix in memory copy of facility lists
  KVM: s390/cpacf: Fix kernel bug under z/VM
  KVM: s390/cpacf: Enable key wrapping by default
2015-03-09 18:59:50 -07:00
Linus Torvalds ec0e6bd3f1 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 fixes from Martin Schwidefsky:
 "One performance optimization for page_clear and a couple of bug fixes"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390/mm: fix incorrect ASCE after crst_table_downgrade
  s390/ftrace: fix crashes when switching tracers / add notrace to cpu_relax()
  s390/pci: unify pci_iomap symbol exports
  s390/pci: fix [un]map_resources sequence
  s390: let the compiler do page clearing
  s390/pci: fix possible information leak in mmio syscall
  s390/dcss: array index 'i' is used before limits check.
  s390/scm_block: fix off by one during cluster reservation
  s390/jump label: improve and fix sanity check
  s390/jump label: add missing jump_label_apply_nops() call
2015-03-09 18:55:52 -07:00
Eric Farman 13211ea7b4 KVM: s390: Enable vector support for capable guest
We finally have all the pieces in place, so let's include the
vector facility bit in the mask of available hardware facilities
for the guest to recognize.  Also, enable the vector functionality
in the guest control blocks, to avoid a possible vector data
exception that would otherwise occur when a vector instruction
is issued by the guest operating system.

Signed-off-by: Eric Farman <farman@linux.vnet.ibm.com>
Reviewed-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-03-06 13:49:35 +01:00
Eric Farman cd7b4b6106 KVM: s390: Add new SIGP order to kernel counters
The new SIGP order Store Additional Status at Address is totally
handled by user space, but we should still record the occurrence
of this order in the kernel code.

Signed-off-by: Eric Farman <farman@linux.vnet.ibm.com>
Reviewed-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-03-06 13:49:34 +01:00
Eric Farman 403c8648cb KVM: s390: Vector exceptions
A new exception type for vector instructions is introduced with
the new processor, but is handled exactly like a Data Exception
which is already handled by the system.

Signed-off-by: Eric Farman <farman@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-03-06 13:49:33 +01:00
Eric Farman 68c557501b KVM: s390: Allocate and save/restore vector registers
Define and allocate space for both the host and guest views of
the vector registers for a given vcpu.  The 32 vector registers
occupy 128 bits each (512 bytes total), but architecturally are
paired with 512 additional bytes of reserved space for future
expansion.

The kvm_sync_regs structs containing the registers are union'ed
with 1024 bytes of padding in the common kvm_run struct.  The
addition of 1024 bytes of new register information clearly exceeds
the existing union, so an expansion of that padding is required.

When changing environments, we need to appropriately save and
restore the vector registers viewed by both the host and guest,
into and out of the sync_regs space.

The floating point registers overlay the upper half of vector
registers 0-15, so there's a bit of data duplication here that
needs to be carefully avoided.

Signed-off-by: Eric Farman <farman@linux.vnet.ibm.com>
Reviewed-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-03-06 13:49:33 +01:00
Marcelo Tosatti bfb8fb4775 KVM: s390: Fixups for changes in merge window for 4.0
Here are some fixups/improvements for
 
 commit 658b6eda20 ("KVM: s390: add cpu model support")
 commit 9d8d578605 ("KVM: s390: use facilities and cpu_id per KVM")
 commit a374e892c3 ("KVM: s390/cpacf: Enable/disable protected key
 functions for kvm guest")
 commit 45c9b47c58 ("KVM: s390/CPACF: Choose crypto control block format")
 
 which all have been merged during the merge window for 4.0.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.14 (GNU/Linux)
 
 iQIcBAABAgAGBQJU9ucxAAoJEBF7vIC1phx8OOoP/iJhd73h4QsjjU/ZaXSNW+65
 pwgOhD3eSXdUkGNsKUMp9nsyDo89puSZdUo9+5hvtbal5P89BpLx2VQxryZY0SAU
 21D/AjaaxJ6YcL5mfT6lyp1W5jznOobZLl8wSHoIAWyKrmqv0hLG1M/Ip5TxmdE0
 7rSeABHJyQNcTnOLL9Iq/uut8Qaf8ApD2rxzNfZILjlyQveS1I+l2qRAyi4MYTU0
 E3wZAOirzzNZfhbe049hvyNYd086GfOucwf1wsezvhbVJeqi2MNRf61yCGpedRdw
 2V5ce6LefMSx+sdqBOoKCYziO/vnTMdQjBfsm9315fTMHlGVBY0PgPWLQFVe77pI
 2enB20/T4hTJ203Nwaa0jBkQ2IyunSrE1rP6M+E7g90ZgiXrR9TsURAfl+u5EzE4
 QZe7BcQRJ+Wrj1lrM7W67Mr+hzIlRvWeEJjKsHa/q8BPD4mbQRacPN2OKW2zcbrx
 Ye4Vr0Fanom5nyZqPDbMl5dWlN0xuboFksSMkdaR7KDp5fb4WMITOl0WeOZPRAn9
 16TTO6Mr19I3uCCuJFhZaI1HPyUQPJja2JZqB/HUcribTcp+GGPt8od0Yxvd07kP
 B4T5OlpiYbOIimIIkukGvbdNIBOcfvyiInq3offrE1S/q4PWo3uWf2HyeQE+/G8V
 d9gCVWo+st4Sj5UVRCm0
 =ZaGi
 -----END PGP SIGNATURE-----

Merge tag 'kvm-s390-master-20150303' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux

KVM: s390: Fixups for changes in merge window for 4.0

Here are some fixups/improvements for

commit 658b6eda20 ("KVM: s390: add cpu model support")
commit 9d8d578605 ("KVM: s390: use facilities and cpu_id per KVM")
commit a374e892c3 ("KVM: s390/cpacf: Enable/disable protected key
functions for kvm guest")
commit 45c9b47c58 ("KVM: s390/CPACF: Choose crypto control block format")

which all have been merged during the merge window for 4.0.
2015-03-05 14:42:48 -03:00
Michael Mueller 981467c930 KVM: s390: include guest facilities in kvm facility test
Most facility related decisions in KVM have to take into account:

- the facilities offered by the underlying run container (LPAR/VM)
- the facilities supported by the KVM code itself
- the facilities requested by a guest VM

This patch adds the KVM driver requested facilities to the test routine.

It additionally renames struct s390_model_fac to kvm_s390_fac and its field
names to be more meaningful.

The semantics of the facilities stored in the KVM architecture structure
is changed. The address arch.model.fac->list now points to the guest
facility list and arch.model.fac->mask points to the KVM facility mask.

This patch fixes the behaviour of KVM for some facilities for guests
that ignore the guest visible facility bits, e.g. guests could use
transactional memory intructions on hosts supporting them even if the
chosen cpu model would not offer them.

The userspace interface is not affected by this change.

Signed-off-by: Michael Mueller <mimu@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-03-04 10:33:25 +01:00
Martin Schwidefsky 691d526415 s390/mm: fix incorrect ASCE after crst_table_downgrade
The switch_mm function does nothing in case the prev and next mm
are the same. It can happen that a crst_table_downgrade has changed
the top-level pgd in the meantime on a different CPU. Always store
the new ASCE to be picked up in entry.S.

[heiko.carstens@de.ibm.com]: Bug was introduced with git commit
53e857f308 ("s390/mm,tlb: race of lazy TLB flush vs. recreation
of TLB entries") and causes random crashes due to broken page tables
being used.

Reported-by: Dominik Vogt <vogt@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2015-03-02 11:35:57 -08:00
Kirill A. Shutemov c07af4f1ce mm: add missing __PAGETABLE_{PUD,PMD}_FOLDED defines
Core mm expects __PAGETABLE_{PUD,PMD}_FOLDED to be defined if these page
table levels folded.  Usually, these defines are provided by
<asm-generic/pgtable-nopmd.h> and <asm-generic/pgtable-nopud.h>.

But some architectures fold page table levels in a custom way.  They
need to define these macros themself.  This patch adds missing defines.

The patch fixes mm->nr_pmds underflow and eliminates dead __pmd_alloc()
and __pud_alloc() on architectures without these page table levels.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Aaro Koskinen <aaro.koskinen@iki.fi>
Cc: David Howells <dhowells@redhat.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: Koichi Yasutake <yasutake.koichi@jp.panasonic.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-28 09:57:51 -08:00
Christian Borntraeger fb3d1c085c s390: let the compiler do page clearing
The hardware folks told me that for page clearing "when you exactly
know what to do, hand written xc+pfd is usally faster then mvcl for
page clearing, as it saves millicode overhead and parameter parsing
and checking" as long as you dont need the cache bypassing.
Turns out that gcc already does a proper xc,pfd loop.

A small test on z196 that does

buff = mmap(NULL, bufsize,PROT_EXEC|PROT_WRITE|PROT_READ,AP_PRIVATE| MAP_ANONYMOUS,0,0);
for ( i = 0; i < bufsize; i+= 256)
    buff[i] = 0x5;

gets 20% faster (touches every cache line of a page)

and

buff = mmap(NULL, bufsize,PROT_EXEC|PROT_WRITE|PROT_READ,AP_PRIVATE| MAP_ANONYMOUS,0,0);
for ( i = 0; i < bufsize; i+= 4096)
    buff[i] = 0x5;

is within noise ratio (touches one cache line of a page).

As the clear_page is usually called for first memory accesses
we can assume that at least one cache line is used afterwards,
so this change should be always better.
Another benchmark, a make -j 40 of my testsuite in tmpfs with
hot caches on a 32cpu system:

 -- unpatched --       --  patched  --
real     0m1.017s     real     0m0.994s   (~2% faster, but in noise)
user     0m5.339s     user     0m5.016s   (~6% faster)
sys      0m0.691s     sys      0m0.632s   (~8% faster)

Let use the same define to memset as the asm-generic variant

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-02-26 09:24:49 +01:00
Linus Torvalds d34696c220 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 fixes from Martin Schwidefsky:
 "Two patches to save some memory if CONFIG_NR_CPUS is large, a changed
  default for the use of compare-and-delay, and a couple of bug fixes"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390/spinlock: disabled compare-and-delay by default
  s390/mm: align 64-bit PIE binaries to 4GB
  s390/cacheinfo: coding style changes
  s390/cacheinfo: fix shared cpu masks
  s390/smp: reduce size of struct pcpu
  s390/topology: convert cpu_topology array to per cpu variable
  s390/topology: delay initialization of topology cpu masks
  s390/vdso: fix clock_gettime for CLOCK_THREAD_CPUTIME_ID, -2 and -3
2015-02-21 11:18:26 -08:00
Linus Torvalds 53861af9a1 OK, this has the big virtio 1.0 implementation, as specified by OASIS.
On top of tht is the major rework of lguest, to use PCI and virtio 1.0, to
 double-check the implementation.
 
 Then comes the inevitable fixes and cleanups from that work.
 
 Thanks,
 Rusty.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJU5B9cAAoJENkgDmzRrbjxPacP/jajliXX353JJ/g/hkZ6oDN5
 o7FhELBKiUMr7enVZYwj2BBYk5OM36nB9pQkiqHMSbjJGoS5IK70enxb4YRxSHBn
 YCLblZMNqutGS0kclZ9DDysztjAhxH7CvLM6pMZ7eHP0f3+FM/QhbxHfbG9DTBUH
 2U/nybvd3M/+YBe7ptwQdrH8aOCAD6RTIsXellfm99dNMK6K/5lqnWQ98WSXmNXq
 vyvdaAQsqqUkmxtajjcBumaCH4/SehOJJjUqojCMsR3aBkgOBWDZJURMek+KA5Dt
 X996fBsTAlvTtCUKRrmLTb2ScDH7fu+jwbWRqMYDk8zpEr3XqiLTTPV4/TiHGmi7
 Wiw3g1wIY1YbETlZyongB5MIoVyUfmDAd+bT8nBsj3KIITD84gOUQFDMl6d63c0I
 z6A9Pu/UzpJGsXZT3WoFLi6TO67QyhOseqZnhS4wBgLabjxffNM7yov9RVKUVH/n
 JHunnpUk2iTtSgscBarOBz5867dstuurnaUIspZthVBo6y6N0z+GrU+agJ8Y4DXx
 mvwzeYLhQH2208PjxPFiah/kA/gHNm1m678TbpS+CUsgmpQiJ4gTwtazDSi4TwZY
 Hs9T9GulkzpZIzEyKL3qG2TsfyDhW5Avn+GvKInAT9+Fkig4BnP3DUONBxcwGZ78
 eI3FDUWsE36NqE5ECWmz
 =ivCe
 -----END PGP SIGNATURE-----

Merge tag 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux

Pull virtio updates from Rusty Russell:
 "OK, this has the big virtio 1.0 implementation, as specified by OASIS.

  On top of tht is the major rework of lguest, to use PCI and virtio
  1.0, to double-check the implementation.

  Then comes the inevitable fixes and cleanups from that work"

* tag 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: (80 commits)
  virtio: don't set VIRTIO_CONFIG_S_DRIVER_OK twice.
  virtio_net: unconditionally define struct virtio_net_hdr_v1.
  tools/lguest: don't use legacy definitions for net device in example launcher.
  virtio: Don't expose legacy net features when VIRTIO_NET_NO_LEGACY defined.
  tools/lguest: use common error macros in the example launcher.
  tools/lguest: give virtqueues names for better error messages
  tools/lguest: more documentation and checking of virtio 1.0 compliance.
  lguest: don't look in console features to find emerg_wr.
  tools/lguest: don't start devices until DRIVER_OK status set.
  tools/lguest: handle indirect partway through chain.
  tools/lguest: insert driver references from the 1.0 spec (4.1 Virtio Over PCI)
  tools/lguest: insert device references from the 1.0 spec (4.1 Virtio Over PCI)
  tools/lguest: rename virtio_pci_cfg_cap field to match spec.
  tools/lguest: fix features_accepted logic in example launcher.
  tools/lguest: handle device reset correctly in example launcher.
  virtual: Documentation: simplify and generalize paravirt_ops.txt
  lguest: remove NOTIFY call and eventfd facility.
  lguest: remove NOTIFY facility from demonstration launcher.
  lguest: use the PCI console device's emerg_wr for early boot messages.
  lguest: always put console in PCI slot #1.
  ...
2015-02-18 09:24:01 -08:00
Linus Torvalds b9085bcbf5 Fairly small update, but there are some interesting new features.
Common: Optional support for adding a small amount of polling on each HLT
 instruction executed in the guest (or equivalent for other architectures).
 This can improve latency up to 50% on some scenarios (e.g. O_DSYNC writes
 or TCP_RR netperf tests).  This also has to be enabled manually for now,
 but the plan is to auto-tune this in the future.
 
 ARM/ARM64: the highlights are support for GICv3 emulation and dirty page
 tracking
 
 s390: several optimizations and bugfixes.  Also a first: a feature
 exposed by KVM (UUID and long guest name in /proc/sysinfo) before
 it is available in IBM's hypervisor! :)
 
 MIPS: Bugfixes.
 
 x86: Support for PML (page modification logging, a new feature in
 Broadwell Xeons that speeds up dirty page tracking), nested virtualization
 improvements (nested APICv---a nice optimization), usual round of emulation
 fixes.  There is also a new option to reduce latency of the TSC deadline
 timer in the guest; this needs to be tuned manually.
 
 Some commits are common between this pull and Catalin's; I see you
 have already included his tree.
 
 ARM has other conflicts where functions are added in the same place
 by 3.19-rc and 3.20 patches.  These are not large though, and entirely
 within KVM.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJU28rkAAoJEL/70l94x66DXqQH/1TDOfJIjW7P2kb0Sw7Fy1wi
 cEX1KO/VFxAqc8R0E/0Wb55CXyPjQJM6xBXuFr5cUDaIjQ8ULSktL4pEwXyyv/s5
 DBDkN65mriry2w5VuEaRLVcuX9Wy+tqLQXWNkEySfyb4uhZChWWHvKEcgw5SqCyg
 NlpeHurYESIoNyov3jWqvBjr4OmaQENyv7t2c6q5ErIgG02V+iCux5QGbphM2IC9
 LFtPKxoqhfeB2xFxTOIt8HJiXrZNwflsTejIlCl/NSEiDVLLxxHCxK2tWK/tUXMn
 JfLD9ytXBWtNMwInvtFm4fPmDouv2VDyR0xnK2db+/axsJZnbxqjGu1um4Dqbak=
 =7gdx
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM update from Paolo Bonzini:
 "Fairly small update, but there are some interesting new features.

  Common:
     Optional support for adding a small amount of polling on each HLT
     instruction executed in the guest (or equivalent for other
     architectures).  This can improve latency up to 50% on some
     scenarios (e.g. O_DSYNC writes or TCP_RR netperf tests).  This
     also has to be enabled manually for now, but the plan is to
     auto-tune this in the future.

  ARM/ARM64:
     The highlights are support for GICv3 emulation and dirty page
     tracking

  s390:
     Several optimizations and bugfixes.  Also a first: a feature
     exposed by KVM (UUID and long guest name in /proc/sysinfo) before
     it is available in IBM's hypervisor! :)

  MIPS:
     Bugfixes.

  x86:
     Support for PML (page modification logging, a new feature in
     Broadwell Xeons that speeds up dirty page tracking), nested
     virtualization improvements (nested APICv---a nice optimization),
     usual round of emulation fixes.

     There is also a new option to reduce latency of the TSC deadline
     timer in the guest; this needs to be tuned manually.

     Some commits are common between this pull and Catalin's; I see you
     have already included his tree.

  Powerpc:
     Nothing yet.

     The KVM/PPC changes will come in through the PPC maintainers,
     because I haven't received them yet and I might end up being
     offline for some part of next week"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (130 commits)
  KVM: ia64: drop kvm.h from installed user headers
  KVM: x86: fix build with !CONFIG_SMP
  KVM: x86: emulate: correct page fault error code for NoWrite instructions
  KVM: Disable compat ioctl for s390
  KVM: s390: add cpu model support
  KVM: s390: use facilities and cpu_id per KVM
  KVM: s390/CPACF: Choose crypto control block format
  s390/kernel: Update /proc/sysinfo file with Extended Name and UUID
  KVM: s390: reenable LPP facility
  KVM: s390: floating irqs: fix user triggerable endless loop
  kvm: add halt_poll_ns module parameter
  kvm: remove KVM_MMIO_SIZE
  KVM: MIPS: Don't leak FPU/DSP to guest
  KVM: MIPS: Disable HTW while in guest
  KVM: nVMX: Enable nested posted interrupt processing
  KVM: nVMX: Enable nested virtual interrupt delivery
  KVM: nVMX: Enable nested apic register virtualization
  KVM: nVMX: Make nested control MSRs per-cpu
  KVM: nVMX: Enable nested virtualize x2apic mode
  KVM: nVMX: Prepare for using hardware MSR bitmap
  ...
2015-02-13 09:55:09 -08:00
Rasmus Villemoes af3cd13501 lib/string.c: remove strnicmp()
Now that all in-tree users of strnicmp have been converted to
strncasecmp, the wrapper can be removed.

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: David Howells <dhowells@redhat.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-12 18:54:14 -08:00
Andy Lutomirski f56141e3e2 all arches, signal: move restart_block to struct task_struct
If an attacker can cause a controlled kernel stack overflow, overwriting
the restart block is a very juicy exploit target.  This is because the
restart_block is held in the same memory allocation as the kernel stack.

Moving the restart block to struct task_struct prevents this exploit by
making the restart_block harder to locate.

Note that there are other fields in thread_info that are also easy
targets, at least on some architectures.

It's also a decent simplification, since the restart code is more or less
identical on all architectures.

[james.hogan@imgtec.com: metag: align thread_info::supervisor_stack]
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: David Miller <davem@davemloft.net>
Acked-by: Richard Weinberger <richard@nod.at>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no>
Cc: Steven Miao <realmz6@gmail.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Aurelien Jacquiot <a-jacquiot@ti.com>
Cc: Mikael Starvik <starvik@axis.com>
Cc: Jesper Nilsson <jesper.nilsson@axis.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Richard Kuo <rkuo@codeaurora.org>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: Helge Deller <deller@gmx.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Tested-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Chen Liqin <liqin.linux@gmail.com>
Cc: Lennox Wu <lennox.wu@gmail.com>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: Chris Zankel <chris@zankel.net>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-12 18:54:12 -08:00
Heiko Carstens da0c636ea7 s390/topology: convert cpu_topology array to per cpu variable
Convert the per cpu topology cpu masks to a per cpu variable.
At least for machines which do have less possible cpus than NR_CPUS this can
save a bit of memory (z/VM: max 64 vs 512 for performance_defconfig).

This reduces the kernel image size by 100k.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-02-12 09:37:22 +01:00
Heiko Carstens d05d15da18 s390/topology: delay initialization of topology cpu masks
There is no reason to initialize the topology cpu masks already while
setup_arch() is being called. It is sufficient to initialize the masks
before the scheduler becomes SMP aware.
Therefore a pre-SMP initcall aka early_initcall is suffucient.

This also allows to convert the cpu_topology array into a per cpu
variable with a later patch. Without this patch this wouldn't be
possible since the per cpu memory areas are not allocated while setup_arch
is executed.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-02-12 09:37:22 +01:00
Linus Torvalds 59d53737a8 Merge branch 'akpm' (patches from Andrew)
Merge second set of updates from Andrew Morton:
 "More of MM"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (83 commits)
  mm/nommu.c: fix arithmetic overflow in __vm_enough_memory()
  mm/mmap.c: fix arithmetic overflow in __vm_enough_memory()
  vmstat: Reduce time interval to stat update on idle cpu
  mm/page_owner.c: remove unnecessary stack_trace field
  Documentation/filesystems/proc.txt: describe /proc/<pid>/map_files
  mm: incorporate read-only pages into transparent huge pages
  vmstat: do not use deferrable delayed work for vmstat_update
  mm: more aggressive page stealing for UNMOVABLE allocations
  mm: always steal split buddies in fallback allocations
  mm: when stealing freepages, also take pages created by splitting buddy page
  mincore: apply page table walker on do_mincore()
  mm: /proc/pid/clear_refs: avoid split_huge_page()
  mm: pagewalk: fix misbehavior of walk_page_range for vma(VM_PFNMAP)
  mempolicy: apply page table walker on queue_pages_range()
  arch/powerpc/mm/subpage-prot.c: use walk->vma and walk_page_vma()
  memcg: cleanup preparation for page table walk
  numa_maps: remove numa_maps->vma
  numa_maps: fix typo in gather_hugetbl_stats
  pagemap: use walk->vma instead of calling find_vma()
  clear_refs: remove clear_refs_private->vma and introduce clear_refs_test_walk()
  ...
2015-02-11 18:23:28 -08:00
Linus Torvalds b3d6524ff7 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 updates from Martin Schwidefsky:

 - The remaining patches for the z13 machine support: kernel build
   option for z13, the cache synonym avoidance, SMT support,
   compare-and-delay for spinloops and the CES5S crypto adapater.

 - The ftrace support for function tracing with the gcc hotpatch option.
   This touches common code Makefiles, Steven is ok with the changes.

 - The hypfs file system gets an extension to access diagnose 0x0c data
   in user space for performance analysis for Linux running under z/VM.

 - The iucv hvc console gets wildcard spport for the user id filtering.

 - The cacheinfo code is converted to use the generic infrastructure.

 - Cleanup and bug fixes.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (42 commits)
  s390/process: free vx save area when releasing tasks
  s390/hypfs: Eliminate hypfs interval
  s390/hypfs: Add diagnose 0c support
  s390/cacheinfo: don't use smp_processor_id() in preemptible context
  s390/zcrypt: fixed domain scanning problem (again)
  s390/smp: increase maximum value of NR_CPUS to 512
  s390/jump label: use different nop instruction
  s390/jump label: add sanity checks
  s390/mm: correct missing space when reporting user process faults
  s390/dasd: cleanup profiling
  s390/dasd: add locking for global_profile access
  s390/ftrace: hotpatch support for function tracing
  ftrace: let notrace function attribute disable hotpatching if necessary
  ftrace: allow architectures to specify ftrace compile options
  s390: reintroduce diag 44 calls for cpu_relax()
  s390/zcrypt: Add support for new crypto express (CEX5S) adapter.
  s390/zcrypt: Number of supported ap domains is not retrievable.
  s390/spinlock: add compare-and-delay to lock wait loops
  s390/tape: remove redundant if statement
  s390/hvc_iucv: add simple wildcard matches to the iucv allow filter
  ...
2015-02-11 17:42:32 -08:00
Kirill A. Shutemov d016bf7ece mm: make FIRST_USER_ADDRESS unsigned long on all archs
LKP has triggered a compiler warning after my recent patch "mm: account
pmd page tables to the process":

    mm/mmap.c: In function 'exit_mmap':
 >> mm/mmap.c:2857:2: warning: right shift count >= width of type [enabled by default]

The code:

 > 2857                WARN_ON(mm_nr_pmds(mm) >
   2858                                round_up(FIRST_USER_ADDRESS, PUD_SIZE) >> PUD_SHIFT);

In this, on tile, we have FIRST_USER_ADDRESS defined as 0.  round_up() has
the same type -- int.  PUD_SHIFT.

I think the best way to fix it is to define FIRST_USER_ADDRESS as unsigned
long.  On every arch for consistency.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reported-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-11 17:06:03 -08:00
Kirill A. Shutemov 6e76d4b20b s390: drop pte_file()-related helpers
We've replaced remap_file_pages(2) implementation with emulation.  Nobody
creates non-linear mapping anymore.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-10 14:30:33 -08:00
Michael Mueller 658b6eda20 KVM: s390: add cpu model support
This patch enables cpu model support in kvm/s390 via the vm attribute
interface.

During KVM initialization, the host properties cpuid, IBC value and the
facility list are stored in the architecture specific cpu model structure.

During vcpu setup, these properties are taken to initialize the related SIE
state. This mechanism allows to adjust the properties from user space and thus
to implement different selectable cpu models.

This patch uses the IBC functionality to block instructions that have not
been implemented at the requested CPU type and GA level compared to the
full host capability.

Userspace has to initialize the cpu model before vcpu creation. A cpu model
change of running vcpus is not possible.

Signed-off-by: Michael Mueller <mimu@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-02-09 12:44:13 +01:00
Michael Mueller 9d8d578605 KVM: s390: use facilities and cpu_id per KVM
The patch introduces facilities and cpu_ids per virtual machine.
Different virtual machines may want to expose different facilities and
cpu ids to the guest, so let's make them per-vm instead of global.

Signed-off-by: Michael Mueller <mimu@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-02-09 12:44:12 +01:00
Tony Krowiak 45c9b47c58 KVM: s390/CPACF: Choose crypto control block format
We need to specify a different format for the crypto control block
depending on whether the APXA facility is installed or not. Let's
test for it by executing the PQAP(QCI) function and use either a
format-1 or a format-2 crypto control block accordingly. This is a
host only change for z13 and does not affect the guest view.

Signed-off-by: Tony Krowiak <akrowiak@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-02-09 12:44:12 +01:00
Ekaterina Tumanova f3d0bd6c7f s390/kernel: Update /proc/sysinfo file with Extended Name and UUID
A new architecture extends STSI 3.2.2 with UUID and long names. KVM
will provide the first implementation. This patch adds the additional
data  fields (Extended Name and UUID) from the 4KB block returned by
the STSI 3.2.2 command and reflect this information in the
/proc/sysinfo file accordingly.

Signed-off-by: Ekaterina Tumanova <tumanova@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-02-09 12:44:11 +01:00
Paolo Bonzini f781951299 kvm: add halt_poll_ns module parameter
This patch introduces a new module parameter for the KVM module; when it
is present, KVM attempts a bit of polling on every HLT before scheduling
itself out via kvm_vcpu_block.

This parameter helps a lot for latency-bound workloads---in particular
I tested it with O_DSYNC writes with a battery-backed disk in the host.
In this case, writes are fast (because the data doesn't have to go all
the way to the platters) but they cannot be merged by either the host or
the guest.  KVM's performance here is usually around 30% of bare metal,
or 50% if you use cache=directsync or cache=writethrough (these
parameters avoid that the guest sends pointless flush requests, and
at the same time they are not slow because of the battery-backed cache).
The bad performance happens because on every halt the host CPU decides
to halt itself too.  When the interrupt comes, the vCPU thread is then
migrated to a new physical CPU, and in general the latency is horrible
because the vCPU thread has to be scheduled back in.

With this patch performance reaches 60-65% of bare metal and, more
important, 99% of what you get if you use idle=poll in the guest.  This
means that the tunable gets rid of this particular bottleneck, and more
work can be done to improve performance in the kernel or QEMU.

Of course there is some price to pay; every time an otherwise idle vCPUs
is interrupted by an interrupt, it will poll unnecessarily and thus
impose a little load on the host.  The above results were obtained with
a mostly random value of the parameter (500000), and the load was around
1.5-2.5% CPU usage on one of the host's core for each idle guest vCPU.

The patch also adds a new stat, /sys/kernel/debug/kvm/halt_successful_poll,
that can be used to tune the parameter.  It counts how many HLT
instructions received an interrupt during the polling period; each
successful poll avoids that Linux schedules the VCPU thread out and back
in, and may also avoid a likely trip to C1 and back for the physical CPU.

While the VM is idle, a Linux 4 VCPU VM halts around 10 times per second.
Of these halts, almost all are failed polls.  During the benchmark,
instead, basically all halts end within the polling period, except a more
or less constant stream of 50 per second coming from vCPUs that are not
running the benchmark.  The wasted time is thus very low.  Things may
be slightly different for Windows VMs, which have a ~10 ms timer tick.

The effect is also visible on Marcelo's recently-introduced latency
test for the TSC deadline timer.  Though of course a non-RT kernel has
awful latency bounds, the latency of the timer is around 8000-10000 clock
cycles compared to 20000-120000 without setting halt_poll_ns.  For the TSC
deadline timer, thus, the effect is both a smaller average latency and
a smaller variance.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-02-06 13:08:37 +01:00
Joonsoo Kim 7b02190c27 mm/debug_pagealloc: fix build failure on ppc and some other archs
Kim Phillips reported following build failure.

  LD      init/built-in.o
  mm/built-in.o: In function `free_pages_prepare':
  mm/page_alloc.c:770: undefined reference to `.kernel_map_pages'
  mm/built-in.o: In function `prep_new_page':
  mm/page_alloc.c:933: undefined reference to `.kernel_map_pages'
  mm/built-in.o: In function `map_pages':
  mm/compaction.c:61: undefined reference to `.kernel_map_pages'
  make: *** [vmlinux] Error 1

Reason for this problem is that commit 031bc5743f
("mm/debug-pagealloc: make debug-pagealloc boottime configurable")
forgot to remove the old declaration of kernel_map_pages() for some
architectures.  This patch removes them to fix build failure.

Reported-by: Kim Phillips <kim.phillips@freescale.com>
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: David Miller <davem@davemloft.net>
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-05 13:35:30 -08:00
Heiko Carstens d5caa4dbf9 s390/jump label: use different nop instruction
Use a brcl 0,2 instruction for jump label nops during compile time,
so we don't mix up the different nops during mcount/hotpatch call
site detection.
The initial jump label code instruction replacement will exchange
these instructions with either a branch or a brcl 0,0 instruction.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-01-29 16:33:34 +01:00
Heiko Carstens e6d60b368b s390/ftrace: hotpatch support for function tracing
Make use of gcc's hotpatch support to generate better code for ftrace
function tracing.
The generated code now contains only a six byte nop in each function
prologue instead of a 24 byte code block which will be runtime patched to
support function tracing.
With the new code generation the runtime overhead for supporting function
tracing is close to zero, while the original code did show a significant
performance impact.

Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-01-29 09:19:25 +01:00
Heiko Carstens 4d92f50249 s390: reintroduce diag 44 calls for cpu_relax()
Christian Borntraeger reported that the now missing diag 44 calls (voluntary
time slice end) does cause a performance regression for stop_machine() calls
if a machine has more virtual cpus than the host has physical cpus.

This patch mainly reverts 57f2ffe14f ("s390: remove diag 44 calls from
cpu_relax()") with the exception that we still do not issue diag 44 calls if
running with smt enabled. Due to group scheduling algorithms when running in
LPAR this would lead to significant latencies.
However, when running in LPAR we do not have more virtual than physical cpus.

Reported-and-tested-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-01-29 09:19:16 +01:00
Martin Schwidefsky 2c72a44ecd s390/spinlock: add compare-and-delay to lock wait loops
Add the compare-and-delay instruction to the spin-lock and rw-lock
retry loops. A CPU executing the compare-and-delay instruction stops
until the lock value has changed. This is done to make the locking
code for contended locks to behave better in regard to the multi-
hreading facility. A thread of a core executing a compare-and-delay
will allow the other threads of a core to get a larger share of the
core resources.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-01-23 15:17:04 +01:00
Tony Krowiak a374e892c3 KVM: s390/cpacf: Enable/disable protected key functions for kvm guest
Created new KVM device attributes for indicating whether the AES and
DES/TDES protected key functions are available for programs running
on the KVM guest.  The attributes are used to set up the controls in
the guest SIE block that specify whether programs running on the
guest will be given access to the protected key functions available
on the s390 hardware.

Signed-off-by: Tony Krowiak <akrowiak@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: Michael Mueller <mimu@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
[split MSA4/protected key into two patches]
2015-01-23 13:25:40 +01:00
Jason J. Herne 72f250206f KVM: s390: Provide guest TOD Clock Get/Set Controls
Provide controls for setting/getting the guest TOD clock based on the VM
attribute interface.

Provide TOD and TOD_HIGH vm attributes on s390 for managing guest Time Of
Day clock value.

TOD_HIGH is presently always set to 0. In the future it will contain a high
order expansion of the tod clock value after it overflows the 64-bits of
the TOD.

Signed-off-by: Jason J. Herne <jjherne@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-01-23 13:25:40 +01:00
David Hildenbrand 2444b352c3 KVM: s390: forward most SIGP orders to user space
Most SIGP orders are handled partially in kernel and partially in
user space. In order to:
- Get a correct SIGP SET PREFIX handler that informs user space
- Avoid race conditions between concurrently executed SIGP orders
- Serialize SIGP orders per VCPU

We need to handle all "slow" SIGP orders in user space. The remaining
ones to be handled completely in kernel are:
- SENSE
- SENSE RUNNING
- EXTERNAL CALL
- EMERGENCY SIGNAL
- CONDITIONAL EMERGENCY SIGNAL
According to the PoP, they have to be fast. They can be executed
without conflicting to the actions of other pending/concurrently
executing orders (e.g. STOP vs. START).

This patch introduces a new capability that will - when enabled -
forward all but the mentioned SIGP orders to user space. The
instruction counters in the kernel are still updated.

Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-01-23 13:25:37 +01:00
David Hildenbrand 9fbd80828c KVM: s390: clear the pfault queue if user space sets the invalid token
We need a way to clear the async pfault queue from user space (e.g.
for resets and SIGP SET ARCHITECTURE).

This patch simply clears the queue as soon as user space sets the
invalid pfault token. The definition of the invalid token is moved
to uapi.

Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-01-23 13:25:36 +01:00
David Hildenbrand ea5f496925 KVM: s390: only one external call may be pending at a time
Only one external call may be pending at a vcpu at a time. For this
reason, we have to detect whether the SIGP externcal call interpretation
facility is available. If so, all external calls have to be injected
using this mechanism.

SIGP EXTERNAL CALL orders have to return whether another external
call is already pending. This check was missing until now.

SIGP SENSE hasn't returned yet in all conditions whether an external
call was pending.

If a SIGP EXTERNAL CALL irq is to be injected and one is already
pending, -EBUSY is returned.

Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-01-23 13:25:36 +01:00
David Hildenbrand d614be05c8 s390/sclp: introduce check for the SIGP Interpretation Facility
This patch introduces the infrastructure to check whether the SIGP
Interpretation Facility is installed on all VCPUs in the configuration.

Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-01-23 13:25:35 +01:00
David Hildenbrand 6cddd432e3 KVM: s390: handle stop irqs without action_bits
This patch removes the famous action_bits and moves the handling of
SIGP STOP AND STORE STATUS directly into the SIGP STOP interrupt.

The new local interrupt infrastructure is used to track pending stop
requests.

STOP irqs are the only irqs that don't get actively delivered. They
remain pending until the stop function is executed (=stop intercept).

If another STOP irq is already pending, -EBUSY will now be returned
(needed for the SIGP handling code).

Migration of pending SIGP STOP (AND STORE STATUS) orders should now
be supported out of the box.

Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-01-23 13:25:33 +01:00
David Hildenbrand 2822545f9f KVM: s390: new parameter for SIGP STOP irqs
In order to get rid of the action_flags and to properly migrate pending SIGP
STOP irqs triggered e.g. by SIGP STOP AND STORE STATUS, we need to remember
whether to store the status when stopping.

For this reason, a new parameter (flags) for the SIGP STOP irq is introduced.
These flags further define details of the requested STOP and can be easily
migrated.

Reviewed-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-01-23 13:25:33 +01:00
Martin Schwidefsky 10ad34bc76 s390: add SMT support
The multi-threading facility is introduced with the z13 processor family.
This patch adds code to detect the multi-threading facility. With the
facility enabled each core will surface multiple hardware threads to the
system. Each hardware threads looks like a normal CPU to the operating
system with all its registers and properties.

The SCLP interface reports the SMT topology indirectly via the maximum
thread id. Each reported CPU in the result of a read-scp-information
is a core representing a number of hardware threads.

To reflect the reduced CPU capacity if two hardware threads run on a
single core the MT utilization counter set is used to normalize the
raw cputime obtained by the CPU timer deltas. This scaled cputime is
reported via the taskstats interface. The normal /proc/stat numbers
are based on the raw cputime and are not affected by the normalization.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-01-22 12:16:01 +01:00
Martin Schwidefsky 1f6b83e5e4 s390: avoid z13 cache aliasing
Avoid cache aliasing on z13 by aligning shared objects to multiples
of 512K. The virtual addresses of a page from a shared file needs
to have identical bits in the range 2^12 to 2^18.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-01-22 12:15:59 +01:00
Michael S. Tsirkin 8cfc99b583 s390: add pci_iomap_range
Virtio drivers should map the part of the range they need, not
necessarily all of it.
To this end, support mapping ranges within BAR on s390.
Since multiple ranges can now be mapped within a BAR, we keep track of
the number of mappings created, and only clear out the mapping for a BAR
when this number reaches 0.

Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: linux-pci@vger.kernel.org
Tested-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2015-01-21 16:28:49 +10:30
Linus Torvalds 6fb400d36d Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 fixes from Martin Schwidefsky:
 "Two small performance tweaks, the plumbing for the execveat system
  call and a couple of bug fixes"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390/uprobes: fix user space PER events
  s390/bpf: Fix JMP_JGE_X (A > X) and JMP_JGT_X (A >= X)
  s390/bpf: Fix ALU_NEG (A = -A)
  s390/mm: avoid using pmd_to_page for !USE_SPLIT_PMD_PTLOCKS
  s390/timex: fix get_tod_clock_ext() inline assembly
  s390: wire up execveat syscall
  s390/kernel: use stnsm 255 instead of stosm 0
  s390/vtime: Get rid of redundant WARN_ON
  s390/zcrypt: kernel oops at insmod of the z90crypt device driver
2015-01-15 10:50:29 +13:00
Chen Gang fbf87dff67 s390/sclp: fix declaration of _sclp_print_early()
_sclp_print_early() has return value: at present, return 0 for OK, 1 for
failure. It returns '%r2', so use 'long' as return value (upper caller
can check '%r2' directly). The related warning:

    CC      arch/s390/boot/compressed/misc.o
  arch/s390/boot/compressed/misc.c:66:8: warning: type defaults to 'int' in declaration of '_sclp_print_early' [-Wimplicit-int]
   extern _sclp_print_early(const char *);
          ^

At present, _sclp_print_early() is only used by puts(), so can still
remain its declaration in 'misc.c' file.

[heiko.carstens@de.ibm.com]: move declaration to sclp.h header file

Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-01-08 10:02:51 +01:00
Chen Gang e38f978133 s390/timex: fix get_tod_clock_ext() inline assembly
For C language, it treats array parameter as a pointer, so sizeof for an
array parameter is equal to sizeof for a pointer, which causes compiler
warning (with allmodconfig by gcc 5):

  ./arch/s390/include/asm/timex.h: In function 'get_tod_clock_ext':
  ./arch/s390/include/asm/timex.h:76:32: warning: 'sizeof' on array function parameter 'clk' will return size of 'char *' [-Wsizeof-array-argument]
    typedef struct { char _[sizeof(clk)]; } addrtype;
                                  ^
Can use macro CLOCK_STORE_SIZE instead of all related hard code numbers,
which also can avoid this warning. And also add a tab to CLOCK_TICK_RATE
definition to match coding styles.

[heiko.carstens@de.ibm.com]:
Chen's patch actually fixes a bug within the get_tod_clock_ext() inline assembly
where we incorrectly tell the compiler that only 8 bytes of memory get changed
instead of 16 bytes.
This would allow gcc to generate incorrect code. Right now this doesn't seem to
be the case.
Also slightly changed the patch a bit.
- renamed CLOCK_STORE_SIZE to STORE_CLOCK_EXT_SIZE
- changed get_tod_clock_ext() to receive a char pointer parameter

Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-01-07 09:52:47 +01:00
Linus Torvalds 66dcff86ba 3.19 changes for KVM:
- spring cleaning: removed support for IA64, and for hardware-assisted
 virtualization on the PPC970
 - ARM, PPC, s390 all had only small fixes
 
 For x86:
 - small performance improvements (though only on weird guests)
 - usual round of hardware-compliancy fixes from Nadav
 - APICv fixes
 - XSAVES support for hosts and guests.  XSAVES hosts were broken because
 the (non-KVM) XSAVES patches inadvertently changed the KVM userspace
 ABI whenever XSAVES was enabled; hence, this part is going to stable.
 Guest support is just a matter of exposing the feature and CPUID leaves
 support.
 
 Right now KVM is broken for PPC BookE in your tree (doesn't compile).
 I'll reply to the pull request with a patch, please apply it either
 before the pull request or in the merge commit, in order to preserve
 bisectability somewhat.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJUkpg+AAoJEL/70l94x66DUmoH/jzXYkptSW9NGgm79KqxGJlD
 lzLnLBkitVvx++Mz5YBhdJEhKKLUlCtifFT1zPJQ/pthQhIRSaaAwZyNGgUs5w5x
 yMGKHiPQFyZRbmQtZhCInW0BftJoYHHciO3nUfHCZnp34My9MP2D55W7/z+fYFfQ
 DuqBSE9ThyZJtZ4zh8NRA9fCOeuqwVYRyoBs820Wbsh4cpIBoIK63Dg7k+CLE+ZV
 MZa/mRL6bAfsn9W5bnOUAgHJ3SPznnWbO3/g0aV+roL/5pffblprJx9lKNR08xUM
 6hDFLop2gDehDJesDkY/o8Ckp1hEouvfsVpSShry4vcgtn0hgh2O5/6Orbmj6vE=
 =Zwq1
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM update from Paolo Bonzini:
 "3.19 changes for KVM:

   - spring cleaning: removed support for IA64, and for hardware-
     assisted virtualization on the PPC970

   - ARM, PPC, s390 all had only small fixes

  For x86:
   - small performance improvements (though only on weird guests)
   - usual round of hardware-compliancy fixes from Nadav
   - APICv fixes
   - XSAVES support for hosts and guests.  XSAVES hosts were broken
     because the (non-KVM) XSAVES patches inadvertently changed the KVM
     userspace ABI whenever XSAVES was enabled; hence, this part is
     going to stable.  Guest support is just a matter of exposing the
     feature and CPUID leaves support"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (179 commits)
  KVM: move APIC types to arch/x86/
  KVM: PPC: Book3S: Enable in-kernel XICS emulation by default
  KVM: PPC: Book3S HV: Improve H_CONFER implementation
  KVM: PPC: Book3S HV: Fix endianness of instruction obtained from HEIR register
  KVM: PPC: Book3S HV: Remove code for PPC970 processors
  KVM: PPC: Book3S HV: Tracepoints for KVM HV guest interactions
  KVM: PPC: Book3S HV: Simplify locking around stolen time calculations
  arch: powerpc: kvm: book3s_paired_singles.c: Remove unused function
  arch: powerpc: kvm: book3s_pr.c: Remove unused function
  arch: powerpc: kvm: book3s.c: Remove some unused functions
  arch: powerpc: kvm: book3s_32_mmu.c: Remove unused function
  KVM: PPC: Book3S HV: Check wait conditions before sleeping in kvmppc_vcore_blocked
  KVM: PPC: Book3S HV: ptes are big endian
  KVM: PPC: Book3S HV: Fix inaccuracies in ICP emulation for H_IPI
  KVM: PPC: Book3S HV: Fix KSM memory corruption
  KVM: PPC: Book3S HV: Fix an issue where guest is paused on receiving HMI
  KVM: PPC: Book3S HV: Fix computation of tlbie operand
  KVM: PPC: Book3S HV: Add missing HPTE unlock
  KVM: PPC: BookE: Improve irq inject tracepoint
  arm/arm64: KVM: Require in-kernel vgic for the arch timers
  ...
2014-12-18 16:05:28 -08:00
Christian Borntraeger 81fc77fbfc s390/kernel: use stnsm 255 instead of stosm 0
On some models, stnsm 255 might be slightly faster than stosm 0.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-12-18 13:37:15 +01:00
Linus Torvalds f96fe22567 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull another networking update from David Miller:
 "Small follow-up to the main merge pull from the other day:

  1) Alexander Duyck's DMA memory barrier patch set.

  2) cxgb4 driver fixes from Karen Xie.

  3) Add missing export of fixed_phy_register() to modules, from Mark
     Salter.

  4) DSA bug fixes from Florian Fainelli"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (24 commits)
  net/macb: add TX multiqueue support for gem
  linux/interrupt.h: remove the definition of unused tasklet_hi_enable
  jme: replace calls to redundant function
  net: ethernet: davicom: Allow to select DM9000 for nios2
  net: ethernet: smsc: Allow to select SMC91X for nios2
  cxgb4: Add support for QSA modules
  libcxgbi: fix freeing skb prematurely
  cxgb4i: use set_wr_txq() to set tx queues
  cxgb4i: handle non-pdu-aligned rx data
  cxgb4i: additional types of negative advice
  cxgb4/cxgb4i: set the max. pdu length in firmware
  cxgb4i: fix credit check for tx_data_wr
  cxgb4i: fix tx immediate data credit check
  net: phy: export fixed_phy_register()
  fib_trie: Fix trie balancing issue if new node pushes down existing node
  vlan: Add ability to always enable TSO/UFO
  r8169:update rtl8168g pcie ephy parameter
  net: dsa: bcm_sf2: force link for all fixed PHY devices
  fm10k/igb/ixgbe: Use dma_rmb on Rx descriptor reads
  r8169: Use dma_rmb() and dma_wmb() for DescOwn checks
  ...
2014-12-12 16:11:12 -08:00
Alexander Duyck 1077fa36f2 arch: Add lightweight memory barriers dma_rmb() and dma_wmb()
There are a number of situations where the mandatory barriers rmb() and
wmb() are used to order memory/memory operations in the device drivers
and those barriers are much heavier than they actually need to be.  For
example in the case of PowerPC wmb() calls the heavy-weight sync
instruction when for coherent memory operations all that is really needed
is an lsync or eieio instruction.

This commit adds a coherent only version of the mandatory memory barriers
rmb() and wmb().  In most cases this should result in the barrier being the
same as the SMP barriers for the SMP case, however in some cases we use a
barrier that is somewhere in between rmb() and smp_rmb().  For example on
ARM the rmb barriers break down as follows:

  Barrier   Call     Explanation
  --------- -------- ----------------------------------
  rmb()     dsb()    Data synchronization barrier - system
  dma_rmb() dmb(osh) data memory barrier - outer sharable
  smp_rmb() dmb(ish) data memory barrier - inner sharable

These new barriers are not as safe as the standard rmb() and wmb().
Specifically they do not guarantee ordering between coherent and incoherent
memories.  The primary use case for these would be to enforce ordering of
reads and writes when accessing coherent memory that is shared between the
CPU and a device.

It may also be noted that there is no dma_mb().  Most architectures don't
provide a good mechanism for performing a coherent only full barrier without
resorting to the same mechanism used in mb().  As such there isn't much to
be gained in trying to define such a function.

Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: Michael Ellerman <michael@ellerman.id.au>
Cc: Michael Neuling <mikey@neuling.org>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: David Miller <davem@davemloft.net>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-11 21:15:06 -05:00
Alexander Duyck 8a44971841 arch: Cleanup read_barrier_depends() and comments
This patch is meant to cleanup the handling of read_barrier_depends and
smp_read_barrier_depends.  In multiple spots in the kernel headers
read_barrier_depends is defined as "do {} while (0)", however we then go
into the SMP vs non-SMP sections and have the SMP version reference
read_barrier_depends, and the non-SMP define it as yet another empty
do/while.

With this commit I went through and cleaned out the duplicate definitions
and reduced the number of definitions down to 2 per header.  In addition I
moved the 50 line comments for the macro from the x86 and mips headers that
defined it as an empty do/while to those that were actually defining the
macro, alpha and blackfin.

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-11 21:15:05 -05:00
Linus Torvalds 27afc5dbda Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 updates from Martin Schwidefsky:
 "The most notable change for this pull request is the ftrace rework
  from Heiko.  It brings a small performance improvement and the ground
  work to support a new gcc option to replace the mcount blocks with a
  single nop.

  Two new s390 specific system calls are added to emulate user space
  mmio for PCI, an artifact of the how PCI memory is accessed.

  Two patches for the memory management with changes to common code.
  For KVM mm_forbids_zeropage is added which disables the empty zero
  page for an mm that is used by a KVM process.  And an optimization,
  pmdp_get_and_clear_full is added analog to ptep_get_and_clear_full.

  Some micro optimization for the cmpxchg and the spinlock code.

  And as usual bug fixes and cleanups"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (46 commits)
  s390/cputime: fix 31-bit compile
  s390/scm_block: make the number of reqs per HW req configurable
  s390/scm_block: handle multiple requests in one HW request
  s390/scm_block: allocate aidaw pages only when necessary
  s390/scm_block: use mempool to manage aidaw requests
  s390/eadm: change timeout value
  s390/mm: fix memory leak of ptlock in pmd_free_tlb
  s390: use local symbol names in entry[64].S
  s390/ptrace: always include vector registers in core files
  s390/simd: clear vector register pointer on fork/clone
  s390: translate cputime magic constants to macros
  s390/idle: convert open coded idle time seqcount
  s390/idle: add missing irq off lockdep annotation
  s390/debug: avoid function call for debug_sprintf_*
  s390/kprobes: fix instruction copy for out of line execution
  s390: remove diag 44 calls from cpu_relax()
  s390/dasd: retry partition detection
  s390/dasd: fix list corruption for sleep_on requests
  s390/dasd: fix infinite term I/O loop
  s390/dasd: remove unused code
  ...
2014-12-11 17:30:55 -08:00
Linus Torvalds 70e71ca0af Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:

 1) New offloading infrastructure and example 'rocker' driver for
    offloading of switching and routing to hardware.

    This work was done by a large group of dedicated individuals, not
    limited to: Scott Feldman, Jiri Pirko, Thomas Graf, John Fastabend,
    Jamal Hadi Salim, Andy Gospodarek, Florian Fainelli, Roopa Prabhu

 2) Start making the networking operate on IOV iterators instead of
    modifying iov objects in-situ during transfers.  Thanks to Al Viro
    and Herbert Xu.

 3) A set of new netlink interfaces for the TIPC stack, from Richard
    Alpe.

 4) Remove unnecessary looping during ipv6 routing lookups, from Martin
    KaFai Lau.

 5) Add PAUSE frame generation support to gianfar driver, from Matei
    Pavaluca.

 6) Allow for larger reordering levels in TCP, which are easily
    achievable in the real world right now, from Eric Dumazet.

 7) Add a variable of napi_schedule that doesn't need to disable cpu
    interrupts, from Eric Dumazet.

 8) Use a doubly linked list to optimize neigh_parms_release(), from
    Nicolas Dichtel.

 9) Various enhancements to the kernel BPF verifier, and allow eBPF
    programs to actually be attached to sockets.  From Alexei
    Starovoitov.

10) Support TSO/LSO in sunvnet driver, from David L Stevens.

11) Allow controlling ECN usage via routing metrics, from Florian
    Westphal.

12) Remote checksum offload, from Tom Herbert.

13) Add split-header receive, BQL, and xmit_more support to amd-xgbe
    driver, from Thomas Lendacky.

14) Add MPLS support to openvswitch, from Simon Horman.

15) Support wildcard tunnel endpoints in ipv6 tunnels, from Steffen
    Klassert.

16) Do gro flushes on a per-device basis using a timer, from Eric
    Dumazet.  This tries to resolve the conflicting goals between the
    desired handling of bulk vs.  RPC-like traffic.

17) Allow userspace to ask for the CPU upon what a packet was
    received/steered, via SO_INCOMING_CPU.  From Eric Dumazet.

18) Limit GSO packets to half the current congestion window, from Eric
    Dumazet.

19) Add a generic helper so that all drivers set their RSS keys in a
    consistent way, from Eric Dumazet.

20) Add xmit_more support to enic driver, from Govindarajulu
    Varadarajan.

21) Add VLAN packet scheduler action, from Jiri Pirko.

22) Support configurable RSS hash functions via ethtool, from Eyal
    Perry.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1820 commits)
  Fix race condition between vxlan_sock_add and vxlan_sock_release
  net/macb: fix compilation warning for print_hex_dump() called with skb->mac_header
  net/mlx4: Add support for A0 steering
  net/mlx4: Refactor QUERY_PORT
  net/mlx4_core: Add explicit error message when rule doesn't meet configuration
  net/mlx4: Add A0 hybrid steering
  net/mlx4: Add mlx4_bitmap zone allocator
  net/mlx4: Add a check if there are too many reserved QPs
  net/mlx4: Change QP allocation scheme
  net/mlx4_core: Use tasklet for user-space CQ completion events
  net/mlx4_core: Mask out host side virtualization features for guests
  net/mlx4_en: Set csum level for encapsulated packets
  be2net: Export tunnel offloads only when a VxLAN tunnel is created
  gianfar: Fix dma check map error when DMA_API_DEBUG is enabled
  cxgb4/csiostor: Don't use MASTER_MUST for fw_hello call
  net: fec: only enable mdio interrupt before phy device link up
  net: fec: clear all interrupt events to support i.MX6SX
  net: fec: reset fep link status in suspend function
  net: sock: fix access via invalid file descriptor
  net: introduce helper macro for_each_cmsghdr
  ...
2014-12-11 14:27:06 -08:00
Daniel Borkmann 0cb6c969ed net, lib: kill arch_fast_hash library bits
As there are now no remaining users of arch_fast_hash(), lets kill
it entirely.

This basically reverts commit 71ae8aac3e ("lib: introduce arch
optimized hash library") and follow-up work, that is f.e., commit
237217546d ("lib: hash: follow-up fixups for arch hash"),
commit e3fec2f74f ("lib: Add missing arch generic-y entries for
asm-generic/hash.h") and last but not least commit 6a02652df5
("perf tools: Fix include for non x86 architectures").

Cc: Francesco Fusco <fusco@ntop.org>
Cc: Thomas Graf <tgraf@suug.ch>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-10 15:17:46 -05:00
Linus Torvalds 3eb5b893eb Merge branch 'x86-mpx-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 MPX support from Thomas Gleixner:
 "This enables support for x86 MPX.

  MPX is a new debug feature for bound checking in user space.  It
  requires kernel support to handle the bound tables and decode the
  bound violating instruction in the trap handler"

* 'x86-mpx-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  asm-generic: Remove asm-generic arch_bprm_mm_init()
  mm: Make arch_unmap()/bprm_mm_init() available to all architectures
  x86: Cleanly separate use of asm-generic/mm_hooks.h
  x86 mpx: Change return type of get_reg_offset()
  fs: Do not include mpx.h in exec.c
  x86, mpx: Add documentation on Intel MPX
  x86, mpx: Cleanup unused bound tables
  x86, mpx: On-demand kernel allocation of bounds tables
  x86, mpx: Decode MPX instruction to get bound violation information
  x86, mpx: Add MPX-specific mmap interface
  x86, mpx: Introduce VM_MPX to indicate that a VMA is MPX specific
  x86, mpx: Add MPX to disabled features
  ia64: Sync struct siginfo with general version
  mips: Sync struct siginfo with general version
  mpx: Extend siginfo structure to include bound violation information
  x86, mpx: Rename cfg_reg_u and status_reg
  x86: mpx: Give bndX registers actual names
  x86: Remove arbitrary instruction size limit in instruction decoder
2014-12-10 09:34:43 -08:00
Martin Schwidefsky 3519978101 s390/cputime: fix 31-bit compile
git commit 8461b63ca0
"s390: translate cputime magic constants to macros"
introduce a built error for 31-bit:

  kernel/built-in.o: In function `posix_cpu_timer_set':
  posix-cpu-timers.c:(.text+0x2a8cc): undefined reference to `__udivdi3'

The original code is actually broken for 31-bit and has been
corrected by the above commit by forcing the compiler to use
64-bit arithmetic through the CPUTIME_PER_USEC define.

To fix the compile error replace the 64-bit division with
a call to __div().

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-12-08 14:03:43 +01:00
Martin Schwidefsky 9de45f736f s390/mm: fix memory leak of ptlock in pmd_free_tlb
The pmd_free_tlb function fails to call pgtable_pmd_page_dtor.
Without the call the ptlock for the pmd tables will not be freed.
Add the missing call.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-12-08 09:42:40 +01:00
Frederic Weisbecker 8461b63ca0 s390: translate cputime magic constants to macros
Make the code more self-explanatory by naming magic constants.

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-12-08 09:42:34 +01:00
Frederic Weisbecker 1ce2180498 s390/idle: convert open coded idle time seqcount
s390 uses open coded seqcount to synchronize idle time accounting.
Lets consolidate it with the standard API.

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-12-08 09:42:32 +01:00
Christian Borntraeger 832a771034 s390/debug: avoid function call for debug_sprintf_*
debug_sprintf_event/exception are called even for debug events
with a disabling debug level. All other functions already do
the check in a wrapper function. Lets do the same here.
Due to the var_args the compiler rejects to make this function
inline. So let's wrap this via a macro.
This patch saves around 80 ns on my z196 for a KVM round trip (we
have two debug statements for entry and exit) when KVM is build as
a module.
The savings for built-in drivers is smaller as we then avoid the
PLT overhead for a function call.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-12-08 09:42:29 +01:00
Jens Freimann 383d0b0501 KVM: s390: handle pending local interrupts via bitmap
This patch adapts handling of local interrupts to be more compliant with
the z/Architecture Principles of Operation and introduces a data
structure
which allows more efficient handling of interrupts.

* get rid of li->active flag, use bitmap instead
* Keep interrupts in a bitmap instead of a list
* Deliver interrupts in the order of their priority as defined in the
  PoP
* Use a second bitmap for sigp emergency requests, as a CPU can have
  one request pending from every other CPU in the system.

Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-11-28 13:59:04 +01:00
Jens Freimann c0e6159d51 KVM: s390: add bitmap for handling cpu-local interrupts
Adds a bitmap to the vcpu structure which is used to keep track
of local pending interrupts. Also add enum with all interrupt
types sorted in order of priority (highest to lowest)

Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Reviewed-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-11-28 13:59:04 +01:00
Jason J. Herne 9fcf93b5de KVM: S390: Create helper function get_guest_storage_key
Define get_guest_storage_key which can be used to get the value of a guest
storage key. This compliments the functionality provided by the helper function
set_guest_storage_key. Both functions are needed for live migration of s390
guests that use storage keys.

Signed-off-by: Jason J. Herne <jjherne@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-11-28 13:58:48 +01:00
Thomas Huth 04b41acd06 KVM: s390: Fix rewinding of the PSW pointing to an EXECUTE instruction
A couple of our interception handlers rewind the PSW to the beginning
of the instruction to run the intercepted instruction again during the
next SIE entry. This normally works fine, but there is also the
possibility that the instruction did not get run directly but via an
EXECUTE instruction.
In this case, the PSW does not point to the instruction that caused the
interception, but to the EXECUTE instruction! So we've got to rewind the
PSW to the beginning of the EXECUTE instruction instead.
This is now accomplished with a new helper function kvm_s390_rewind_psw().

Signed-off-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-11-28 12:32:56 +01:00
Heiko Carstens 57f2ffe14f s390: remove diag 44 calls from cpu_relax()
Simplify cpu_relax() to a simple barrier(). Performance wise this doesn't
seem to make any big difference anymore, since nearly all lock variants
have directed yield semantics in the meantime.
Also this makes s390 behave like all other architectures.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-11-28 09:47:49 +01:00
Heiko Carstens 7eed2e09ab s390/ftrace: provide working ftrace_return_address()
The common code ftrace_return_address(n), which is just a wrapper for
__builtin_return_address(n), will only work for n > 0 if CONFIG_FRAME_POINTER
is set to 'y'. Otherwise it will return 0.
Since on s390 we will never have that config option set to 'y'
ftrace_return_address() won't work at all for n > 0.

Luckily we always compile the kernel with -mkernel-backchain which
in turn means that __builtin_return_address(n) will always work.

So let ftrace_return_address(n) map to __builtin_return_address(n).

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-11-28 09:47:15 +01:00
Dave Hansen 62e88b1c00 mm: Make arch_unmap()/bprm_mm_init() available to all architectures
The x86 MPX patch set calls arch_unmap() and arch_bprm_mm_init()
from fs/exec.c, so we need at least a stub for them in all
architectures.  They are only called under an #ifdef for
CONFIG_MMU=y, so we can at least restict this to architectures
with MMU support.

blackfin/c6x have no MMU support, so do not call arch_unmap().
They also do not include mm_hooks.h or mmu_context.h at all and
do not need to be touched.

s390, um and unicore32 do not use asm-generic/mm_hooks.h, so got
their own arch_unmap() versions.  (I also moved um's
arch_dup_mmap() to be closer to the other mm_hooks.h functions).

xtensa only includes mm_hooks when MMU=y, which should be fine
since arch_unmap() is called only from MMU=y code.

For the rest, we use the stub copies of these functions in
asm-generic/mm_hook.h.

I cross compiled defconfigs for cris (to check NOMMU) and s390
to make sure that this works.  I also checked a 64-bit build
of UML and all my normal x86 builds including PARAVIRT on and
off.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: linux-arch@vger.kernel.org
Cc: x86@kernel.org
Link: http://lkml.kernel.org/r/20141118182350.8B4AA2C2@viggo.jf.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-11-19 11:54:13 +01:00
Sebastian Ott afaa7d29bc s390/irq: use irq 0
Irq 0 is currently unused on s390. Since there is no reason to
do this start counting at the beginning and gain an additional
irq. Also correctly report the smallest usable irq number for
dynamic allocation.

Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-11-18 18:23:03 +01:00
Frank Blaschka 99e97b7106 s390/io: add ioport_map stubs
add ioport_map stubs to make vfio build on s390.

Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-11-18 18:23:01 +01:00
Arnd Bergmann 1c8d29696f Merge branch 'io' of git://git.kernel.org/pub/scm/linux/kernel/git/will/linux into asm-generic
* 'io' of git://git.kernel.org/pub/scm/linux/kernel/git/will/linux:
  documentation: memory-barriers: clarify relaxed io accessor semantics
  x86: io: implement dummy relaxed accessor macros for writes
  tile: io: implement dummy relaxed accessor macros for writes
  sparc: io: implement dummy relaxed accessor macros for writes
  powerpc: io: implement dummy relaxed accessor macros for writes
  parisc: io: implement dummy relaxed accessor macros for writes
  mn10300: io: implement dummy relaxed accessor macros for writes
  m68k: io: implement dummy relaxed accessor macros for writes
  m32r: io: implement dummy relaxed accessor macros for writes
  ia64: io: implement dummy relaxed accessor macros for writes
  cris: io: implement dummy relaxed accessor macros for writes
  frv: io: implement dummy relaxed accessor macros for writes
  xtensa: io: remove dummy relaxed accessor macros for reads
  s390: io: remove dummy relaxed accessor macros for reads
  microblaze: io: remove dummy relaxed accessor macros
  asm-generic: io: implement relaxed accessor macros as conditional wrappers

Conflicts:
	include/asm-generic/io.h

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2014-11-11 19:55:45 +01:00
Thierry Reding 4707a341b4 /dev/mem: Use more consistent data types
The xlate_dev_{kmem,mem}_ptr() functions take either a physical address
or a kernel virtual address, so data types should be phys_addr_t and
void *. They both return a kernel virtual address which is only ever
used in calls to copy_{from,to}_user(), so make variables that store it
void * rather than char * for consistency.

Also only define a weak unxlate_dev_mem_ptr() function if architectures
haven't overridden them in the asm/io.h header file.

Signed-off-by: Thierry Reding <treding@nvidia.com>
2014-11-10 15:59:21 +01:00
Martin Schwidefsky 5b9f2081e0 s390/pci: add sparse annotations
Fix the following warnings from the sparse code checker:

arch/s390/include/asm/pci_io.h:165:49: warning: cast removes address space of expression
arch/s390/pci/pci.c:476:44: warning: cast removes address space of expression
arch/s390/pci/pci.c:491:36: warning: incorrect type in argument 2 (different address spaces)
arch/s390/pci/pci.c:491:36:    expected void [noderef] <asn:2>*addr
arch/s390/pci/pci.c:491:36:    got void *<noident>

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-11-03 13:30:23 +01:00
Sebastian Ott b19148f6e2 s390/pci: improve irq number check for msix
s390s arch_setup_msi_irqs function ensures that we don't return with
more irqs than the PCI architecture allows and that a single PCI
function doesn't consume more irqs than the kernel is configured for.

At least the last check doesn't help much and should take the sum of
all irqs into account. Since that's already done by irq_alloc_desc
we can remove this check.

As for the first check we should use the value provided by the
firmware which can be less than what the PCI architecture allows.

Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-11-03 13:30:12 +01:00
Martin Schwidefsky f318a1229b s390/cmpxchg: use compiler builtins
The kernel build for s390 fails for gcc compilers with version 3.x,
set the minimum required version of gcc to version 4.3.

As the atomic builtins are available with all gcc 4.x compilers,
use the __sync_val_compare_and_swap and __sync_bool_compare_and_swap
functions to replace the complex macro and inline assembler magic
in include/asm/cmpxchg.h. The compiler can just-do-it and generates
better code with the builtins.

While we are at it use __sync_bool_compare_and_swap for the
_raw_compare_and_swap function in the spinlock code as well.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-11-03 13:29:47 +01:00
David Hildenbrand 42cb0c9ff9 KVM: s390: sigp: instruction counters for all sigp orders
This patch introduces instruction counters for all known sigp orders and also a
separate one for unknown orders that are passed to user space.

Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-10-28 13:09:13 +01:00
David Hildenbrand b898383082 KVM: s390: sigp: separate preparation handlers
This patch introduces in preparation for further code changes separate handler
functions for:
- SIGP (RE)START - will not be allowed to terminate pending orders
- SIGP (INITIAL) CPU RESET - will be allowed to terminate certain pending orders
- unknown sigp orders

All sigp orders that require user space intervention are logged.

Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-10-28 13:09:13 +01:00
Thomas Huth a6b7e459ff KVM: s390: Make the simple ipte mutex specific to a VM instead of global
The ipte-locking should be done for each VM seperately, not globally.
This way we avoid possible congestions when the simple ipte-lock is used
and multiple VMs are running.

Suggested-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-10-28 13:08:59 +01:00
Martin Schwidefsky fcbe08d66f s390/mm: pmdp_get_and_clear_full optimization
Analog to ptep_get_and_clear_full define a variant of the
pmpd_get_and_clear primitive which gets the full hint from the
mmu_gather struct. This allows s390 to avoid a costly instruction
when destroying an address space.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-10-27 13:27:30 +01:00
Heiko Carstens c933146a5e s390/ftrace,kprobes: allow to patch first instruction
If the function tracer is enabled, allow to set kprobes on the first
instruction of a function (which is the function trace caller):

If no kprobe is set handling of enabling and disabling function tracing
of a function simply patches the first instruction. Either it is a nop
(right now it's an unconditional branch, which skips the mcount block),
or it's a branch to the ftrace_caller() function.

If a kprobe is being placed on a function tracer calling instruction
we encode if we actually have a nop or branch in the remaining bytes
after the breakpoint instruction (illegal opcode).
This is possible, since the size of the instruction used for the nop
and branch is six bytes, while the size of the breakpoint is only
two bytes.
Therefore the first two bytes contain the illegal opcode and the last
four bytes contain either "0" for nop or "1" for branch. The kprobes
code will then execute/simulate the correct instruction.

Instruction patching for kprobes and function tracer is always done
with stop_machine(). Therefore we don't have any races where an
instruction is patched concurrently on a different cpu.
Besides that also the program check handler which executes the function
trace caller instruction won't be executed concurrently to any
stop_machine() execution.

This allows to keep full fault based kprobes handling which generates
correct pt_regs contents automatically.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-10-27 13:27:27 +01:00
Dominik Dingel 3ac8e38015 s390/mm: disable KSM for storage key enabled pages
When storage keys are enabled unmerge already merged pages and prevent
new pages from being merged.

Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-10-27 13:27:26 +01:00
Dominik Dingel 2faee8ff9d s390/mm: prevent and break zero page mappings in case of storage keys
As soon as storage keys are enabled we need to stop working on zero page
mappings to prevent inconsistencies between storage keys and pgste.

Otherwise following data corruption could happen:
1) guest enables storage key
2) guest sets storage key for not mapped page X
   -> change goes to PGSTE
3) guest reads from page X
   -> as X was not dirty before, the page will be zero page backed,
      storage key from PGSTE for X will go to storage key for zero page
4) guest sets storage key for not mapped page Y (same logic as above
5) guest reads from page Y
   -> as Y was not dirty before, the page will be zero page backed,
      storage key from PGSTE for Y will got to storage key for zero page
      overwriting storage key for X

While holding the mmap sem, we are safe against changes on entries we
already fixed, as every fault would need to take the mmap_sem (read).

Other vCPUs executing storage key instructions will get a one time interception
and be serialized also with mmap_sem.

Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-10-27 13:27:25 +01:00
Dominik Dingel a13cff318c s390/mm: recfactor global pgste updates
Replace the s390 specific page table walker for the pgste updates
with a call to the common code walk_page_range function.
There are now two pte modification functions, one for the reset
of the CMMA state and another one for the initialization of the
storage keys.

Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-10-27 13:27:23 +01:00
Will Deacon 916136b374 s390: io: remove dummy relaxed accessor macros for reads
These are now defined by asm-generic/io.h, so we don't need the private
definitions anymore.

Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2014-10-20 18:49:17 +01:00
Linus Torvalds 0429fbc0bd Merge branch 'for-3.18-consistent-ops' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu
Pull percpu consistent-ops changes from Tejun Heo:
 "Way back, before the current percpu allocator was implemented, static
  and dynamic percpu memory areas were allocated and handled separately
  and had their own accessors.  The distinction has been gone for many
  years now; however, the now duplicate two sets of accessors remained
  with the pointer based ones - this_cpu_*() - evolving various other
  operations over time.  During the process, we also accumulated other
  inconsistent operations.

  This pull request contains Christoph's patches to clean up the
  duplicate accessor situation.  __get_cpu_var() uses are replaced with
  with this_cpu_ptr() and __this_cpu_ptr() with raw_cpu_ptr().

  Unfortunately, the former sometimes is tricky thanks to C being a bit
  messy with the distinction between lvalues and pointers, which led to
  a rather ugly solution for cpumask_var_t involving the introduction of
  this_cpu_cpumask_var_ptr().

  This converts most of the uses but not all.  Christoph will follow up
  with the remaining conversions in this merge window and hopefully
  remove the obsolete accessors"

* 'for-3.18-consistent-ops' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (38 commits)
  irqchip: Properly fetch the per cpu offset
  percpu: Resolve ambiguities in __get_cpu_var/cpumask_var_t -fix
  ia64: sn_nodepda cannot be assigned to after this_cpu conversion. Use __this_cpu_write.
  percpu: Resolve ambiguities in __get_cpu_var/cpumask_var_t
  Revert "powerpc: Replace __get_cpu_var uses"
  percpu: Remove __this_cpu_ptr
  clocksource: Replace __this_cpu_ptr with raw_cpu_ptr
  sparc: Replace __get_cpu_var uses
  avr32: Replace __get_cpu_var with __this_cpu_write
  blackfin: Replace __get_cpu_var uses
  tile: Use this_cpu_ptr() for hardware counters
  tile: Replace __get_cpu_var uses
  powerpc: Replace __get_cpu_var uses
  alpha: Replace __get_cpu_var
  ia64: Replace __get_cpu_var uses
  s390: cio driver &__get_cpu_var replacements
  s390: Replace __get_cpu_var uses
  mips: Replace __get_cpu_var uses
  MIPS: Replace __get_cpu_var uses in FPU emulator.
  arm: Replace __this_cpu_ptr with raw_cpu_ptr
  ...
2014-10-15 07:48:18 +02:00
Linus Torvalds 1ee07ef6b5 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 updates from Martin Schwidefsky:
 "This patch set contains the main portion of the changes for 3.18 in
  regard to the s390 architecture.  It is a bit bigger than usual,
  mainly because of a new driver and the vector extension patches.

  The interesting bits are:
   - Quite a bit of work on the tracing front.  Uprobes is enabled and
     the ftrace code is reworked to get some of the lost performance
     back if CONFIG_FTRACE is enabled.
   - To improve boot time with CONFIG_DEBIG_PAGEALLOC, support for the
     IPTE range facility is added.
   - The rwlock code is re-factored to improve writer fairness and to be
     able to use the interlocked-access instructions.
   - The kernel part for the support of the vector extension is added.
   - The device driver to access the CD/DVD on the HMC is added, this
     will hopefully come in handy to improve the installation process.
   - Add support for control-unit initiated reconfiguration.
   - The crypto device driver is enhanced to enable the additional AP
     domains and to allow the new crypto hardware to be used.
   - Bug fixes"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (39 commits)
  s390/ftrace: simplify enabling/disabling of ftrace_graph_caller
  s390/ftrace: remove 31 bit ftrace support
  s390/kdump: add support for vector extension
  s390/disassembler: add vector instructions
  s390: add support for vector extension
  s390/zcrypt: Toleration of new crypto hardware
  s390/idle: consolidate idle functions and definitions
  s390/nohz: use a per-cpu flag for arch_needs_cpu
  s390/vtime: do not reset idle data on CPU hotplug
  s390/dasd: add support for control unit initiated reconfiguration
  s390/dasd: fix infinite loop during format
  s390/mm: make use of ipte range facility
  s390/setup: correct 4-level kernel page table detection
  s390/topology: call set_sched_topology early
  s390/uprobes: architecture backend for uprobes
  s390/uprobes: common library for kprobes and uprobes
  s390/rwlock: use the interlocked-access facility 1 instructions
  s390/rwlock: improve writer fairness
  s390/rwlock: remove interrupt-enabling rwlock variant.
  s390/mm: remove change bit override support
  ...
2014-10-14 03:47:00 +02:00
Linus Torvalds faafcba3b5 Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:
 "The main changes in this cycle were:

   - Optimized support for Intel "Cluster-on-Die" (CoD) topologies (Dave
     Hansen)

   - Various sched/idle refinements for better idle handling (Nicolas
     Pitre, Daniel Lezcano, Chuansheng Liu, Vincent Guittot)

   - sched/numa updates and optimizations (Rik van Riel)

   - sysbench speedup (Vincent Guittot)

   - capacity calculation cleanups/refactoring (Vincent Guittot)

   - Various cleanups to thread group iteration (Oleg Nesterov)

   - Double-rq-lock removal optimization and various refactorings
     (Kirill Tkhai)

   - various sched/deadline fixes

  ... and lots of other changes"

* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (72 commits)
  sched/dl: Use dl_bw_of() under rcu_read_lock_sched()
  sched/fair: Delete resched_cpu() from idle_balance()
  sched, time: Fix build error with 64 bit cputime_t on 32 bit systems
  sched: Improve sysbench performance by fixing spurious active migration
  sched/x86: Fix up typo in topology detection
  x86, sched: Add new topology for multi-NUMA-node CPUs
  sched/rt: Use resched_curr() in task_tick_rt()
  sched: Use rq->rd in sched_setaffinity() under RCU read lock
  sched: cleanup: Rename 'out_unlock' to 'out_free_new_mask'
  sched: Use dl_bw_of() under RCU read lock
  sched/fair: Remove duplicate code from can_migrate_task()
  sched, mips, ia64: Remove __ARCH_WANT_UNLOCKED_CTXSW
  sched: print_rq(): Don't use tasklist_lock
  sched: normalize_rt_tasks(): Don't use _irqsave for tasklist_lock, use task_rq_lock()
  sched: Fix the task-group check in tg_has_rt_tasks()
  sched/fair: Leverage the idle state info when choosing the "idlest" cpu
  sched: Let the scheduler see CPU idle states
  sched/deadline: Fix inter- exclusive cpusets migrations
  sched/deadline: Clear dl_entity params when setscheduling to different class
  sched/numa: Kill the wrong/dead TASK_DEAD check in task_numa_fault()
  ...
2014-10-13 16:23:15 +02:00
Linus Torvalds 754c780953 Merge branch 'for-v3.18' of git://git.linaro.org/people/mszyprowski/linux-dma-mapping
Pull dma-mapping update from Marek Szyprowski:
 "Provide the dma write coherent api (available previously on ARM
  architecture) for all other architectures, which use dma_ops-based dma
  mapping implementation.

  This lets one to use the same code in the device drivers regardless of
  the selected architecture"

* 'for-v3.18' of git://git.linaro.org/people/mszyprowski/linux-dma-mapping:
  dma-mapping: Provide write-combine allocations
  s390: Implement dma_{alloc,free}_attrs()
2014-10-10 16:56:08 -04:00
Linus Torvalds afa3536be8 Merge branch 'timers-nohz-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fixes from Ingo Molnar:
 "Main changes:

  - Fix the deadlock reported by Dave Jones et al
  - Clean up and fix nohz_full interaction with arch abilities
  - nohz init code consolidation/cleanup"

* 'timers-nohz-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  nohz: nohz full depends on irq work self IPI support
  nohz: Consolidate nohz full init code
  arm64: Tell irq work about self IPI support
  arm: Tell irq work about self IPI support
  x86: Tell irq work about self IPI support
  irq_work: Force raised irq work to run on irq work interrupt
  irq_work: Introduce arch_irq_work_has_interrupt()
  nohz: Move nohz full init call to tick init
2014-10-09 06:30:57 -04:00
Heiko Carstens 53255c9a4d s390/ftrace: remove 31 bit ftrace support
31 bit and 64 bit diverge more and more and it is rather painful
to keep both parts running.
To make things simpler just remove the 31 bit support which nobody
uses anyway.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-10-09 09:14:18 +02:00
Michael Holzheu a62bc07392 s390/kdump: add support for vector extension
With this patch for kdump the s390 vector registers are stored into the
prepared save areas in the old kernel and into the REGSET_VX_LOW and
REGSET_VX_HIGH ELF notes for /proc/vmcore in the new kernel.

The NT_S390_VXRS_LOW note contains the lower halves of the first 16 vector
registers 0-15. The higher halves are stored in the floating point register
ELF note.  The NT_S390_VXRS_HIGH contains the full vector registers 16-31.

The kernel provides a save area for storing vector register in case of
machine checks. A pointer to this save are is stored in the CPU lowcore
at offset 0x11b0. This save area is also used to save the registers for
kdump. In case of a dumped crashed kdump those areas are used to extract
the registers of the production system.

The vector registers for remote CPUs are stored using the "store additional
status at address" SIGP. For the dump CPU the vector registers are stored
with the VSTM instruction.

With this patch also zfcpdump stores the vector registers.

Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-10-09 09:14:16 +02:00
Martin Schwidefsky 3585cb0280 s390/disassembler: add vector instructions
Add the instruction introduced with the vector extension to the in-kernel
disassembler.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-10-09 09:14:15 +02:00
Martin Schwidefsky 8070361799 s390: add support for vector extension
The vector extension introduces 32 128-bit vector registers and a set of
instruction to operate on the vector registers.

The kernel can control the use of vector registers for the problem state
program with a bit in control register 0. Once enabled for a process the
kernel needs to retain the content of the vector registers on context
switch. The signal frame is extended to include the vector registers.
Two new register sets NT_S390_VXRS_LOW and NT_S390_VXRS_HIGH are added
to the regset interface for the debugger and core dumps.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-10-09 09:14:13 +02:00
Martin Schwidefsky b5f87f15e2 s390/idle: consolidate idle functions and definitions
Move the C functions and definitions related to the idle state handling
to arch/s390/include/asm/idle.h and arch/s390/kernel/idle.c. The function
s390_get_idle_time is renamed to arch_cpu_idle_time and vtime_stop_cpu to
enabled_wait.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-10-09 09:14:03 +02:00
Martin Schwidefsky fe0f49768d s390/nohz: use a per-cpu flag for arch_needs_cpu
Move the nohz_delay bit from the s390_idle data structure to the
per-cpu flags. Clear the nohz delay flag in __cpu_disable and
remove the cpu hotplug notifier that used to do this.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-10-09 09:14:02 +02:00
Linus Torvalds e4e65676f2 Fixes and features for 3.18.
Apart from the usual cleanups, here is the summary of new features:
 
 - s390 moves closer towards host large page support
 
 - PowerPC has improved support for debugging (both inside the guest and
   via gdbstub) and support for e6500 processors
 
 - ARM/ARM64 support read-only memory (which is necessary to put firmware
   in emulated NOR flash)
 
 - x86 has the usual emulator fixes and nested virtualization improvements
   (including improved Windows support on Intel and Jailhouse hypervisor
   support on AMD), adaptive PLE which helps overcommitting of huge guests.
   Also included are some patches that make KVM more friendly to memory
   hot-unplug, and fixes for rare caching bugs.
 
 Two patches have trivial mm/ parts that were acked by Rik and Andrew.
 
 Note: I will soon switch to a subkey for signing purposes.  To verify
 future signed pull requests from me, please update my key with
 "gpg --recv-keys 9B4D86F2".  You should see 3 new subkeys---the
 one for signing will be a 2048-bit RSA key, 4E6B09D7.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABAgAGBQJUL5sPAAoJEBvWZb6bTYbyfkEP/3MNhSyn6HCjPjtjLNPAl9KL
 WpExZSUFL2+4CztpdGIsek1BeJYHmqv3+c5S+WvaWVA1aqh2R7FT1D1ErBLjgLQq
 lq23IOr+XxmC3dXQUEEk+TlD+283UzypzEG4l4UD3JYg79fE3UrXAz82SeyewJDY
 x7aPYhkZG3RHu+wAyMPasG6E3zS5LySdUtGWbiPwz5BejrhBJoJdeb2WIL/RwnUK
 7ppSLB5EoFj/uMkuyeAAdAbdfSrhHA6faDZxNdxS9k9wGutrhhfUoQ49ONrKG4dV
 sFo1tSPTVgRs8QFYUZ2fJUPBAmUVddsgqh2K9d0NftGTq7b8YszaCsfFrs2/Y4MU
 YxssWEhxsfszerCu12bbAJrv6JBZYQ7TwGvI9L7P0iFU6IVw/djmukU4AkM9/e91
 YS/cue/PN+9Pn2ccXzL9J7xRtZb8FsOuRsCXTCmbOwDkLmrKPDBN2t3RUbeF+Eam
 ABrpWnLKX13kZSo4LKU+/niarzmPMp7odQfHVdr8ea0fiYLp4iN8puA20WaSPIgd
 CLvm+RAvXe5Lm91L4mpFotJ2uFyK6QlIYJV4FsgeWv/0D0qppWQi0Utb/aCNHCgy
 z8MyUMD48y7EpoQrFYr/7cddXIu0/NegnM8I1coVjIPEk4NfeebGUlCJ/V3D8wMG
 BgEfS2x6jRc5zB3hjwDr
 =iEVi
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM updates from Paolo Bonzini:
 "Fixes and features for 3.18.

  Apart from the usual cleanups, here is the summary of new features:

   - s390 moves closer towards host large page support

   - PowerPC has improved support for debugging (both inside the guest
     and via gdbstub) and support for e6500 processors

   - ARM/ARM64 support read-only memory (which is necessary to put
     firmware in emulated NOR flash)

   - x86 has the usual emulator fixes and nested virtualization
     improvements (including improved Windows support on Intel and
     Jailhouse hypervisor support on AMD), adaptive PLE which helps
     overcommitting of huge guests.  Also included are some patches that
     make KVM more friendly to memory hot-unplug, and fixes for rare
     caching bugs.

  Two patches have trivial mm/ parts that were acked by Rik and Andrew.

  Note: I will soon switch to a subkey for signing purposes"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (157 commits)
  kvm: do not handle APIC access page if in-kernel irqchip is not in use
  KVM: s390: count vcpu wakeups in stat.halt_wakeup
  KVM: s390/facilities: allow TOD-CLOCK steering facility bit
  KVM: PPC: BOOK3S: HV: CMA: Reserve cma region only in hypervisor mode
  arm/arm64: KVM: Report correct FSC for unsupported fault types
  arm/arm64: KVM: Fix VTTBR_BADDR_MASK and pgd alloc
  kvm: Fix kvm_get_page_retry_io __gup retval check
  arm/arm64: KVM: Fix set_clear_sgi_pend_reg offset
  kvm: x86: Unpin and remove kvm_arch->apic_access_page
  kvm: vmx: Implement set_apic_access_page_addr
  kvm: x86: Add request bit to reload APIC access page address
  kvm: Add arch specific mmu notifier for page invalidation
  kvm: Rename make_all_cpus_request() to kvm_make_all_cpus_request() and make it non-static
  kvm: Fix page ageing bugs
  kvm/x86/mmu: Pass gfn and level to rmapp callback.
  x86: kvm: use alternatives for VMCALL vs. VMMCALL if kernel text is read-only
  kvm: x86: use macros to compute bank MSRs
  KVM: x86: Remove debug assertion of non-PAE reserved bits
  kvm: don't take vcpu mutex for obviously invalid vcpu ioctls
  kvm: Faults which trigger IO release the mmap_sem
  ...
2014-10-08 05:27:39 -04:00
Rik van Riel 347abad981 sched, time: Fix build error with 64 bit cputime_t on 32 bit systems
On 32 bit systems cmpxchg cannot handle 64 bit values, so
some additional magic is required to allow a 32 bit system
with CONFIG_VIRT_CPU_ACCOUNTING_GEN=y enabled to build.

Make sure the correct cmpxchg function is used when doing
an atomic swap of a cputime_t.

Reported-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Rik van Riel <riel@redhat.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: umgwanakikbuti@gmail.com
Cc: fweisbec@gmail.com
Cc: srao@redhat.com
Cc: lwoodman@redhat.com
Cc: atheurer@redhat.com
Cc: oleg@redhat.com
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: linux390@de.ibm.com
Cc: linux-arch@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-s390@vger.kernel.org
Link: http://lkml.kernel.org/r/20140930155947.070cdb1f@annuminas.surriel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-10-03 05:46:55 +02:00
David Hildenbrand ce2e4f0b75 KVM: s390: count vcpu wakeups in stat.halt_wakeup
This patch introduces the halt_wakeup counter used by common code and uses it to
count vcpu wakeups done in s390 arch specific code.

Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-10-01 14:42:14 +02:00
Heiko Carstens cfb0b24143 s390/mm: make use of ipte range facility
Invalidate several pte entries at once if the ipte range facility
is available. Currently this works only for DEBUG_PAGE_ALLOC where
several up to 2 ^ MAX_ORDER may be invalidated at once.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-09-30 10:19:29 +02:00
Jan Willeke 2a0a5b2299 s390/uprobes: architecture backend for uprobes
Signed-off-by: Jan Willeke <willeke@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-09-25 10:52:17 +02:00
Jan Willeke 975fab1739 s390/uprobes: common library for kprobes and uprobes
This patch moves common functions from kprobes.c to probes.c.
Thus its possible for uprobes to use them without enabling kprobes.

Signed-off-by: Jan Willeke <willeke@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-09-25 10:52:14 +02:00
Martin Schwidefsky bbae71bf9c s390/rwlock: use the interlocked-access facility 1 instructions
Make use of the load-and-add, load-and-or and load-and-and instructions
to atomically update the read-write lock without a compare-and-swap loop.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-09-25 10:52:13 +02:00
Martin Schwidefsky 2684e73a86 s390/rwlock: remove interrupt-enabling rwlock variant.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-09-25 10:52:10 +02:00
Heiko Carstens 6a5c1482e2 s390/mm: remove change bit override support
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-09-25 10:52:09 +02:00
Martin Schwidefsky d59b93da5e s390/rwlock: use directed yield for write-locked rwlocks
Add an owner field to the arch_rwlock_t to be able to pass the timeslice
of a virtual CPU with diagnose 0x9c to the lock owner in case the rwlock
is write-locked. The undirected yield in case the rwlock is acquired
writable but the lock is read-locked is removed.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-09-25 10:52:05 +02:00
Ralf Hoppe 8f933b1043 s390/hmcdrv: HMC drive CD/DVD access
This device driver allows accessing a HMC drive CD/DVD-ROM.
It can be used in a LPAR and z/VM environment.

Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Ralf Hoppe <rhoppe@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-09-25 10:52:02 +02:00
Peter Zijlstra c5c38ef3d7 irq_work: Introduce arch_irq_work_has_interrupt()
The nohz full code needs irq work to trigger its own interrupt so that
the subsystem can work even when the tick is stopped.

Lets introduce arch_irq_work_has_interrupt() that archs can override to
tell about their support for this ability.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2014-09-13 18:38:07 +02:00
Tony Krowiak 5102ee8795 KVM: CPACF: Enable MSA4 instructions for kvm guest
We have to provide a per guest crypto block for the CPUs to
enable MSA4 instructions. According to icainfo on z196 or
later this enables CCM-AES-128, CMAC-AES-128, CMAC-AES-192
and CMAC-AES-256.

Signed-off-by: Tony Krowiak <akrowiak@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: Michael Mueller <mimu@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
[split MSA4/protected key into two patches]
2014-09-10 12:19:05 +02:00
Heiko Carstens 4423028203 s390/spinlock: optimize spin_unlock code
Use a memory barrier + store sequence instead of a load + compare and swap
sequence to unlock a spinlock and an rw lock.
For the spinlock case this saves us two memory reads and a not needed cpu
serialization after the compare and swap instruction stored the new value.

The kernel size (performance_defconfig) gets reduced by ~14k.

Average execution time of a tight inlined spin_unlock loop drops from
5.8ns to 0.7ns on a zEC12 machine.

An artificial stress test case where several counters are protected with
a single spinlock and which are only incremented while holding the spinlock
shows ~30% improvement on a 4 cpu machine.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-09-09 08:53:30 +02:00
Heiko Carstens 3d1e220d08 s390/ftrace: optimize mcount code
Reduce the number of executed instructions within the mcount block if
function tracing is enabled. We achieve that by using a non-standard
C function call ABI. Since the called function is also written in
assembler this is not a problem.
This also allows to replace the unconditional store at the beginning
of the mcount block with a larl instruction, which doesn't touch
memory.

In theory we could also patch the first instruction of the mcount block
to enable and disable function tracing. However this would break kprobes.
This could be fixed with implementing the "kprobes_on_ftrace" feature;
however keeping the odd jprobes working seems not to be possible without
a lot of code churn. Therefore keep the code easy and simply accept one
wasted 1-cycle "larl" instruction per function prologue.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-09-09 08:53:30 +02:00
Heiko Carstens 10dec7dbd5 s390/ftrace: add HAVE_DYNAMIC_FTRACE_WITH_REGS support
This code is based on a patch from Vojtech Pavlik.
http://marc.info/?l=linux-s390&m=140438885114413&w=2

The actual implementation now differs significantly:
Instead of adding a second function "ftrace_regs_caller" which would be nearly
identical to the existing ftrace_caller function, the current ftrace_caller
function is now an alias to ftrace_regs_caller and always passes the needed
pt_regs structure and function_trace_op parameters unconditionally.

Besides that also use asm offsets to correctly allocate and access the new
struct pt_regs on the stack.

While at it we can make use of new instruction to get rid of some indirect
loads if compiled for new machines.

The passed struct pt_regs can be changed by the called function and it's new
contents will replace the current contents.

Note: to change the return address the embedded psw member of the pt_regs
structure must be changed. The psw member is right now incomplete, since
the mask part is missing. For all current use cases this should be sufficent.
Providing and restoring a sane mask would mean we need to add an epsw/lpswe
pair to the mcount code. Only these two instruction would cost us ~120 cycles
which currently seems not necessary.

Cc: Vojtech Pavlik <vojtech@suse.cz>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-09-09 08:53:28 +02:00
Heiko Carstens 2481a87b02 s390/ftrace: optimize function graph caller code
When the function graph tracer is disabled we can skip three additional
instructions. So let's just do this.

So if function tracing is enabled but function graph tracing is
runtime disabled, we get away with a single unconditional branch.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-09-09 08:53:28 +02:00
Martin Schwidefsky b7eacb59cd s390/vdso: add vdso support for coarse clocks
Add CLOCK_REALTIME_COARSE and CLOCK_MONOTONIC_COARSE optimization to
the 64-bit and 31-bit vdso.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-09-09 08:53:27 +02:00
Heiko Carstens b7d5006de1 s390: remove unused MACHINE_FLAG_RRBM
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-09-09 08:53:26 +02:00
Linus Torvalds 35af25616c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 fixes from Martin Schwidefsky:
 "A bug fix for the vdso code, the loadparm for booting from SCSI is
  added and the access permissions for the dasd module parameters are
  corrected"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390/vdso: remove NULL pointer check from clock_gettime
  s390/ipl: Add missing SCSI loadparm attributes to /sys/firmware
  s390/dasd: Make module parameter visible in sysfs
2014-09-08 08:27:00 -07:00
Christian Borntraeger 1951497d90 KVM: s390/mm: Fix guest storage key corruption in ptep_set_access_flags
commit 0944fe3f4a ("s390/mm: implement software referenced bits")
triggered another paging/storage key corruption. There is an
unhandled invalid->valid pte change where we have to set the real
storage key from the pgste.
When doing paging a guest page might be swapcache or swap and when
faulted in it might be read-only and due to a parallel scan old.
An do_wp_page will make it writeable and young. Due to software
reference tracking this page was invalid and now becomes valid.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: stable@vger.kernel.org # v3.12+
2014-09-02 10:30:43 +02:00
Christian Borntraeger 3e03d4c46d KVM: s390/mm: Fix storage key corruption during swapping
Since 3.12 or more precisely  commit 0944fe3f4a ("s390/mm:
implement software referenced bits") guest storage keys get
corrupted during paging. This commit added another valid->invalid
translation for page tables - namely ptep_test_and_clear_young.
We have to transfer the storage key into the pgste in that case.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: stable@vger.kernel.org # v3.12+
2014-09-02 10:30:43 +02:00
Michael Holzheu 6992860167 s390/ipl: Add missing SCSI loadparm attributes to /sys/firmware
Currently the loadparm is only supported for CCW IPL. But also for SCSI
IPL it can be specified either on the HMC load panel respectively
z/VM console or via diagnose 308.

So fix this for SCSI and add the required sysfs attributes for reading the
IPL loadparm and for setting the loadparm for re-IPL.

With this patch the following two sysfs attributes are introduced:

 - /sys/firmware/ipl/loadparm (for system that have been IPLed from SCSI)
 - /sys/firmware/reipl/fcp/loadparm

Because the loadparm is now available for SCSI and CCW it is moved
now from "struct ipl_block_ccw" to the generic "struct ipl_list_hdr".

Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-09-01 09:56:29 +02:00
Radim Krčmář 13a34e067e KVM: remove garbage arg to *hardware_{en,dis}able
In the beggining was on_each_cpu(), which required an unused argument to
kvm_arch_ops.hardware_{en,dis}able, but this was soon forgotten.

Remove unnecessary arguments that stem from this.

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-08-29 16:35:55 +02:00
Radim Krčmář 0865e636ae KVM: static inline empty kvm_arch functions
Using static inline is going to save few bytes and cycles.
For example on powerpc, the difference is 700 B after stripping.
(5 kB before)

This patch also deals with two overlooked empty functions:
kvm_arch_flush_shadow was not removed from arch/mips/kvm/mips.c
  2df72e9bc KVM: split kvm_arch_flush_shadow
and kvm_arch_sched_in never made it into arch/ia64/kvm/kvm-ia64.c.
  e790d9ef6 KVM: add kvm_arch_sched_in

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-08-29 16:35:55 +02:00
Paolo Bonzini 656473003b KVM: forward declare structs in kvm_types.h
Opaque KVM structs are useful for prototypes in asm/kvm_host.h, to avoid
"'struct foo' declared inside parameter list" warnings (and consequent
breakage due to conflicting types).

Move them from individual files to a generic place in linux/kvm_types.h.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-08-29 16:35:53 +02:00
Christoph Lameter eb7e7d7663 s390: Replace __get_cpu_var uses
__get_cpu_var() is used for multiple purposes in the kernel source. One of
them is address calculation via the form &__get_cpu_var(x).  This calculates
the address for the instance of the percpu variable of the current processor
based on an offset.

Other use cases are for storing and retrieving data from the current
processors percpu area.  __get_cpu_var() can be used as an lvalue when
writing data or on the right side of an assignment.

__get_cpu_var() is defined as :

#define __get_cpu_var(var) (*this_cpu_ptr(&(var)))

__get_cpu_var() always only does an address determination. However, store
and retrieve operations could use a segment prefix (or global register on
other platforms) to avoid the address calculation.

this_cpu_write() and this_cpu_read() can directly take an offset into a
percpu area and use optimized assembly code to read and write per cpu
variables.

This patch converts __get_cpu_var into either an explicit address
calculation using this_cpu_ptr() or into a use of this_cpu operations that
use the offset.  Thereby address calculations are avoided and less registers
are used when code is generated.

At the end of the patch set all uses of __get_cpu_var have been removed so
the macro is removed too.

The patch set includes passes over all arches as well. Once these operations
are used throughout then specialized macros can be defined in non -x86
arches as well in order to optimize per cpu access by f.e.  using a global
register that may be set to the per cpu base.

Transformations done to __get_cpu_var()

1. Determine the address of the percpu instance of the current processor.

	DEFINE_PER_CPU(int, y);
	int *x = &__get_cpu_var(y);

    Converts to

	int *x = this_cpu_ptr(&y);

2. Same as #1 but this time an array structure is involved.

	DEFINE_PER_CPU(int, y[20]);
	int *x = __get_cpu_var(y);

    Converts to

	int *x = this_cpu_ptr(y);

3. Retrieve the content of the current processors instance of a per cpu
variable.

	DEFINE_PER_CPU(int, y);
	int x = __get_cpu_var(y)

   Converts to

	int x = __this_cpu_read(y);

4. Retrieve the content of a percpu struct

	DEFINE_PER_CPU(struct mystruct, y);
	struct mystruct x = __get_cpu_var(y);

   Converts to

	memcpy(&x, this_cpu_ptr(&y), sizeof(x));

5. Assignment to a per cpu variable

	DEFINE_PER_CPU(int, y)
	__get_cpu_var(y) = x;

   Converts to

	this_cpu_write(y, x);

6. Increment/Decrement etc of a per cpu variable

	DEFINE_PER_CPU(int, y);
	__get_cpu_var(y)++

   Converts to

	this_cpu_inc(y)

Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
CC: linux390@de.ibm.com
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2014-08-26 13:45:52 -04:00
Martin Schwidefsky f079e95214 KVM: s390/mm: remove outdated gmap data structures
The radix tree rework removed all code that uses the gmap_rmap
and gmap_pgtable data structures. Remove these outdated definitions.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-08-26 10:09:03 +02:00
Martin Schwidefsky c6c956b80b KVM: s390/mm: support gmap page tables with less than 5 levels
Add an addressing limit to the gmap address spaces and only allocate
the page table levels that are needed for the given limit. The limit
is fixed and can not be changed after a gmap has been created.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-08-26 10:09:03 +02:00
Martin Schwidefsky 527e30b41d KVM: s390/mm: use radix trees for guest to host mappings
Store the target address for the gmap segments in a radix tree
instead of using invalid segment table entries. gmap_translate
becomes a simple radix_tree_lookup, gmap_fault is split into the
address translation with gmap_translate and the part that does
the linking of the gmap shadow page table with the process page
table.
A second radix tree is used to keep the pointers to the segment
table entries for segments that are mapped in the guest address
space. On unmap of a segment the pointer is retrieved from the
radix tree and is used to carry out the segment invalidation in
the gmap shadow page table. As the radix tree can only store one
pointer, each host segment may only be mapped to exactly one
guest location.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-08-26 10:09:02 +02:00
Thierry Reding 90114d65fe s390: Implement dma_{alloc,free}_attrs()
The S390 architecture advertises support for HAVE_DMA_ATTRS when PCI is
enabled. Patches to unify some of the DMA API would like to rely on the
dma_alloc_attrs() and dma_free_attrs() functions to be provided when an
architecture supports DMA attributes.

Rename dma_alloc_coherent() and dma_free_coherent() to dma_alloc_attrs()
and dma_free_attrs() since they are functionally equivalent and alias
the former to the latter for compatibility.

For consistency with other architectures, also reuse the existing symbol
HAVE_DMA_ATTRS defined in arch/Kconfig instead of providing a duplicate.
Select it when PCI is enabled.

While at it, drop a redundant 'default n' from the PCI Kconfig symbol.

Signed-off-by: Thierry Reding <treding@nvidia.com>
Acked-By: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
2014-08-26 07:39:12 +02:00
Martin Schwidefsky 6e0a0431bf KVM: s390/mm: cleanup gmap function arguments, variable names
Make the order of arguments for the gmap calls more consistent,
if the gmap pointer is passed it is always the first argument.
In addition distinguish between guest address and user address
by naming the variables gaddr for a guest address and vmaddr for
a user address.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-08-25 14:35:58 +02:00
Martin Schwidefsky 9da4e38076 KVM: s390/mm: readd address parameter to gmap_do_ipte_notify
Revert git commit c3a23b9874c1 ("remove unnecessary parameter from
gmap_do_ipte_notify").

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-08-25 14:35:57 +02:00
Martin Schwidefsky 55dbbdd9a8 KVM: s390/mm: readd address parameter to pgste_ipte_notify
Revert git commit 1b7fd6952063 ("remove unecessary parameter from
pgste_ipte_notify")

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-08-25 14:35:57 +02:00
Andy Lutomirski a6c19dfe39 arm64,ia64,ppc,s390,sh,tile,um,x86,mm: remove default gate area
The core mm code will provide a default gate area based on
FIXADDR_USER_START and FIXADDR_USER_END if
!defined(__HAVE_ARCH_GATE_AREA) && defined(AT_SYSINFO_EHDR).

This default is only useful for ia64.  arm64, ppc, s390, sh, tile, 64-bit
UML, and x86_32 have their own code just to disable it.  arm, 32-bit UML,
and x86_64 have gate areas, but they have their own implementations.

This gets rid of the default and moves the code into ia64.

This should save some code on architectures without a gate area: it's now
possible to inline the gate_area functions in the default case.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Acked-by: Nathan Lynch <nathan_lynch@mentor.com>
Acked-by: H. Peter Anvin <hpa@linux.intel.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> [in principle]
Acked-by: Richard Weinberger <richard@nod.at> [for um]
Acked-by: Will Deacon <will.deacon@arm.com> [for arm64]
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Nathan Lynch <Nathan_Lynch@mentor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-08-08 15:57:27 -07:00
Laura Abbott 308c09f17d lib/scatterlist: make ARCH_HAS_SG_CHAIN an actual Kconfig
Rather than have architectures #define ARCH_HAS_SG_CHAIN in an
architecture specific scatterlist.h, make it a proper Kconfig option and
use that instead.  At same time, remove the header files are are now
mostly useless and just include asm-generic/scatterlist.h.

[sfr@canb.auug.org.au: powerpc files now need asm/dma.h]
Signed-off-by: Laura Abbott <lauraa@codeaurora.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>			[x86]
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	[powerpc]
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: "James E.J. Bottomley" <JBottomley@parallels.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-08-08 15:57:26 -07:00
Linus Torvalds ebb067d2f4 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 updates from Martin Schwidefsky:
 "Mostly cleanups and bug-fixes, with two exceptions.

  The first is lazy flushing of I/O-TLBs for PCI to improve performance,
  the second is software dirty bits in the pmd for the madvise-free
  implementation"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (24 commits)
  s390/locking: Reenable optimistic spinning
  s390/mm: implement dirty bits for large segment table entries
  KVM: s390/mm: Fix page table locking vs. split pmd lock
  s390/dasd: fix camel case
  s390/3215: fix hanging console issue
  s390/irq: improve displayed interrupt order in /proc/interrupts
  s390/seccomp: fix error return for filtered system calls
  s390/pci: introduce lazy IOTLB flushing for DMA unmap
  dasd: fix error recovery for alias devices during format
  dasd: fix list_del corruption during format
  dasd: fix unresponsive device during format
  dasd: use aliases for formatted devices during format
  s390/pci: fix kmsg component
  s390/kdump: Return NOTIFY_OK for all actions other than MEM_GOING_OFFLINE
  s390/watchdog: Fix module name in Kconfig help text
  s390/dasd: replace seq_printf by seq_puts
  s390/dasd: replace pr_warning by pr_warn
  s390/dasd: Move EXPORT_SYMBOL after function/variable
  s390/dasd: remove unnecessary null test before debugfs_remove
  s390/zfcp: use qdio buffer helpers
  ...
2014-08-07 08:41:00 -07:00
Linus Torvalds 8efb90cf1e Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking updates from Ingo Molnar:
 "The main changes in this cycle are:

   - big rtmutex and futex cleanup and robustification from Thomas
     Gleixner
   - mutex optimizations and refinements from Jason Low
   - arch_mutex_cpu_relax() removal and related cleanups
   - smaller lockdep tweaks"

* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits)
  arch, locking: Ciao arch_mutex_cpu_relax()
  locking/lockdep: Only ask for /proc/lock_stat output when available
  locking/mutexes: Optimize mutex trylock slowpath
  locking/mutexes: Try to acquire mutex only if it is unlocked
  locking/mutexes: Delete the MUTEX_SHOW_NO_WAITER macro
  locking/mutexes: Correct documentation on mutex optimistic spinning
  rtmutex: Make the rtmutex tester depend on BROKEN
  futex: Simplify futex_lock_pi_atomic() and make it more robust
  futex: Split out the first waiter attachment from lookup_pi_state()
  futex: Split out the waiter check from lookup_pi_state()
  futex: Use futex_top_waiter() in lookup_pi_state()
  futex: Make unlock_pi more robust
  rtmutex: Avoid pointless requeueing in the deadlock detection chain walk
  rtmutex: Cleanup deadlock detector debug logic
  rtmutex: Confine deadlock logic to futex
  rtmutex: Simplify remove_waiter()
  rtmutex: Document pi chain walk
  rtmutex: Clarify the boost/deboost part
  rtmutex: No need to keep task ref for lock owner check
  rtmutex: Simplify and document try_to_take_rtmutex()
  ...
2014-08-04 16:09:06 -07:00
Linus Torvalds 8533ce7271 These are the x86, MIPS and s390 changes; PPC and ARM will come in a
few days.
 
 MIPS and s390 have little going on this release; just bugfixes, some
 small, some larger.
 
 The highlights for x86 are nested VMX improvements (Jan Kiszka), optimizations
 for old processor (up to Nehalem, by me and Bandan Das), and a lot of x86
 emulator bugfixes (Nadav Amit).
 
 Stephen Rothwell reported a trivial conflict with the tracing branch.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABAgAGBQJT300XAAoJEBvWZb6bTYby3V8QAJz+XyajnhJ8wH55Vxczz22L
 i2gtUGmBLhEXsBcaVKO4BBfek88lLzg0SGLjfW5wCMQmKtxVlrwTCXNkBoPGjapd
 NwHtWkMKym44PDhRovn7zkSumkxC43uFIBR/ebrhP6Bvhh9s+MnkQUxfw9ILB+YV
 EeKyEG8sSgxFCciuHbp3mIXpDcO6r/ldy6I7009OdyhLoMY+Kvmk7kRe9wtAivdg
 CGJi60QvGOn2RGRPOCEtF6UWr8Ae8fe1t84o0hkXPv/j3jtabzAatXKJa4dYNbIs
 7Mp4NQpxaGV6rq3WCYVeZRxGs+UReGDAS3Il4Z8C9eTOTooSfxdVr8acpM8PY6I8
 UmLT6ECLGycc4ELXrETtR+QLmiXACyJqyVxz4aiLV3kWSWfamKD3hBeQK9NizNcE
 VoPDl+PyISvR1tW4KstBuzfUWAEXi+gO78cqqFr/VW6cl7HKpA1DFQaPfGkYKDae
 2CPwcLwI5/M6RtSgkyXTkEqNZLc2BjldqSeM1lmWjhZVW56X2iqePUL46Vab3Yvt
 U+sELtwEE560NLN3hbaHUsLR1tcUix5w8vTzcXPxgoHQBszHCcAZTWd1XHulr64F
 rp/cangqtkPKcu5j1mNhQs38oLjHI1MUsbQrqFoD4tmHjQ75iXHRFzYGoIVKXyHG
 AnGbQzJzBcdAANhm3LW0
 =UXxV
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM changes from Paolo Bonzini:
 "These are the x86, MIPS and s390 changes; PPC and ARM will come in a
  few days.

  MIPS and s390 have little going on this release; just bugfixes, some
  small, some larger.

  The highlights for x86 are nested VMX improvements (Jan Kiszka),
  optimizations for old processor (up to Nehalem, by me and Bandan Das),
  and a lot of x86 emulator bugfixes (Nadav Amit).

  Stephen Rothwell reported a trivial conflict with the tracing branch"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (104 commits)
  x86/kvm: Resolve shadow warnings in macro expansion
  KVM: s390: rework broken SIGP STOP interrupt handling
  KVM: x86: always exit on EOIs for interrupts listed in the IOAPIC redir table
  KVM: vmx: remove duplicate vmx_mpx_supported() prototype
  KVM: s390: Fix memory leak on busy SIGP stop
  x86/kvm: Resolve shadow warning from min macro
  kvm: Resolve missing-field-initializers warnings
  Replace NR_VMX_MSR with its definition
  KVM: x86: Assertions to check no overrun in MSR lists
  KVM: x86: set rflags.rf during fault injection
  KVM: x86: Setting rflags.rf during rep-string emulation
  KVM: x86: DR6/7.RTM cannot be written
  KVM: nVMX: clean up nested_release_vmcs12 and code around it
  KVM: nVMX: fix lifetime issues for vmcs02
  KVM: x86: Defining missing x86 vectors
  KVM: x86: emulator injects #DB when RFLAGS.RF is set
  KVM: x86: Cleanup of rflags.rf cleaning
  KVM: x86: Clear rflags.rf on emulated instructions
  KVM: x86: popf emulation should not change RF
  KVM: x86: Clearing rflags.rf upon skipped emulated instruction
  ...
2014-08-04 12:16:46 -07:00
Martin Schwidefsky 152125b7a8 s390/mm: implement dirty bits for large segment table entries
The large segment table entry format has block of bits for the
ACC/F values for the large page. These bits are valid only if
another bit (AV bit 0x10000) of the segment table entry is set.
The ACC/F bits do not have a meaning if the AV bit is off.
This allows to put the THP splitting bit, the segment young bit
and the new segment dirty bit into the ACC/F bits as long as
the AV bit stays off. The dirty and young information is only
available if the pmd is large.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-08-01 14:55:05 +02:00
Jan Willeke dc295880c6 s390/seccomp: fix error return for filtered system calls
The syscall_set_return_value function of s390 negates the error argument
before storing the value to the return register gpr2. This is incorrect,
the seccomp code already passes the negative error value.
Store the unmodified error value to gpr2.

Signed-off-by: Jan Willeke <willeke@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-07-28 10:02:31 +02:00
Sebastian Ott 5245c924c2 s390/qdio: add helpers to manage qdio buffers
Users of qdio buffers employ different strategies to manage these
buffers. The qeth driver uses huge contiguous buffers which leads
to high order allocations with all their downsides.

This patch provides helpers to allocate, free, and reset arrays of
qdio buffers using non contiguous pages.

Reviewed-by: Martin Peschke <mpeschke@linux.vnet.ibm.com>
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-07-22 09:26:13 +02:00
David Hildenbrand ea74c0ea1b KVM: s390: remove the tasklet used by the hrtimer
We can get rid of the tasklet used for waking up a VCPU in the hrtimer
code but wakeup the VCPU directly.

Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-07-21 13:22:42 +02:00
David Hildenbrand 0759d0681c KVM: s390: cleanup handle_wait by reusing kvm_vcpu_block
This patch cleans up the code in handle_wait by reusing the common code
function kvm_vcpu_block.

signal_pending(), kvm_cpu_has_pending_timer() and kvm_arch_vcpu_runnable() are
sufficient for checking if we need to wake-up that VCPU. kvm_vcpu_block
uses these functions, so no checks are lost.

The flag "timer_due" can be removed - kvm_cpu_has_pending_timer() tests whether
the timer is pending, thus the vcpu is correctly woken up.

Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-07-21 13:22:16 +02:00
Davidlohr Bueso 3a6bfbc91d arch, locking: Ciao arch_mutex_cpu_relax()
The arch_mutex_cpu_relax() function, introduced by 34b133f, is
hacky and ugly. It was added a few years ago to address the fact
that common cpu_relax() calls include yielding on s390, and thus
impact the optimistic spinning functionality of mutexes. Nowadays
we use this function well beyond mutexes: rwsem, qrwlock, mcs and
lockref. Since the macro that defines the call is in the mutex header,
any users must include mutex.h and the naming is misleading as well.

This patch (i) renames the call to cpu_relax_lowlatency  ("relax, but
only if you can do it with very low latency") and (ii) defines it in
each arch's asm/processor.h local header, just like for regular cpu_relax
functions. On all archs, except s390, cpu_relax_lowlatency is simply cpu_relax,
and thus we can take it out of mutex.h. While this can seem redundant,
I believe it is a good choice as it allows us to move out arch specific
logic from generic locking primitives and enables future(?) archs to
transparently define it, similarly to System Z.

Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Anton Blanchard <anton@samba.org>
Cc: Aurelien Jacquiot <a-jacquiot@ti.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Bharat Bhushan <r65777@freescale.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chen Liqin <liqin.linux@gmail.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: David Howells <dhowells@redhat.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Deepthi Dharwar <deepthi@linux.vnet.ibm.com>
Cc: Dominik Dingel <dingel@linux.vnet.ibm.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Hirokazu Takata <takata@linux-m32r.org>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: James E.J. Bottomley <jejb@parisc-linux.org>
Cc: James Hogan <james.hogan@imgtec.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Jesper Nilsson <jesper.nilsson@axis.com>
Cc: Joe Perches <joe@perches.com>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Joseph Myers <joseph@codesourcery.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Koichi Yasutake <yasutake.koichi@jp.panasonic.com>
Cc: Lennox Wu <lennox.wu@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mark Salter <msalter@redhat.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Neuling <mikey@neuling.org>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Mikael Starvik <starvik@axis.com>
Cc: Nicolas Pitre <nico@linaro.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Paul Burton <paul.burton@imgtec.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Qais Yousef <qais.yousef@imgtec.com>
Cc: Qiaowei Ren <qiaowei.ren@intel.com>
Cc: Rafael Wysocki <rafael.j.wysocki@intel.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Richard Kuo <rkuo@codeaurora.org>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Steven Miao <realmz6@gmail.com>
Cc: Steven Rostedt <srostedt@redhat.com>
Cc: Stratos Karafotis <stratosk@semaphore.gr>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vasily Kulikov <segoon@openwall.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Vineet Gupta <Vineet.Gupta1@synopsys.com>
Cc: Waiman Long <Waiman.Long@hp.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Wolfram Sang <wsa@the-dreams.de>
Cc: adi-buildroot-devel@lists.sourceforge.net
Cc: linux390@de.ibm.com
Cc: linux-alpha@vger.kernel.org
Cc: linux-am33-list@redhat.com
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-c6x-dev@linux-c6x.org
Cc: linux-cris-kernel@axis.com
Cc: linux-hexagon@vger.kernel.org
Cc: linux-ia64@vger.kernel.org
Cc: linux@lists.openrisc.net
Cc: linux-m32r-ja@ml.linux-m32r.org
Cc: linux-m32r@ml.linux-m32r.org
Cc: linux-m68k@lists.linux-m68k.org
Cc: linux-metag@vger.kernel.org
Cc: linux-mips@linux-mips.org
Cc: linux-parisc@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-s390@vger.kernel.org
Cc: linux-sh@vger.kernel.org
Cc: linux-xtensa@linux-xtensa.org
Cc: sparclinux@vger.kernel.org
Link: http://lkml.kernel.org/r/1404079773.2619.4.camel@buesod1.americas.hpqcorp.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-07-17 12:32:47 +02:00
Martin Schwidefsky 9f86745722 s390: fix restore of invalid floating-point-control
The fixup of the inline assembly to restore the floating-point-control
register needs to check for instruction address *after* the lfcp
instruction as the specification and data exceptions are suppresssing.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-07-16 10:48:12 +02:00
David Hildenbrand 6352e4d2dd KVM: s390: implement KVM_(S|G)ET_MP_STATE for user space state control
This patch
- adds s390 specific MP states to linux headers and documents them
- implements the KVM_{SET,GET}_MP_STATE ioctls
- enables KVM_CAP_MP_STATE
- allows user space to control the VCPU state on s390.

If user space sets the VCPU state using the ioctl KVM_SET_MP_STATE, we can disable
manual changing of the VCPU state and trust user space to do the right thing.

Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-07-10 14:11:17 +02:00
Martin Schwidefsky f8b1350560 s390/uaccess: always load the kernel ASCE after task switch
This patch fixes a problem introduced with git commit beef560b4c
"s390/uaccess: simplify control register updates".

The switch_mm function is not called if the next process is a kernel
thread without an attached mm or is a nop if the mm does not change.
But CR1 still needs to be loaded with the kernel ASCE in case the
code returns to a uaccess function that uses the secondary space mode.

In addition move the set_fs call from finish_arch_switch to
finish_arch_post_lock_switch and then remove finish_arch_switch.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-06-10 10:48:28 +02:00
Linus Torvalds b05d59dfce At over 200 commits, covering almost all supported architectures, this
was a pretty active cycle for KVM.  Changes include:
 
 - a lot of s390 changes: optimizations, support for migration,
   GDB support and more
 
 - ARM changes are pretty small: support for the PSCI 0.2 hypercall
   interface on both the guest and the host (the latter acked by Catalin)
 
 - initial POWER8 and little-endian host support
 
 - support for running u-boot on embedded POWER targets
 
 - pretty large changes to MIPS too, completing the userspace interface
   and improving the handling of virtualized timer hardware
 
 - for x86, a larger set of changes is scheduled for 3.17.  Still,
   we have a few emulator bugfixes and support for running nested
   fully-virtualized Xen guests (para-virtualized Xen guests have
   always worked).  And some optimizations too.
 
 The only missing architecture here is ia64.  It's not a coincidence
 that support for KVM on ia64 is scheduled for removal in 3.17.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABAgAGBQJTjtlBAAoJEBvWZb6bTYbyMOUP/2NAePghE3IjG99ikHFdn+BX
 BfrURsuR6GD0AhYQnBidBmpFbAmN/LwSJxv/M7sV7OBRWLu3qbt69DrPTU2e/FK1
 j9q25peu8jRyHzJ1q9rBroo74nD9lQYuVr3uXNxxcg0DRnw14JHGlM3y8LDEknO8
 W+gpWTeAQ+2AuOX98MpRbCRMuzziCSv5bP5FhBVnsWHiZfvMbcUrbeJt+zYSiDAZ
 0tHm/5dFKzfj/vVrrnjD4EZcRr688Bs5rztG96hY6aoVJryjZGLtLp92wCWkRRmH
 CCvZwd245NmNthuKHzcs27/duSWfU0uOlu7AMrD44QYhzeDGyB/2nbCxbGqLLoBA
 nnOviXH4cC65/CnisZ79zfo979HbZcX+Lzg747EjBgCSxJmLlwgiG8yXtDvk5otB
 TH6GUeGDiEEPj//JD3XtgSz0sF2NvjREWRyemjDMvhz6JC/bLytXKb3sn+NXSj8m
 ujzF9eQoa4qKDcBL4IQYGTJ4z5nY3Pd68dHFIPHB7n82OxFLSQUBKxXw8/1fb5og
 VVb8PL4GOcmakQlAKtTMlFPmuy4bbL2r/2iV5xJiOZKmXIu8Hs1JezBE3SFAltbl
 3cAGwSM9/dDkKxUbTFblyOE9bkKbg4WYmq0LkdzsPEomb3IZWntOT25rYnX+LrBz
 bAknaZpPiOrW11Et1htY
 =j5Od
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm into next

Pull KVM updates from Paolo Bonzini:
 "At over 200 commits, covering almost all supported architectures, this
  was a pretty active cycle for KVM.  Changes include:

   - a lot of s390 changes: optimizations, support for migration, GDB
     support and more

   - ARM changes are pretty small: support for the PSCI 0.2 hypercall
     interface on both the guest and the host (the latter acked by
     Catalin)

   - initial POWER8 and little-endian host support

   - support for running u-boot on embedded POWER targets

   - pretty large changes to MIPS too, completing the userspace
     interface and improving the handling of virtualized timer hardware

   - for x86, a larger set of changes is scheduled for 3.17.  Still, we
     have a few emulator bugfixes and support for running nested
     fully-virtualized Xen guests (para-virtualized Xen guests have
     always worked).  And some optimizations too.

  The only missing architecture here is ia64.  It's not a coincidence
  that support for KVM on ia64 is scheduled for removal in 3.17"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (203 commits)
  KVM: add missing cleanup_srcu_struct
  KVM: PPC: Book3S PR: Rework SLB switching code
  KVM: PPC: Book3S PR: Use SLB entry 0
  KVM: PPC: Book3S HV: Fix machine check delivery to guest
  KVM: PPC: Book3S HV: Work around POWER8 performance monitor bugs
  KVM: PPC: Book3S HV: Make sure we don't miss dirty pages
  KVM: PPC: Book3S HV: Fix dirty map for hugepages
  KVM: PPC: Book3S HV: Put huge-page HPTEs in rmap chain for base address
  KVM: PPC: Book3S HV: Fix check for running inside guest in global_invalidates()
  KVM: PPC: Book3S: Move KVM_REG_PPC_WORT to an unused register number
  KVM: PPC: Book3S: Add ONE_REG register names that were missed
  KVM: PPC: Add CAP to indicate hcall fixes
  KVM: PPC: MPIC: Reset IRQ source private members
  KVM: PPC: Graciously fail broken LE hypercalls
  PPC: ePAPR: Fix hypercall on LE guest
  KVM: PPC: BOOK3S: Remove open coded make_dsisr in alignment handler
  KVM: PPC: BOOK3S: Always use the saved DAR value
  PPC: KVM: Make NX bit available with magic page
  KVM: PPC: Disable NX for old magic page using guests
  KVM: PPC: BOOK3S: HV: Add mixed page-size support for guest
  ...
2014-06-04 08:47:12 -07:00
Linus Torvalds c84a1e32ee Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into next
Pull scheduler updates from Ingo Molnar:
 "The main scheduling related changes in this cycle were:

   - various sched/numa updates, for better performance

   - tree wide cleanup of open coded nice levels

   - nohz fix related to rq->nr_running use

   - cpuidle changes and continued consolidation to improve the
     kernel/sched/idle.c high level idle scheduling logic.  As part of
     this effort I pulled cpuidle driver changes from Rafael as well.

   - standardized idle polling amongst architectures

   - continued work on preparing better power/energy aware scheduling

   - sched/rt updates

   - misc fixlets and cleanups"

* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (49 commits)
  sched/numa: Decay ->wakee_flips instead of zeroing
  sched/numa: Update migrate_improves/degrades_locality()
  sched/numa: Allow task switch if load imbalance improves
  sched/rt: Fix 'struct sched_dl_entity' and dl_task_time() comments, to match the current upstream code
  sched: Consolidate open coded implementations of nice level frobbing into nice_to_rlimit() and rlimit_to_nice()
  sched: Initialize rq->age_stamp on processor start
  sched, nohz: Change rq->nr_running to always use wrappers
  sched: Fix the rq->next_balance logic in rebalance_domains() and idle_balance()
  sched: Use clamp() and clamp_val() to make sys_nice() more readable
  sched: Do not zero sg->cpumask and sg->sgp->power in build_sched_groups()
  sched/numa: Fix initialization of sched_domain_topology for NUMA
  sched: Call select_idle_sibling() when not affine_sd
  sched: Simplify return logic in sched_read_attr()
  sched: Simplify return logic in sched_copy_attr()
  sched: Fix exec_start/task_hot on migrated tasks
  arm64: Remove TIF_POLLING_NRFLAG
  metag: Remove TIF_POLLING_NRFLAG
  sched/idle: Make cpuidle_idle_call() void
  sched/idle: Reflow cpuidle_idle_call()
  sched/idle: Delay clearing the polling bit
  ...
2014-06-03 14:00:15 -07:00
Linus Torvalds 776edb5931 Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into next
Pull core locking updates from Ingo Molnar:
 "The main changes in this cycle were:

   - reduced/streamlined smp_mb__*() interface that allows more usecases
     and makes the existing ones less buggy, especially in rarer
     architectures

   - add rwsem implementation comments

   - bump up lockdep limits"

* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (33 commits)
  rwsem: Add comments to explain the meaning of the rwsem's count field
  lockdep: Increase static allocations
  arch: Mass conversion of smp_mb__*()
  arch,doc: Convert smp_mb__*()
  arch,xtensa: Convert smp_mb__*()
  arch,x86: Convert smp_mb__*()
  arch,tile: Convert smp_mb__*()
  arch,sparc: Convert smp_mb__*()
  arch,sh: Convert smp_mb__*()
  arch,score: Convert smp_mb__*()
  arch,s390: Convert smp_mb__*()
  arch,powerpc: Convert smp_mb__*()
  arch,parisc: Convert smp_mb__*()
  arch,openrisc: Convert smp_mb__*()
  arch,mn10300: Convert smp_mb__*()
  arch,mips: Convert smp_mb__*()
  arch,metag: Convert smp_mb__*()
  arch,m68k: Convert smp_mb__*()
  arch,m32r: Convert smp_mb__*()
  arch,ia64: Convert smp_mb__*()
  ...
2014-06-03 12:57:53 -07:00
Matthew Rosato 5a5e65361f KVM: s390: Intercept the tprot instruction
Based on original patch from Jeng-fang (Nick) Wang

When standby memory is specified for a guest Linux, but no virtual memory has
been allocated on the Qemu host backing that guest, the guest memory detection
process encounters a memory access exception which is not thrown from the KVM
handle_tprot() instruction-handler function. The access exception comes from
sie64a returning EFAULT, which then passes an addressing exception to the guest.
Unfortunately this does not the proper PSW fixup (nullifying vs.
suppressing) so the guest will get a fault for the wrong address.

Let's just intercept the tprot instruction all the time to do the right thing
and not go the page fault handler path for standby memory. tprot is only used
by Linux during startup so some exits should be ok.
Without this patch, standby memory cannot be used with KVM.

Signed-off-by: Nick Wang <jfwang@us.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Tested-by: Matthew Rosato <mjrosato@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-05-30 09:39:40 +02:00
Martin Schwidefsky 63aef00b55 s390/lowcore: replace lowcore irb array with a per-cpu variable
Remove the 96-byte irb array from the lowcore and create a per-cpu
variable instead. That way we will pick up any change in the definition
of the struct irb automatically.

Acked-By: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-05-28 10:39:16 +02:00
Christian Borntraeger 993072ee67 s390/lowcore: reserve 96 bytes for IRB in lowcore
The IRB might be 96 bytes if the extended-I/O-measurement facility is
used. This feature is currently not used by Linux, but struct irb
already has the emw defined. So let's make the irb in lowcore match the
size of the internal data structure to be future proof.
We also have to add a pad, to correctly align the paste.

The bigger irb field also circumvents a bug in some QEMU versions that
always write the emw field on test subchannel and therefore destroy the
paste definitions of this CPU. Running under these QEMU version broke
some timing functions in the VDSO and all users of these functions,
e.g. some JREs.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Sebastian Ott <sebott@linux.vnet.ibm.com>
Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
Cc: stable@vger.kernel.org
2014-05-28 10:38:59 +02:00
Ingo Molnar 65c2ce7004 Linux 3.15-rc6
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJTfR2zAAoJEHm+PkMAQRiG3noH/2s+KUge3qO2M+AmxttUo74B
 +npAMdbqYR3MdEiwxYZfsHcMu4Ye/IKLcrh4pydB5hI2mdjtGkH1bnmia0f1ve/c
 Z/a0256+W8gWp7mcUBqSNztqLPAWa7wKOqNdLjj5idr1BSj6u8im+fQ9FBh2woki
 1fyYAuq/60lq4CMOKJvkA95V1Ome/jO+8tS4PguOgsCETQxCVFGurZcBbG3Mx5Y3
 v+ioCqeRc6GvxPFR6YngnTZCrsLxSRT3tnO2Qy5zX7dxjIQkCEbvIckpBQv01Y3R
 wNUaX+2Jae207igxrEv8CjmCFnmZFuUI15aWWCy6fOS/j8bjuk6ThYJO8N4ZBM0=
 =2ShG
 -----END PGP SIGNATURE-----

Merge tag 'v3.15-rc6' into sched/core, to pick up the latest fixes

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-05-22 10:28:56 +02:00
Martin Schwidefsky bae8f56734 s390/spinlock,rwlock: always to a load-and-test first
In case a lock is contended it is better to do a load-and-test first
before trying to get the lock with compare-and-swap. This helps to avoid
unnecessary cache invalidations of the cacheline for the lock if the
CPU has to wait for the lock. For an uncontended lock doing the
compare-and-swap directly is a bit better, if the CPU does not have the
cacheline in its cache yet the compare-and-swap will get it read-write
immediately while a load-and-test would get it read-only first.

Always to the load-and-test first to avoid the cacheline invalidations
for the contended case outweight the potential read-only to read-write
cacheline upgrade for the uncontended case.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-05-20 08:58:53 +02:00
Sebastian Ott 2bf29df746 s390/cio: fix multiple structure definitions
Fix multiple definitions of struct channel_path_desc by moving it
to asm/chpid.h . Also change ccw_device_get_chp_desc to use proper
types.

Reviewed-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-05-20 08:58:53 +02:00
Heiko Carstens c9ca78415a s390/uaccess: provide inline variants of get_user/put_user
This shortens the code by ~17k (performace_defconfig, march=z196).
The number of exception table entries however increases from 164
entries to 2500 entries (+~18k).
However the executed code is shorter and also faster since we save
the branches to the out-of-line copy_to/from_user implementations.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-05-20 08:58:50 +02:00
Sebastian Ott ac4995b9d5 s390/pci: add some new arch specific pci attributes
Add a bunch of s390 specific pci attributes to help
identifying pci functions.

Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-05-20 08:58:50 +02:00
Sebastian Ott 93065d045a s390/pci: use pdev->dev.groups for attribute creation
Let the driver core handle attribute creation by putting all s390
specific pci attributes in an attribute group which is referenced
by pdev->dev.groups in pcibios_add_device.

Link: https://lkml.kernel.org/r/alpine.LFD.2.11.1404141101500.1529@denkbrett
Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-05-20 08:58:49 +02:00
Martin Schwidefsky d3a73acbc2 s390: split TIF bits into CIF, PIF and TIF bits
The oi and ni instructions used in entry[64].S to set and clear bits
in the thread-flags are not guaranteed to be atomic in regard to other
CPUs. Split the TIF bits into CPU, pt_regs and thread-info specific
bits. Updates on the TIF bits are done with atomic instructions,
updates on CPU and pt_regs bits are done with non-atomic instructions.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-05-20 08:58:47 +02:00
Martin Schwidefsky beef560b4c s390/uaccess: simplify control register updates
Always switch to the kernel ASCE in switch_mm. Load the secondary
space ASCE in finish_arch_post_lock_switch after checking that
any pending page table operations have completed. The primary
ASCE is loaded in entry[64].S. With this the update_primary_asce
call can be removed from the switch_to macro and from the start
of switch_mm function. Remove the load_primary argument from
update_user_asce/clear_user_asce, rename update_user_asce to
set_user_asce and rename update_primary_asce to load_kernel_asce.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-05-20 08:58:46 +02:00
Michael Holzheu f4192bf2dc s390/smp: Avoid busy loop after halt and "begin" on z/VM
Currently the smp_stop_cpu() function for SMP kernels enters a busy
loop when "begin" is entered on the z/VM console after Linux is halted.
To avoid this behavior, use the non-SMP variant of smp_stop_cpu()
which stops the CPU again after "begin" is entered. As a side
effect we now have consistent behavior for SMP and non-SMP Linux.

Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-05-20 08:58:45 +02:00
Randy Dunlap 0d234a2896 s390: fix new ccwgroup.h kernel-doc warning
Fix new s390 kernel-doc warning:

Warning(arch/s390/include/asm/ccwgroup.h:27): No description found for parameter 'ungroup_work'

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc:	Sebastian Ott <sebott@linux.vnet.ibm.com>
Cc:	Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
Cc:	linux-s390@vger.kernel.org
Cc:	Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc:	Heiko Carstens <heiko.carstens@de.ibm.com>
Cc:	linux390@de.ibm.com
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-05-20 08:58:45 +02:00
Philipp Hachtmann 6c8cd5bbda s390/spinlock: optimize spinlock code sequence
Use lowcore constant to improve the code generated for spinlocks.

[ Martin Schwidefsky: patch breakdown and code beautification ]

Signed-off-by: Philipp Hachtmann <phacht@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-05-20 08:58:42 +02:00
Philipp Hachtmann 5b3f683e69 s390/spinlock: cleanup spinlock code
Improve the spinlock code in several aspects:
 - Have _raw_compare_and_swap return true if the operation has been
   successful instead of returning the old value.
 - Remove the "volatile" from arch_spinlock_t and arch_rwlock_t
 - Rename 'owner_cpu' to 'lock'
 - Add helper functions arch_spin_trylock_once / arch_spin_tryrelease_once

[ Martin Schwidefsky: patch breakdown and code beautification ]

Signed-off-by: Philipp Hachtmann <phacht@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-05-20 08:58:41 +02:00
Philipp Hachtmann 50be634507 s390/mm: Convert bootmem to memblock
The original bootmem allocator is getting replaced by memblock. To
cover the needs of the s390 kdump implementation the physical memory
list is used.
With this patch the bootmem allocator and its bitmaps are completely
removed from s390.

Signed-off-by: Philipp Hachtmann <phacht@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-05-20 08:58:40 +02:00
Michael Mueller fda902cb83 KVM: s390: split SIE state guest prefix field
This patch splits the SIE state guest prefix at offset 4
into a prefix bit field. Additionally it provides the
access functions:

 - kvm_s390_get_prefix()
 - kvm_s390_set_prefix()

to access the prefix per vcpu.

Signed-off-by: Michael Mueller <mimu@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-05-16 14:57:31 +02:00
Michael Mueller 570126d370 s390/sclp: add sclp_get_ibc function
The patch adds functionality to retrieve the IBC configuration
by means of function sclp_get_ibc().

Signed-off-by: Michael Mueller <mimu@linux.vnet.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-05-16 14:57:30 +02:00
David Hildenbrand 4953919fee KVM: s390: interpretive execution of SIGP EXTERNAL CALL
If the sigp interpretation facility is installed, most SIGP EXTERNAL CALL
operations will be interpreted instead of intercepted. A partial execution
interception will occurr at the sending cpu only if the target cpu is in the
wait state ("W" bit in the cpuflags set). Instruction interception will only
happen in error cases (e.g. cpu addr invalid).

As a sending cpu might set the external call interrupt pending flags at the
target cpu at every point in time, we can't handle this kind of interrupt using
our kvm interrupt injection mechanism. The injection will be done automatically
by the SIE when preparing the start of the target cpu.

Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
CC: Thomas Huth <thuth@linux.vnet.ibm.com>
[Adopt external call injection to check for sigp interpretion]
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-05-16 14:57:28 +02:00
Vincent Guittot 2dfd747629 sched, s390: Create a dedicated topology table
BOOK level is only relevant for s390 so we create a dedicated topology table
with BOOK level and remove it from default table.

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Philipp Hachtmann <phacht@linux.vnet.ibm.com>
Cc: cmetcalf@tilera.com
Cc: benh@kernel.crashing.org
Cc: dietmar.eggemann@arm.com
Cc: preeti@linux.vnet.ibm.com
Cc: tony.luck@intel.com
Cc: fenghua.yu@intel.com
Cc: linux390@de.ibm.com
Cc: linux-s390@vger.kernel.org
Link: http://lkml.kernel.org/r/1397209481-28542-3-git-send-email-vincent.guittot@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-05-07 13:33:50 +02:00
Vincent Guittot 143e1e28cb sched: Rework sched_domain topology definition
We replace the old way to configure the scheduler topology with a new method
which enables a platform to declare additionnal level (if needed).

We still have a default topology table definition that can be used by platform
that don't want more level than the SMT, MC, CPU and NUMA ones. This table can
be overwritten by an arch which either wants to add new level where a load
balance make sense like BOOK or powergating level or wants to change the flags
configuration of some levels.

For each level, we need a function pointer that returns cpumask for each cpu,
a function pointer that returns the flags for the level and a name. Only flags
that describe topology, can be set by an architecture. The current topology
flags are:

 SD_SHARE_CPUPOWER
 SD_SHARE_PKG_RESOURCES
 SD_NUMA
 SD_ASYM_PACKING

Then, each level must be a subset on the next one. The build sequence of the
sched_domain will take care of removing useless levels like those with 1 CPU
and those with the same CPU span and no more relevant information for
load balancing than its children.

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Tested-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Hanjun Guo <hanjun.guo@linaro.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Jason Low <jason.low2@hp.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux390@de.ibm.com
Cc: linux-ia64@vger.kernel.org
Cc: linux-s390@vger.kernel.org
Link: http://lkml.kernel.org/r/1397209481-28542-2-git-send-email-vincent.guittot@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-05-07 13:33:49 +02:00
Thomas Huth f14d82e06a KVM: s390: Fix external interrupt interception
The external interrupt interception can only occur in rare cases, e.g.
when the PSW of the interrupt handler has a bad value. The old handler
for this interception simply ignored these events (except for increasing
the exit_external_interrupt counter), but for proper operation we either
have to inject the interrupts manually or we should drop to userspace in
case of errors.

Signed-off-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-05-06 14:58:10 +02:00
David Hildenbrand 8ad3575517 KVM: s390: enable IBS for single running VCPUs
This patch enables the IBS facility when a single VCPU is running.
The facility is dynamically turned on/off as soon as other VCPUs
enter/leave the stopped state.

When this facility is operating, some instructions can be executed
faster for single-cpu guests.

Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Reviewed-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-04-29 15:01:54 +02:00
Linus Torvalds ac6c9e2bed Merge branch 'safe-dirty-tlb-flush'
This merges the patch to fix possible loss of dirty bit on munmap() or
madvice(DONTNEED).  If there are concurrent writers on other CPU's that
have the unmapped/unneeded page in their TLBs, their writes to the page
could possibly get lost if a third CPU raced with the TLB flush and did
a page_mkclean() before the page was fully written.

Admittedly, if you unmap() or madvice(DONTNEED) an area _while_ another
thread is still busy writing to it, you deserve all the lost writes you
could get.  But we kernel people hold ourselves to higher quality
standards than "crazy people deserve to lose", because, well, we've seen
people do all kinds of crazy things.

So let's get it right, just because we can, and we don't have to worry
about it.

* safe-dirty-tlb-flush:
  mm: split 'tlb_flush_mmu()' into tlb flushing and memory freeing parts
2014-04-27 15:08:12 -07:00
Linus Torvalds 1cf35d4771 mm: split 'tlb_flush_mmu()' into tlb flushing and memory freeing parts
The mmu-gather operation 'tlb_flush_mmu()' has done two things: the
actual tlb flush operation, and the batched freeing of the pages that
the TLB entries pointed at.

This splits the operation into separate phases, so that the forced
batched flushing done by zap_pte_range() can now do the actual TLB flush
while still holding the page table lock, but delay the batched freeing
of all the pages to after the lock has been dropped.

This in turn allows us to avoid a race condition between
set_page_dirty() (as called by zap_pte_range() when it finds a dirty
shared memory pte) and page_mkclean(): because we now flush all the
dirty page data from the TLB's while holding the pte lock,
page_mkclean() will be held up walking the (recently cleaned) page
tables until after the TLB entries have been flushed from all CPU's.

Reported-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Tested-by: Dave Hansen <dave.hansen@intel.com>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russell King - ARM Linux <linux@arm.linux.org.uk>
Cc: Tony Luck <tony.luck@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-25 16:05:40 -07:00
Christian Borntraeger 0c8c77d355 s390/ccwgroup: Fix memory corruption
commit 0b60f9ead5 (s390: use
device_remove_file_self() instead of device_schedule_callback())

caused random memory corruption on my s390 box. Turns out that the
last element of the ccwgroup structure is of dynamic size, so we
must move the newly introduced work structure _before_ the zero
length array.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
CC: Tejun Heo <tj@kernel.org>
CC: Martin Schwidefsky <schwidefsky@de.ibm.com>
CC: Heiko Carstens <heiko.carstens@de.ibm.com>
CC: Sebastian Ott <sebott@linux.vnet.ibm.com>
CC: Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-04-25 12:25:14 -07:00
David Hildenbrand 27291e2165 KVM: s390: hardware support for guest debugging
This patch adds support to debug the guest using the PER facility on s390.
Single-stepping, hardware breakpoints and hardware watchpoints are supported. In
order to use the PER facility of the guest without it noticing it, the control
registers of the guest have to be patched and access to them has to be
intercepted(stctl, stctg, lctl, lctlg).

All PER program interrupts have to be intercepted and only the relevant PER
interrupts for the guest have to be given back. Special care has to be taken
about repeated exits on the same hardware breakpoint. The intervention of the
host in the guests PER configuration is not fully transparent. PER instruction
nullification can not be used by the guest and too many storage alteration
events may be reported to the guest (if it is activated for special address
ranges only) when the host concurrently debugging it.

Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-04-22 13:24:51 +02:00
David Hildenbrand aba0750889 KVM: s390: emulate stctl and stctg
Introduce the methods to emulate the stctl and stctg instruction. Added tracing
code.

Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-04-22 13:24:50 +02:00
David Hildenbrand 8712836b30 KVM: s390: deliver program irq parameters and use correct ilc
When a program interrupt was to be delivered until now, no program interrupt
parameters were stored in the low-core of the target vcpu.

This patch enables the delivery of those program interrupt parameters, takes
care of concurrent PER events which can be injected in addition to any program
interrupt and uses the correct instruction length code (depending on the
interception code) for the injection of program interrupts.

Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-04-22 13:24:49 +02:00
David Hildenbrand 439716a5ca KVM: s390: extract irq parameters of intercepted program irqs
Whenever a program interrupt is intercepted, some parameters are stored in the
sie control block. These parameters have to be extracted in order to be
reinjected correctly. This patch also takes care of intercepted PER events which
can occurr in addition to any program interrupt.

Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-04-22 13:24:49 +02:00
Jens Freimann 21ee7ffd17 s390: rename and split lowcore field per_perc_atmid
per_perc_atmid is currently a two-byte field that combines two
fields, the PER code and the PER Addressing-and-Translation-Mode
Identification (ATMID)

Let's make them accessible indepently and also rename per_cause to
per_code.

Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-04-22 13:24:48 +02:00
Jens Freimann 3d53b46ce8 s390: fix name of lowcore field at offset 0xa3
According to the Principles of Operation, at offset 0xA3
in the lowcore we have the "Architectural-Mode identification",
not an "access identification".

Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-04-22 13:24:48 +02:00
Heiko Carstens 8a242234b4 KVM: s390: make use of ipte lock
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Reviewed-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-04-22 13:24:39 +02:00
Heiko Carstens 217a440683 KVM: s390/sclp: correctly set eca siif bit
Check if siif is available before setting.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Reviewed-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-04-22 13:24:38 +02:00
Heiko Carstens 1b0462e574 KVM: s390: add 'pgm' member to kvm_vcpu_arch and helper function
Add a 'struct kvm_s390_pgm_info pgm' member to kvm_vcpu_arch. This
structure will be used if during instruction emulation in the context
of a vcpu exception data needs to be stored somewhere.

Also add a helper function kvm_s390_inject_prog_cond() which can inject
vcpu's last exception if needed.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-04-22 13:24:37 +02:00
Heiko Carstens 5f4e87a227 s390/ctl_reg: add union type for control register 0
Add 'union ctlreg0_bits' to easily allow setting and testing bits of
control register 0 bits.
This patch only adds the bits needed for the new guest access functions.
Other bits and control registers can be added when needed.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Reviewed-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-04-22 13:24:36 +02:00
Heiko Carstens 1365632bde s390/ptrace: add struct psw and accessor function
Introduce a 'struct psw' which makes it easier to decode and test if
certain bits in a psw are set or are not set.
In addition also add a 'psw_bits()' helper define which allows to
directly modify and test a psw_t structure. E.g.

psw_t psw;
psw_bits(psw).t = 1; /* set dat bit */

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Reviewed-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-04-22 13:24:36 +02:00
Jens Freimann bcd846837c KVM: s390: allow injecting every kind of interrupt
Add a new data structure and function that allows to inject
all kinds of interrupt as defined in the PoP

Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-04-22 13:24:34 +02:00
Dominik Dingel b31605c12f KVM: s390: make cmma usage conditionally
When userspace reset the guest without notifying kvm, the CMMA state
of the pages might be unused, resulting in guest data corruption.
To avoid this, CMMA must be enabled only if userspace understands
the implications.

CMMA must be enabled before vCPU creation. It can't be switched off
once enabled.  All subsequently created vCPUs will be enabled for
CMMA according to the CMMA state of the VM.

Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
[remove now unnecessary calls to page_table_reset_pgste]
2014-04-22 13:24:13 +02:00
Dominik Dingel a0bf4f149b KVM: s390/mm: new gmap_test_and_clear_dirty function
For live migration kvm needs to test and clear the dirty bit of guest pages.

That for is ptep_test_and_clear_user_dirty, to be sure we are not racing with
other code, we protect the pte. This needs to be done within
the architecture memory management code.

Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-04-22 09:36:27 +02:00
Martin Schwidefsky 0a61b222df KVM: s390/mm: use software dirty bit detection for user dirty tracking
Switch the user dirty bit detection used for migration from the hardware
provided host change-bit in the pgste to a fault based detection method.
This reduced the dependency of the host from the storage key to a point
where it becomes possible to enable the RCP bypass for KVM guests.

The fault based dirty detection will only indicate changes caused
by accesses via the guest address space. The hardware based method
can detect all changes, even those caused by I/O or accesses via the
kernel page table. The KVM/qemu code needs to take this into account.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-04-22 09:36:26 +02:00
Dominik Dingel 693ffc0802 KVM: s390: Don't enable skeys by default
The first invocation of storage key operations on a given cpu will be intercepted.

On these intercepts we will enable storage keys for the guest and remove the
previously added intercepts.

Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-04-22 09:36:26 +02:00
Dominik Dingel 934bc131ef KVM: s390: Allow skeys to be enabled for the current process
Introduce a new function s390_enable_skey(), which enables storage key
handling via setting the use_skey flag in the mmu context.

This function is only useful within the context of kvm.

Note that enabling storage keys will cause a one-time hickup when
walking the page table; however, it saves us special effort for cases
like clear reset while making it possible for us to be architecture
conform.

s390_enable_skey() takes the page table lock to prevent reseting
storage keys triggered from multiple vcpus.

Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-04-22 09:36:25 +02:00
Dominik Dingel d4cb11340b KVM: s390: Clear storage keys
page_table_reset_pgste() already does a complete page table walk to
reset the pgste. Enhance it to initialize the storage keys to
PAGE_DEFAULT_KEY if requested by the caller. This will be used
for lazy storage key handling. Also provide an empty stub for
!CONFIG_PGSTE

Lets adopt the current code (diag 308) to not clear the keys.

Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-04-22 09:36:24 +02:00
Dominik Dingel 65eef33550 KVM: s390: Adding skey bit to mmu context
For lazy storage key handling, we need a mechanism to track if the
process ever issued a storage key operation.

This patch adds the basic infrastructure for making the storage
key handling optional, but still leaves it enabled for now by default.

Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-04-22 09:36:23 +02:00
Peter Zijlstra 0e530747c6 arch,s390: Convert smp_mb__*()
As per the existing implementation; implement the new one using
smp_mb().

AFAICT the s390 compare-and-swap does imply a barrier, however there
are some immediate ops that seem to be singly-copy atomic and do not
imply a barrier. One such is the "ni" op (which would be
and-immediate) which is used for the constant clear_bit
implementation. Therefore s390 needs full barriers for the
{before,after} atomic ops.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/n/tip-kme5dz5hcobpnufnnkh1ech2@git.kernel.org
Cc: Chen Gang <gang.chen@asianux.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: linux390@de.ibm.com
Cc: linux-kernel@vger.kernel.org
Cc: linux-s390@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-04-18 14:20:42 +02:00
Linus Torvalds 0f689a33ad Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 patches from Martin Schwidefsky:
 "An update to the oops output with additional information about the
  crash.  The renameat2 system call is enabled.  Two patches in regard
  to the PTR_ERR_OR_ZERO cleanup.  And a bunch of bug fixes"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390/sclp_cmd: replace PTR_RET with PTR_ERR_OR_ZERO
  s390/sclp: replace PTR_RET with PTR_ERR_OR_ZERO
  s390/sclp_vt220: Fix kernel panic due to early terminal input
  s390/compat: fix typo
  s390/uaccess: fix possible register corruption in strnlen_user_srst()
  s390: add 31 bit warning message
  s390: wire up sys_renameat2
  s390: show_registers() should not map user space addresses to kernel symbols
  s390/mm: print control registers and page table walk on crash
  s390/smp: fix smp_stop_cpu() for !CONFIG_SMP
  s390: fix control register update
2014-04-16 11:28:25 -07:00
Linus Torvalds 0b747172dc Merge git://git.infradead.org/users/eparis/audit
Pull audit updates from Eric Paris.

* git://git.infradead.org/users/eparis/audit: (28 commits)
  AUDIT: make audit_is_compat depend on CONFIG_AUDIT_COMPAT_GENERIC
  audit: renumber AUDIT_FEATURE_CHANGE into the 1300 range
  audit: do not cast audit_rule_data pointers pointlesly
  AUDIT: Allow login in non-init namespaces
  audit: define audit_is_compat in kernel internal header
  kernel: Use RCU_INIT_POINTER(x, NULL) in audit.c
  sched: declare pid_alive as inline
  audit: use uapi/linux/audit.h for AUDIT_ARCH declarations
  syscall_get_arch: remove useless function arguments
  audit: remove stray newline from audit_log_execve_info() audit_panic() call
  audit: remove stray newlines from audit_log_lost messages
  audit: include subject in login records
  audit: remove superfluous new- prefix in AUDIT_LOGIN messages
  audit: allow user processes to log from another PID namespace
  audit: anchor all pid references in the initial pid namespace
  audit: convert PPIDs to the inital PID namespace.
  pid: get pid_t ppid of task in init_pid_ns
  audit: rename the misleading audit_get_context() to audit_take_context()
  audit: Add generic compat syscall support
  audit: Add CONFIG_HAVE_ARCH_AUDITSYSCALL
  ...
2014-04-12 12:38:53 -07:00
Heiko Carstens e7c46c66db s390/smp: fix smp_stop_cpu() for !CONFIG_SMP
smp_stop_cpu() should stop the current cpu even for !CONFIG_SMP.
Otherwise machine_halt() will return and and the machine generates a
panic instread of simply stopping the current cpu:

Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000000

CPU: 0 PID: 1 Comm: systemd-shutdow Not tainted 3.14.0-01527-g2b6ef16a6bc5 #10
[...]
Call Trace:
([<0000000000110db0>] show_trace+0xf8/0x158)
 [<0000000000110e7a>] show_stack+0x6a/0xe8
 [<000000000074dba8>] panic+0xe4/0x268
 [<0000000000140570>] do_exit+0xa88/0xb2c
 [<000000000016e12c>] SyS_reboot+0x1f0/0x234
 [<000000000075da70>] sysc_nr_ok+0x22/0x28
 [<000000007d5a09b4>] 0x7d5a09b4

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-04-09 10:19:12 +02:00
Linus Torvalds d586c86d50 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull second set of s390 patches from Martin Schwidefsky:
 "The second part of Heikos uaccess rework, the page table walker for
  uaccess is now a thing of the past (yay!)

  The code change to fix the theoretical TLB flush problem allows us to
  add a TLB flush optimization for zEC12, this machine has new
  instructions that allow to do CPU local TLB flushes for single pages
  and for all pages of a specific address space.

  Plus the usual bug fixing and some more cleanup"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390/uaccess: rework uaccess code - fix locking issues
  s390/mm,tlb: optimize TLB flushing for zEC12
  s390/mm,tlb: safeguard against speculative TLB creation
  s390/irq: Use defines for external interruption codes
  s390/irq: Add defines for external interruption codes
  s390/sclp: add timeout for queued requests
  kvm/s390: also set guest pages back to stable on kexec/kdump
  lcs: Add missing destroy_timer_on_stack()
  s390/tape: Add missing destroy_timer_on_stack()
  s390/tape: Use del_timer_sync()
  s390/3270: fix crash with multiple reset device requests
  s390/bitops,atomic: add missing memory barriers
  s390/zcrypt: add length check for aligned data to avoid overflow in msg-type 6
2014-04-08 12:02:28 -07:00
Heiko Carstens 457f218095 s390/uaccess: rework uaccess code - fix locking issues
The current uaccess code uses a page table walk in some circumstances,
e.g. in case of the in atomic futex operations or if running on old
hardware which doesn't support the mvcos instruction.

However it turned out that the page table walk code does not correctly
lock page tables when accessing page table entries.
In other words: a different cpu may invalidate a page table entry while
the current cpu inspects the pte. This may lead to random data corruption.

Adding correct locking however isn't trivial for all uaccess operations.
Especially copy_in_user() is problematic since that requires to hold at
least two locks, but must be protected against ABBA deadlock when a
different cpu also performs a copy_in_user() operation.

So the solution is a different approach where we change address spaces:

User space runs in primary address mode, or access register mode within
vdso code, like it currently already does.

The kernel usually also runs in home space mode, however when accessing
user space the kernel switches to primary or secondary address mode if
the mvcos instruction is not available or if a compare-and-swap (futex)
instruction on a user space address is performed.
KVM however is special, since that requires the kernel to run in home
address space while implicitly accessing user space with the sie
instruction.

So we end up with:

User space:
- runs in primary or access register mode
- cr1 contains the user asce
- cr7 contains the user asce
- cr13 contains the kernel asce

Kernel space:
- runs in home space mode
- cr1 contains the user or kernel asce
  -> the kernel asce is loaded when a uaccess requires primary or
     secondary address mode
- cr7 contains the user or kernel asce, (changed with set_fs())
- cr13 contains the kernel asce

In case of uaccess the kernel changes to:
- primary space mode in case of a uaccess (copy_to_user) and uses
  e.g. the mvcp instruction to access user space. However the kernel
  will stay in home space mode if the mvcos instruction is available
- secondary space mode in case of futex atomic operations, so that the
  instructions come from primary address space and data from secondary
  space

In case of kvm the kernel runs in home space mode, but cr1 gets switched
to contain the gmap asce before the sie instruction gets executed. When
the sie instruction is finished cr1 will be switched back to contain the
user asce.

A context switch between two processes will always load the kernel asce
for the next process in cr1. So the first exit to user space is a bit
more expensive (one extra load control register instruction) than before,
however keeps the code rather simple.

In sum this means there is no need to perform any error prone page table
walks anymore when accessing user space.

The patch seems to be rather large, however it mainly removes the
the page table walk code and restores the previously deleted "standard"
uaccess code, with a couple of changes.

The uaccess without mvcos mode can be enforced with the "uaccess_primary"
kernel parameter.

Reported-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-04-03 14:31:04 +02:00
Martin Schwidefsky 1b948d6cae s390/mm,tlb: optimize TLB flushing for zEC12
The zEC12 machines introduced the local-clearing control for the IDTE
and IPTE instruction. If the control is set only the TLB of the local
CPU is cleared of entries, either all entries of a single address space
for IDTE, or the entry for a single page-table entry for IPTE.
Without the local-clearing control the TLB flush is broadcasted to all
CPUs in the configuration, which is expensive.

The reset of the bit mask of the CPUs that need flushing after a
non-local IDTE is tricky. As TLB entries for an address space remain
in the TLB even if the address space is detached a new bit field is
required to keep track of attached CPUs vs. CPUs in the need of a
flush. After a non-local flush with IDTE the bit-field of attached CPUs
is copied to the bit-field of CPUs in need of a flush. The ordering
of operations on cpu_attach_mask, attach_count and mm_cpumask(mm) is
such that an underindication in mm_cpumask(mm) is prevented but an
overindication in mm_cpumask(mm) is possible.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-04-03 14:31:00 +02:00
Martin Schwidefsky 02a8f3abb7 s390/mm,tlb: safeguard against speculative TLB creation
The principles of operations states that the CPU is allowed to create
TLB entries for an address space anytime while an ASCE is loaded to
the control register. This is true even if the CPU is running in the
kernel and the user address space is not (actively) accessed.

In theory this can affect two aspects of the TLB flush logic.
For full-mm flushes the ASCE of the dying process is still attached.
The approach to flush first with IDTE and then just free all page
tables can in theory lead to stale TLB entries. Use the batched
free of page tables for the full-mm flushes as well.

For operations that can have a stale ASCE in the control register,
e.g. a delayed update_user_asce in switch_mm, load the kernel ASCE
to prevent invalid TLBs from being created.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-04-03 14:30:55 +02:00
Thomas Huth 1dad093b66 s390/irq: Use defines for external interruption codes
Use the new defines for external interruption codes to get rid
of "magic" numbers in the s390 source code. And while we're at it,
also rename the (un-)register_external_interrupt function to
something shorter so that this patch does not exceed the 80
columns all over the place.

Signed-off-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-04-03 14:30:52 +02:00
Thomas Huth 072c279029 s390/irq: Add defines for external interruption codes
Introduce defines for external interruption codes so that we
can get rid of some "magic" numbers in the s390 source code.

Signed-off-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-04-03 14:30:49 +02:00
Linus Torvalds 159d8133d0 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
Pull trivial tree updates from Jiri Kosina:
 "Usual rocket science -- mostly documentation and comment updates"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial:
  sparse: fix comment
  doc: fix double words
  isdn: capi: fix "CAPI_VERSION" comment
  doc: DocBook: Fix typos in xml and template file
  Bluetooth: add module name for btwilink
  driver core: unexport static function create_syslog_header
  mmc: core: typo fix in printk specifier
  ARM: spear: clean up editing mistake
  net-sysfs: fix comment typo 'CONFIG_SYFS'
  doc: Insert MODULE_ in module-signing macros
  Documentation: update URL to hfsplus Technote 1150
  gpio: update path to documentation
  ixgbe: Fix format string in ixgbe_fcoe.
  Kconfig: Remove useless "default N" lines
  user_namespace.c: Remove duplicated word in comment
  CREDITS: fix formatting
  treewide: Fix typo in Documentation/DocBook
  mm: Fix warning on make htmldocs caused by slab.c
  ata: ata-samsung_cf: cleanup in header file
  idr: remove unused prototype of idr_free()
2014-04-02 16:23:38 -07:00
Linus Torvalds 7cbb39d4d4 Merge tag 'kvm-3.15-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm updates from Paolo Bonzini:
 "PPC and ARM do not have much going on this time.  Most of the cool
  stuff, instead, is in s390 and (after a few releases) x86.

  ARM has some caching fixes and PPC has transactional memory support in
  guests.  MIPS has some fixes, with more probably coming in 3.16 as
  QEMU will soon get support for MIPS KVM.

  For x86 there are optimizations for debug registers, which trigger on
  some Windows games, and other important fixes for Windows guests.  We
  now expose to the guest Broadwell instruction set extensions and also
  Intel MPX.  There's also a fix/workaround for OS X guests, nested
  virtualization features (preemption timer), and a couple kvmclock
  refinements.

  For s390, the main news is asynchronous page faults, together with
  improvements to IRQs (floating irqs and adapter irqs) that speed up
  virtio devices"

* tag 'kvm-3.15-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (96 commits)
  KVM: PPC: Book3S HV: Save/restore host PMU registers that are new in POWER8
  KVM: PPC: Book3S HV: Fix decrementer timeouts with non-zero TB offset
  KVM: PPC: Book3S HV: Don't use kvm_memslots() in real mode
  KVM: PPC: Book3S HV: Return ENODEV error rather than EIO
  KVM: PPC: Book3S: Trim top 4 bits of physical address in RTAS code
  KVM: PPC: Book3S HV: Add get/set_one_reg for new TM state
  KVM: PPC: Book3S HV: Add transactional memory support
  KVM: Specify byte order for KVM_EXIT_MMIO
  KVM: vmx: fix MPX detection
  KVM: PPC: Book3S HV: Fix KVM hang with CONFIG_KVM_XICS=n
  KVM: PPC: Book3S: Introduce hypervisor call H_GET_TCE
  KVM: PPC: Book3S HV: Fix incorrect userspace exit on ioeventfd write
  KVM: s390: clear local interrupts at cpu initial reset
  KVM: s390: Fix possible memory leak in SIGP functions
  KVM: s390: fix calculation of idle_mask array size
  KVM: s390: randomize sca address
  KVM: ioapic: reinject pending interrupts on KVM_SET_IRQCHIP
  KVM: Bump KVM_MAX_IRQ_ROUTES for s390
  KVM: s390: irq routing for adapter interrupts.
  KVM: s390: adapter interrupt sources
  ...
2014-04-02 14:50:10 -07:00
Linus Torvalds 158e0d3621 Driver core / sysfs patches for 3.15-rc1
Here's the big driver core / sysfs update for 3.15-rc1.
 
 Lots of kernfs updates to make it useful for other subsystems, and a few
 other tiny driver core patches.
 
 All have been in linux-next for a while.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iEYEABECAAYFAlM7A0wACgkQMUfUDdst+ynJNACfZlY+KNKIhNFt1OOW8rQfSZzy
 1PYAnjYuOoly01JlPrpJD5b4TdxaAq71
 =GVUg
 -----END PGP SIGNATURE-----

Merge tag 'driver-core-3.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core and sysfs updates from Greg KH:
 "Here's the big driver core / sysfs update for 3.15-rc1.

  Lots of kernfs updates to make it useful for other subsystems, and a
  few other tiny driver core patches.

  All have been in linux-next for a while"

* tag 'driver-core-3.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (42 commits)
  Revert "sysfs, driver-core: remove unused {sysfs|device}_schedule_callback_owner()"
  kernfs: cache atomic_write_len in kernfs_open_file
  numa: fix NULL pointer access and memory leak in unregister_one_node()
  Revert "driver core: synchronize device shutdown"
  kernfs: fix off by one error.
  kernfs: remove duplicate dir.c at the top dir
  x86: align x86 arch with generic CPU modalias handling
  cpu: add generic support for CPU feature based module autoloading
  sysfs: create bin_attributes under the requested group
  driver core: unexport static function create_syslog_header
  firmware: use power efficient workqueue for unloading and aborting fw load
  firmware: give a protection when map page failed
  firmware: google memconsole driver fixes
  firmware: fix google/gsmi duplicate efivars_sysfs_init()
  drivers/base: delete non-required instances of include <linux/init.h>
  kernfs: fix kernfs_node_from_dentry()
  ACPI / platform: drop redundant ACPI_HANDLE check
  kernfs: fix hash calculation in kernfs_rename_ns()
  kernfs: add CONFIG_KERNFS
  sysfs, kobject: add sysfs wrapper for kernfs_enable_ns()
  ...
2014-04-01 16:28:19 -07:00
Heiko Carstens 0ccc8b7ac8 s390/bitops,atomic: add missing memory barriers
When reworking the bitops and atomic ops I missed that those instructions
that got atomic behaviour only perform a "specific-operand-serialization"
instead of a full "serialization".
The compare-and-swap instruction used before performs a full serialization
before and after the instruction is executed, which means it has full
memory barrier semantics.
In order to give the new bitops and atomic ops functions also full memory
barrier semantics add a "bcr 14,0" before and after each of those new
instructions which performs full serialization as well.

This restores memory barrier semantics for bitops and atomic ops functions
which return values, like e.g. atomic_add_return(), but not for functions
which do not return a value, like e.g. atomic_add().
This is consistent to other architectures and what common code requires.

Cc: stable@vger.kernel.org # v3.13+
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-04-01 09:23:35 +02:00
Linus Torvalds 1f8c538ed6 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 updates from Martin Schwidefsky:
 "There are two memory management related changes, the CMMA support for
  KVM to avoid swap-in of freed pages and the split page table lock for
  the PMD level.  These two come with common code changes in mm/.

  A fix for the long standing theoretical TLB flush problem, this one
  comes with a common code change in kernel/sched/.

  Another set of changes is Heikos uaccess work, included is the initial
  set of patches with more to come.

  And fixes and cleanups as usual"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (36 commits)
  s390/con3270: optionally disable auto update
  s390/mm: remove unecessary parameter from pgste_ipte_notify
  s390/mm: remove unnecessary parameter from gmap_do_ipte_notify
  s390/mm: fixing comment so that parameter name match
  s390/smp: limit number of cpus in possible cpu mask
  hypfs: Add clarification for "weight_min" attribute
  s390: update defconfigs
  s390/ptrace: add support for PTRACE_SINGLEBLOCK
  s390/perf: make print_debug_cf() static
  s390/topology: Remove call to update_cpu_masks()
  s390/compat: remove compat exec domain
  s390: select CONFIG_TTY for use of tty in unconditional keyboard driver
  s390/appldata_os: fix cpu array size calculation
  s390/checksum: remove memset() within csum_partial_copy_from_user()
  s390/uaccess: remove copy_from_user_real()
  s390/sclp_early: Return correct HSA block count also for zero
  s390: add some drivers/subsystems to the MAINTAINERS file
  s390: improve debug feature usage
  s390/airq: add support for irq ranges
  s390/mm: enable split page table lock for PMD level
  ...
2014-03-31 14:35:30 -07:00
Linus Torvalds 190f918660 Merge branch 'compat' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 compat wrapper rework from Heiko Carstens:
 "S390 compat system call wrapper simplification work.

  The intention of this work is to get rid of all hand written assembly
  compat system call wrappers on s390, which perform proper sign or zero
  extension, or pointer conversion of compat system call parameters.
  Instead all of this should be done with C code eg by using Al's
  COMPAT_SYSCALL_DEFINEx() macro.

  Therefore all common code and s390 specific compat system calls have
  been converted to the COMPAT_SYSCALL_DEFINEx() macro.

  In order to generate correct code all compat system calls may only
  have eg compat_ulong_t parameters, but no unsigned long parameters.
  Those patches which change parameter types from unsigned long to
  compat_ulong_t parameters are separate in this series, but shouldn't
  cause any harm.

  The only compat system calls which intentionally have 64 bit
  parameters (preadv64 and pwritev64) in support of the x86/32 ABI
  haven't been changed, but are now only available if an architecture
  defines __ARCH_WANT_COMPAT_SYS_PREADV64/PWRITEV64.

  System calls which do not have a compat variant but still need proper
  zero extension on s390, like eg "long sys_brk(unsigned long brk)" will
  get a proper wrapper function with the new s390 specific
  COMPAT_SYSCALL_WRAPx() macro:

     COMPAT_SYSCALL_WRAP1(brk, unsigned long, brk);

  which generates the following code (simplified):

     asmlinkage long sys_brk(unsigned long brk);
     asmlinkage long compat_sys_brk(long brk)
     {
         return sys_brk((u32)brk);
     }

  Given that the C file which contains all the COMPAT_SYSCALL_WRAP lines
  includes both linux/syscall.h and linux/compat.h, it will generate
  build errors, if the declaration of sys_brk() doesn't match, or if
  there exists a non-matching compat_sys_brk() declaration.

  In addition this will intentionally result in a link error if
  somewhere else a compat_sys_brk() function exists, which probably
  should have been used instead.  Two more BUILD_BUG_ONs make sure the
  size and type of each compat syscall parameter can be handled
  correctly with the s390 specific macros.

  I converted the compat system calls step by step to verify the
  generated code is correct and matches the previous code.  In fact it
  did not always match, however that was always a bug in the hand
  written asm code.

  In result we get less code, less bugs, and much more sanity checking"

* 'compat' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (44 commits)
  s390/compat: add copyright statement
  compat: include linux/unistd.h within linux/compat.h
  s390/compat: get rid of compat wrapper assembly code
  s390/compat: build error for large compat syscall args
  mm/compat: convert to COMPAT_SYSCALL_DEFINE with changing parameter types
  kexec/compat: convert to COMPAT_SYSCALL_DEFINE with changing parameter types
  net/compat: convert to COMPAT_SYSCALL_DEFINE with changing parameter types
  ipc/compat: convert to COMPAT_SYSCALL_DEFINE with changing parameter types
  fs/compat: convert to COMPAT_SYSCALL_DEFINE with changing parameter types
  ipc/compat: convert to COMPAT_SYSCALL_DEFINE
  fs/compat: convert to COMPAT_SYSCALL_DEFINE
  security/compat: convert to COMPAT_SYSCALL_DEFINE
  mm/compat: convert to COMPAT_SYSCALL_DEFINE
  net/compat: convert to COMPAT_SYSCALL_DEFINE
  kernel/compat: convert to COMPAT_SYSCALL_DEFINE
  fs/compat: optional preadv64/pwrite64 compat system calls
  ipc/compat_sys_msgrcv: change msgtyp type from long to compat_long_t
  s390/compat: partial parameter conversion within syscall wrappers
  s390/compat: automatic zero, sign and pointer conversion of syscalls
  s390/compat: add sync_file_range and fallocate compat syscalls
  ...
2014-03-31 14:32:17 -07:00
Paolo Bonzini f7b9ddb8a5 3 fixes
- memory leak on certain SIGP conditions
 - wrong size for idle bitmap (always too big)
 - clear local interrupts on initial CPU reset
 
 1 performance improvement
 - improve performance with many guests on certain workloads
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJTMXvEAAoJEBF7vIC1phx8ii8P/2XI/aXFlhkITD/79rghPbjo
 sE76o5Rlz4UzM8khgEW/4kajehww/1/hO68ojcy1xUE1eMHnjk4sF7c5ozbKX7Qw
 a411B/IrkDEz3aVFLx8KXu2qSdCs22WDgyYFUR4kOBjStP+TvN7KH1NYw84CO+pA
 7Mf+DAxF26X83q6nNvfnEq4cjrQEGmB0aRLkiT4PmHjs4WL+iimJmXeVTewhCtKm
 rE3N9AjpaZpqUQANXvdTJc5Cap/RbVMJf05EbIg5LrsEN+9xQHHRpkkjSRw+7XLx
 iY1uxgA8a92dGWY1RCSZe3lLS0ibg6LWH05hlhYOOCe1DBU0KVQJ5Txixu/ekm5j
 gv2BPpnquF+R2uYptTmF8Xq7TeP15kc3JjahrE8tZ16RH5dpZ7fn6fK/f3590JYB
 4rt0xt5diQwaFRDcgT+8zLGvIq8DZ4ZH6KNElXli8megdY1hiOIlkb1R7Hq/RIt1
 eacX2mycZAlf2ZUp3lVTHYxPL43WH2Qf2s4Y4mHlEmH8LQGGiFOl8Ne0Wgoeha9O
 JVvoHxTEqvzhWTgUi6n8cTlUsYvq6ICXhwCOPM5HLgbxCgbPbEv5EMOZxGAQJ0FK
 fQXasKpjxzPYyJ6XS8xeNel4yFQ+j8G1rvN4Q8kMLY4fjAj5sfm0WRrFw/gCb2ds
 ISfe6UOnoV9scrXvAMyH
 =7smd
 -----END PGP SIGNATURE-----

Merge tag 'kvm-s390-20140325' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into kvm-next

3 fixes
- memory leak on certain SIGP conditions
- wrong size for idle bitmap (always too big)
- clear local interrupts on initial CPU reset

1 performance improvement
- improve performance with many guests on certain workloads
2014-03-25 15:44:06 +01:00
Jens Freimann 609433fbed KVM: s390: fix calculation of idle_mask array size
We need BITS_TO_LONGS, not sizeof(long) to calculate
the correct size.

idle_mask is a bitmask, each bit representing the state
of a cpu. The desired outcome is an array of unsigned long
fields that can fit KVM_MAX_VCPUS bits. We should not use
sizeof(long) which returnes the size in bytes, but BITS_TO_LONGS

Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-03-25 13:27:11 +01:00
Cornelia Huck 8422359877 KVM: s390: irq routing for adapter interrupts.
Introduce a new interrupt class for s390 adapter interrupts and enable
irqfds for s390.

This is depending on a new s390 specific vm capability, KVM_CAP_S390_IRQCHIP,
that needs to be enabled by userspace.

Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
2014-03-21 13:43:00 +01:00
Cornelia Huck 841b91c584 KVM: s390: adapter interrupt sources
Add a new interface to register/deregister sources of adapter interrupts
identified by an unique id via the flic. Adapters may also be maskable
and carry a list of pinned pages.

These adapters will be used by irq routing later.

Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
2014-03-21 13:42:49 +01:00
Dominik Dingel 6e5a40a49f s390/mm: remove unecessary parameter from pgste_ipte_notify
Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-03-21 09:58:01 +01:00
Dominik Dingel aaeff84a2d s390/mm: remove unnecessary parameter from gmap_do_ipte_notify
Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-03-21 09:57:58 +01:00
Eric Paris 579ec9e1ab audit: use uapi/linux/audit.h for AUDIT_ARCH declarations
The syscall.h headers were including linux/audit.h but really only
needed the uapi/linux/audit.h to get the requisite defines.  Switch to
the uapi headers.

Signed-off-by: Eric Paris <eparis@redhat.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-mips@linux-mips.org
Cc: linux-s390@vger.kernel.org
Cc: x86@kernel.org
2014-03-20 10:11:59 -04:00
Eric Paris 5e937a9ae9 syscall_get_arch: remove useless function arguments
Every caller of syscall_get_arch() uses current for the task and no
implementors of the function need args.  So just get rid of both of
those things.  Admittedly, since these are inline functions we aren't
wasting stack space, but it just makes the prototypes better.

Signed-off-by: Eric Paris <eparis@redhat.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-mips@linux-mips.org
Cc: linux390@de.ibm.com
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-s390@vger.kernel.org
Cc: linux-arch@vger.kernel.org
2014-03-20 10:11:59 -04:00
Heiko Carstens cf813db0b4 s390/smp: limit number of cpus in possible cpu mask
Limit the number of bits to the maximum number of cpus a machine
can have.
possible_cpu_mask typically will have more bits set than a machine
may physically have. This results in wasted memory during per-cpu
memory allocations, if the possible mask contains more cpus than
physically possible for a given configuration.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-03-17 15:53:06 +01:00
Martin Schwidefsky 818a330c4e s390/ptrace: add support for PTRACE_SINGLEBLOCK
The PTRACE_SINGLEBLOCK option is used to get control whenever
the inferior has executed a successful branch. The PER option to
implement block stepping is successful-branching event, bit 32
in the PER-event mask.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-03-14 12:59:38 +01:00
Heiko Carstens 9a205286bc s390/compat: build error for large compat syscall args
Enforce 32 bit types for all compat syscall argument types.

This way we can make sure that all arguments get correct sign
or zero extension. Otherwise incorrect code would be generated.

E.g. for a 'long' type the COMPAT_SYSCALL_DEFINE macro wouldn't
generate code that would cause sign extension of the passed in 32
bit user space parameter.
This can cause quite subtle bugs like e.g. the one that was fixed
with dfd948e32a "fs/compat: fix parameter handling for compat
readv/writev syscalls".

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2014-03-06 16:30:47 +01:00
Heiko Carstens 932602e238 fs/compat: convert to COMPAT_SYSCALL_DEFINE with changing parameter types
Some fs compat system calls have unsigned long parameters instead of
compat_ulong_t.
In order to allow the COMPAT_SYSCALL_DEFINE macro generate code that
performs proper zero and sign extension convert all 64 bit parameters
their corresponding 32 bit counterparts.

compat_sys_io_getevents() is a bit different: the non-compat version
has signed parameters for the "min_nr" and "nr" parameters while the
compat version has unsigned parameters.
So change this as well. For all practical purposes this shouldn't make
any difference (doesn't fix a real bug).
Also introduce a generic compat_aio_context_t type which can be used
everywhere.
The access_ok() check within compat_sys_io_getevents() got also removed
since the non-compat sys_io_getevents() should be able to handle
everything anyway.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2014-03-06 16:30:44 +01:00
Cornelia Huck 96b14536d9 virtio-ccw: virtio-ccw adapter interrupt support.
Implement the new CCW_CMD_SET_IND_ADAPTER command and try to enable
adapter interrupts for every device on the first startup. If the host
does not support adapter interrupts, fall back to normal I/O interrupts.

virtio-ccw adapter interrupts use the same isc as normal I/O subchannels
and share a summary indicator for all devices sharing the same indicator
area.

Indicator bits for the individual virtqueues may be contained in the same
indicator area for different devices.

Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-03-04 10:41:04 +01:00
Martin Schwidefsky 84ec96a615 s390/airq: add support for irq ranges
Add airq_iv_alloc and airq_iv_free to allocate and free consecutive
ranges of irqs from the interrupt vector.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-03-04 10:41:04 +01:00
Jens Freimann 1ee0bc559d KVM: s390: get rid of local_int array
We can use kvm_get_vcpu() now and don't need the
local_int array in the floating_int struct anymore.
This also means we don't have to hold the float_int.lock
in some places.

Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-03-04 10:41:03 +01:00
Christian Borntraeger 672550fb68 KVM: s390: Provide access to program parameter
commit d208c79d63 (KVM: s390: Enable
the LPP facility for guests) enabled the LPP instruction for guests.
We should expose the program parameter as a pseudo register for
migration/reset etc. Lets also reset this value on initial CPU
reset.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Reviewed-by: Jason J. Herne <jjherne@linux.vnet.ibm.com>
2014-03-04 10:41:01 +01:00
Heiko Carstens 9c4d62fab4 s390/compat: convert system call wrappers to C part 11
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2014-03-04 09:05:44 +01:00
Heiko Carstens d72d2bb5a6 s390/checksum: remove memset() within csum_partial_copy_from_user()
The memset() within csum_partial_copy_from_user() is rather pointless since
copy_from_user() already cleared the rest of the destination buffer if an
exception happened.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-02-24 17:14:08 +01:00
Heiko Carstens 823002023d s390/uaccess: remove copy_from_user_real()
There is no user left, so remove it.
It was also potentially broken, since the function didn't clear destination
memory if copy_from_user() failed. Which would allow for information leaks.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-02-24 17:14:00 +01:00
Martin Schwidefsky fe7c30a420 s390/airq: add support for irq ranges
Add airq_iv_alloc and airq_iv_free to allocate and free consecutive
ranges of irqs from the interrupt vector.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-02-21 08:50:22 +01:00
Martin Schwidefsky ec66ad66a0 s390/mm: enable split page table lock for PMD level
Add the pgtable_pmd_page_ctor/pgtable_pmd_page_dtor calls to the pmd
allocation and free functions and enable ARCH_ENABLE_SPLIT_PMD_PTLOCK
for 64 bit.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-02-21 08:50:22 +01:00
Heiko Carstens db85eaeb52 s390/bitops: fix comment
Fix some numbers in the comments describing the layout of the bit maps.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-02-21 08:50:20 +01:00