This patch introduces the new function memcpy_absolute() that allows
copying memory using absolute addressing. This means that the prefix swap
does not apply when this function is used.
With this patch, all s390 kernel code that accesses absolute zero
now uses the new memcpy_absolute() function. The old and less generic
copy_to_absolute_zero() function is removed.
Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Replace __s390x__ with CONFIG_64BIT in all places that are not exported
to userspace or guarded with #ifdef __KERNEL__.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
If the task that was found on an initial interrupt doesn't match the
current task, execute a WARN_ON_ONCE() and don't put the task to sleep.
When this happens, something went wrong in the interface between the
hypervisor and the kernel. In such a case keep the tasks alive to
avoid a hanging system.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Use __set_task_state() instead of set_task_state(). Saves a couple of
instructions, since the memory barrier is not needed here.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Make the code a bit more symmetric and always search for the task of the
reported pid. This simplifies the code a bit.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
When setting the current task state to TASK_UNINTERRUPTIBLE this can
race with a different cpu. The other cpu could set the task state after
it inspected it (while it was still TASK_RUNNING) to TASK_RUNNING which
would change the state from TASK_UNINTERRUPTIBLE to TASK_RUNNING again.
This race was always present in the pfault interrupt code but didn't
cause anything harmful before commit f2db2e6c "[S390] pfault: cpu hotplug
vs missing completion interrupts" which relied on the fact that after
setting the task state to TASK_UNINTERRUPTIBLE the task would really
sleep.
Since this is not necessarily the case the result may be a list corruption
of the pfault_list or, as observed, a use-after-free bug while trying to
access the task_struct of a task which terminated itself already.
To fix this, we need to get a reference of the affected task when receiving
the initial pfault interrupt and add special handling if we receive yet
another initial pfault interrupt when the task is already enqueued in the
pfault list.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: <stable@vger.kernel.org> # needed for v3.0 and newer
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Commit c3f0327f8e ("mm: add rss counters consistency check")
detected the following problem with kvm on s390:
BUG: Bad rss-counter state mm:00000004f73ef000 idx:0 val:-10
BUG: Bad rss-counter state mm:00000004f73ef000 idx:1 val:-5
We have to make sure that we accumulate all rss values into
the mm before we replace the mm to avoid triggering this (harmless)
bug message.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The software large page emulation on s390 did not clear the
pre-allocated page table in arch_release_hugepage() before freeing
it. This could trigger the WARN_ON(!pte_none(*pte)) in mm/vmalloc.c:106
and make vmap_pte_range() fail, because the page table could be reused
in page_table_alloc(). This is fixed now by calling clear_table()
before page_table_free().
Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Replace the check for TIF_SIE in the fault handler by a check for PF_VCPU.
With the last user of TIF_SIE gone we can now remove the bit.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Currently /dev/mem for s390 provides only real memory access. This means
that the CPU prefix pages are swapped. The prefix swap for real memory
works as follows:
Each CPU owns a prefix register that points to a page aligned memory
location "P". If this CPU accesses the address range [0,0x1fff], it is
translated by the hardware to [P,P+0x1fff]. Accordingly if this CPU
accesses the address range [P,P+0x1fff], it is translated by the hardware
to [0,0x1fff]. Therefore, if [P,P+0x1fff] or [0,0x1fff] is read from
the current /dev/mem device, the incorrectly swapped memory content is
returned.
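The prefix swap described above can be modelled in user space as follows.
This is only a sketch of the translation rule: the helper name
real_to_absolute() and the PREFIX_AREA_SIZE constant are made up for
illustration and are not part of the patch.

```c
#include <stdint.h>

/* Size of the prefix area: the range [0,0x1fff]. */
#define PREFIX_AREA_SIZE 0x2000UL

/*
 * Hypothetical model: translate a real address to an absolute address
 * for a CPU whose prefix register points to "prefix" (P in the text).
 */
static uint64_t real_to_absolute(uint64_t real, uint64_t prefix)
{
	if (real < PREFIX_AREA_SIZE)
		return real + prefix;	/* [0,0x1fff] -> [P,P+0x1fff] */
	if (real >= prefix && real < prefix + PREFIX_AREA_SIZE)
		return real - prefix;	/* [P,P+0x1fff] -> [0,0x1fff] */
	return real;			/* all other addresses are unchanged */
}
```

With P = 0x20000, real address 0x100 maps to absolute 0x20100 and real
address 0x20100 maps to absolute 0x100, which is the swap that a reader
of /dev/mem would otherwise see.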
With this patch the /dev/mem architecture code is modified to provide
absolute memory access. This is done via the arch specific functions
xlate_dev_mem_ptr() and unxlate_dev_mem_ptr(). For swapped pages on
s390 the function xlate_dev_mem_ptr() now returns a new buffer with a
copy of the requested absolute memory. In case the buffer was allocated,
the unxlate_dev_mem_ptr() function frees it after /dev/mem code has
called copy_to_user().
Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Git commit 36409f6353 "use generic RCU
page-table freeing code" introduced a tlb flushing bug. Partially revert
the above git commit and go back to s390 specific page table flush code.
For s390 the TLB can contain three types of entries, "normal" TLB
page-table entries, TLB combined region-and-segment-table (CRST) entries
and real-space entries. Linux does not use real-space entries which
leaves normal TLB entries and CRST entries. The CRST entries are
intermediate steps in the page-table translation called translation paths.
For example a 4K page access in a three-level page table setup will
create two CRST TLB entries and one page-table TLB entry. The advantage
of that approach is that a page access next to the previous one can reuse
the CRST entries and needs just a single read from memory to create the
page-table TLB entry. The disadvantage is that the TLB flushing rules are
more complicated: before any page-table may be freed, the TLB needs to be
flushed.
In short: the generic RCU page-table freeing code is incorrect for the
CRST entries, in particular the check for mm_users < 2 is troublesome.
This is applicable to 3.0+ kernels.
Cc: <stable@vger.kernel.org>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Currently in the memcpy_real() function interrupts are disabled with
__arch_local_irq_stnsm(). In order to notify lockdep that interrupts
are disabled, with this patch local_irq_save() is used instead. The
function __arch_local_irq_stnsm() is still used for switching to
real mode.
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Merge tag 'split-asm_system_h-for-linus-20120328' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-asm_system
Pull "Disintegrate and delete asm/system.h" from David Howells:
"Here are a bunch of patches to disintegrate asm/system.h into a set of
separate bits to relieve the problem of circular inclusion
dependencies.
I've built all the working defconfigs from all the arches that I can
and made sure that they don't break.
The reason for these patches is that I recently encountered a circular
dependency problem that came about when I produced some patches to
optimise get_order() by rewriting it to use ilog2().
This uses bitops - and on the SH arch asm/bitops.h drags in
asm-generic/get_order.h by a circuitous route involving asm/system.h.
The main difficulty seems to be asm/system.h. It holds a number of
low level bits with no/few dependencies that are commonly used (eg.
memory barriers) and a number of bits with more dependencies that
aren't used in many places (eg. switch_to()).
These patches break asm/system.h up into the following core pieces:
(1) asm/barrier.h
Move memory barriers here. This is already done for MIPS and Alpha.
(2) asm/switch_to.h
Move switch_to() and related stuff here.
(3) asm/exec.h
Move arch_align_stack() here. Other process execution related bits
could perhaps go here from asm/processor.h.
(4) asm/cmpxchg.h
Move xchg() and cmpxchg() here as they're full word atomic ops and
frequently used by atomic_xchg() and atomic_cmpxchg().
(5) asm/bug.h
Move die() and related bits.
(6) asm/auxvec.h
Move AT_VECTOR_SIZE_ARCH here.
Other arch headers are created as needed on a per-arch basis."
Fixed up some conflicts from other header file cleanups and moving code
around that has happened in the meantime, so David's testing is somewhat
weakened by that. We'll find out anything that got broken and fix it.
* tag 'split-asm_system_h-for-linus-20120328' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-asm_system: (38 commits)
Delete all instances of asm/system.h
Remove all #inclusions of asm/system.h
Add #includes needed to permit the removal of asm/system.h
Move all declarations of free_initmem() to linux/mm.h
Disintegrate asm/system.h for OpenRISC
Split arch_align_stack() out from asm-generic/system.h
Split the switch_to() wrapper out of asm-generic/system.h
Move the asm-generic/system.h xchg() implementation to asm-generic/cmpxchg.h
Create asm-generic/barrier.h
Make asm-generic/cmpxchg.h #include asm-generic/cmpxchg-local.h
Disintegrate asm/system.h for Xtensa
Disintegrate asm/system.h for Unicore32 [based on ver #3, changed by gxt]
Disintegrate asm/system.h for Tile
Disintegrate asm/system.h for Sparc
Disintegrate asm/system.h for SH
Disintegrate asm/system.h for Score
Disintegrate asm/system.h for S390
Disintegrate asm/system.h for PowerPC
Disintegrate asm/system.h for PA-RISC
Disintegrate asm/system.h for MN10300
...
This function is defined for use in exec, not in modules.
No other architecture exports its implementation.
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The external interrupt handlers have a parameter called ext_int_code.
Despite its name, this parameter contains not only the ext_int_code
but also the "cpu address" (POP) which caused the external
interrupt.
To make the code a bit more obvious pass a struct instead so the called
function can easily distinguish between external interrupt code and
cpu address. The cpu address field however is named "subcode" since
some external interrupt sources do not pass a cpu address but a
different parameter (or none at all).
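As an illustration, splitting the combined value into a struct could look
like the sketch below. The helper unpack_ext_int_code() is hypothetical,
and the layout (subcode in the upper 16 bits, interrupt code in the lower
16 bits) is an assumption made for this example.

```c
#include <stdint.h>

/* User-space model of the split external interrupt code. */
struct ext_code {
	uint16_t subcode;	/* often the cpu address, but not always */
	uint16_t code;		/* external interrupt code */
};

/*
 * Hypothetical helper. Assumed layout: subcode in the upper 16 bits,
 * external interrupt code in the lower 16 bits of the old parameter.
 */
static struct ext_code unpack_ext_int_code(uint32_t ext_int_code)
{
	struct ext_code ec;

	ec.subcode = (uint16_t)(ext_int_code >> 16);
	ec.code = (uint16_t)(ext_int_code & 0xffff);
	return ec;
}
```

A handler that receives the struct can then test ec.code and ec.subcode
directly instead of shifting and masking the raw value itself.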
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Pull s390 fixes from Martin Schwidefsky
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
[S390] memory hotplug: prevent memory zone interleave
[S390] crash_dump: remove duplicate include
[S390] KEYS: Enable the compat keyctl wrapper on s390x
The new is_compat_task() define for the !COMPAT case in
include/linux/compat.h conflicts with a similar define in
arch/s390/include/asm/compat.h.
This is the minimal patch which fixes the build issues.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This fixes a kernel oops with CONFIG_DEBUG_VM triggered by a
VM_BUG_ON(bad_range()): kernel BUG at mm/page_alloc.c:748.
With memory hotplug on System z, it is possible that the memory
online/offline state is preserved over a system restart, e.g. there
may be offline memory blocks in ZONE_DMA or ZONE_NORMAL. So far,
the offline memory range has always been added to ZONE_MOVABLE during
system start, so that it was possible to have ZONE_MOVABLE interleave
with ZONE_DMA or ZONE_NORMAL. This patch fixes that by checking for
zone overlap before adding memory.
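The kind of overlap check meant here can be sketched as follows. This is
not the code from the patch; ranges_overlap() is a hypothetical helper
that works on half-open page frame ranges [start, end).

```c
#include <stdbool.h>

/*
 * Hypothetical helper: return true if the half-open ranges
 * [start1, end1) and [start2, end2) share at least one page frame.
 */
static bool ranges_overlap(unsigned long start1, unsigned long end1,
			   unsigned long start2, unsigned long end2)
{
	return start1 < end2 && start2 < end1;
}
```

A range that merely touches the end of a zone does not count as overlap,
so memory directly behind an existing zone can still be added.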
Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The page_table_free_pgste function is used for kvm processes to free page
tables that have the pgste extension. It calls pgtable_page_ctor instead of
pgtable_page_dtor which increases NR_PAGETABLE instead of decreasing it.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Move the program interruption code and the translation exception identifier
to the pt_regs structure as 'int_code' and 'int_parm_long' and make the
first level interrupt handler in entry[64].S store the two values. That
makes it possible to drop 'prot_addr' and 'trap_no' from the thread_struct
and to reduce the number of arguments to a lot of functions. Finally
un-inline do_trap. Overall this saves 5812 bytes in the .text section of
the 64 bit kernel.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
This patch disables the check for MACHINE_IS_VM when initializing the
pfault infrastructure. The code checks for successful completion of
diag 258 anyway, thus it's safe to try initialization on LPAR as well.
This is needed to use pfault on kvm.
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The kernel address space of a 64 bit kernel currently uses a three level
page table and the vmemmap array has a fixed address and a fixed maximum
size. A three level page table is good enough for systems with less than
3.8TB of memory, for bigger systems four page table levels need to be
used. Each page table level costs a bit of performance, so use 3 levels for
normal systems and 4 levels only for the really big systems.
To avoid bloating sparse.o too much set MAX_PHYSMEM_BITS to 46 for a
maximum of 64TB of memory.
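The 64TB figure follows directly from the number of address bits. A small
sketch of the arithmetic, with hypothetical helper names:

```c
#include <stdint.h>

#define MAX_PHYSMEM_BITS 46

/* Number of bytes addressable with "bits" physical address bits. */
static uint64_t phys_bytes(unsigned int bits)
{
	return UINT64_C(1) << bits;
}

/* The same value expressed in TB (1TB = 2^40 bytes). */
static uint64_t phys_tb(unsigned int bits)
{
	return phys_bytes(bits) >> 40;
}
```

phys_tb(MAX_PHYSMEM_BITS) evaluates to 2^46 / 2^40 = 2^6 = 64.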
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Ignore completion interrupts if the initial interrupt hasn't been
received and the addressed task is not running. This case can only
happen if a leftover (pending) completion interrupt gets delivered
which wasn't removed with the PFAULT CANCEL operation during cpu
hotplug.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
* 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux: (230 commits)
Revert "tracing: Include module.h in define_trace.h"
irq: don't put module.h into irq.h for tracking irqgen modules.
bluetooth: macroize two small inlines to avoid module.h
ip_vs.h: fix implicit use of module_get/module_put from module.h
nf_conntrack.h: fix up fallout from implicit moduleparam.h presence
include: replace linux/module.h with "struct module" wherever possible
include: convert various register fcns to macros to avoid include chaining
crypto.h: remove unused crypto_tfm_alg_modname() inline
uwb.h: fix implicit use of asm/page.h for PAGE_SIZE
pm_runtime.h: explicitly requires notifier.h
linux/dmaengine.h: fix implicit use of bitmap.h and asm/page.h
miscdevice.h: fix up implicit use of lists and types
stop_machine.h: fix implicit use of smp.h for smp_processor_id
of: fix implicit use of errno.h in include/linux/of.h
of_platform.h: delete needless include <linux/module.h>
acpi: remove module.h include from platform/aclinux.h
miscdevice.h: delete unnecessary inclusion of module.h
device_cgroup.h: delete needless include <linux/module.h>
net: sch_generic remove redundant use of <linux/module.h>
net: inet_timewait_sock doesnt need <linux/module.h>
...
Fix up trivial conflicts (other header files, and removal of the ab3550 mfd driver) in
- drivers/media/dvb/frontends/dibx000_common.c
- drivers/media/video/{mt9m111.c,ov6650.c}
- drivers/mfd/ab3550-core.c
- include/linux/dmaengine.h
This avoids duplicating the function in every arch gup_fast.
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <jweiner@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
s390 didn't return 0 in that case. If it's rolling back the *nr pointer, it
should also return zero to avoid adding pages to the array at the wrong
offset.
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <jweiner@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Up to this point the code assumed old refcounting for hugepages (pre-thp).
This updates the code directly to the thp mapcount tail page refcounting.
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <jweiner@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix several compile errors on s390 caused by splitting module.h.
Some include additions [e.g. qdio_setup.c, zfcp_qdio.c] are in
anticipation of pending changes queued for s390 that increase
the modular use footprint.
[PG: added additional obvious changes since Heiko's original patch]
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Linux on System z uses a ballooner based on diagnose 0x10 (also known as
collaborative memory management). This patch implements diagnose
0x10 on the guest address space.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
gmap_fault needs to walk the guest page table. However, parts of
that may change if some other thread does munmap. In that case
gmap_unmap_notifier will also unmap the corresponding parts from
the guest page table. We need to take mmap_sem in order to serialize
these operations.
do_exception now calls __gmap_fault with mmap_sem held which does
not get exported to modules. The exported function, which is called
from KVM, now takes mmap_sem.
Reported-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
This introduces locking via mm->page_table_lock to protect
the rmap list for guest mappings from being corrupted by concurrent
operations.
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Fix possible deadlock reported by lockdep:
qemu-system-s39/2963 is trying to acquire lock:
(&mm->mmap_sem){++++++}, at: gmap_alloc_table+0x9c/0x120
but task is already holding lock:
(&mm->mmap_sem){++++++}, at: gmap_map_segment+0xa6/0x27c
Actually gmap_alloc_table is only called from gmap_map_segment with
mmap_sem held, thus it's safe to simply remove the inner lock.
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Split out addressing mode bits from PSW_BASE_BITS, rename PSW_BASE_BITS
to PSW_MASK_BASE, get rid of psw_user32_bits, remove unused function
enabled_wait(), introduce PSW_MASK_USER, and drop PSW_MASK_MERGE macros.
Change psw_kernel_bits / psw_user_bits to contain only the bits that
are always set in the respective mode.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
An instruction with an address right below the address limit for the
current addressing mode will wrap. The instruction restart logic in
the protection fault handler and the signal code need to follow the
wrapping rules to find the correct instruction address.
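The wrapping rule can be sketched as a masked subtraction. The enum and
the helper names below are hypothetical and only model the arithmetic for
the 24 bit, 31 bit and 64 bit addressing modes.

```c
#include <stdint.h>

/* Hypothetical addressing modes for this sketch. */
enum amode { AMODE_24, AMODE_31, AMODE_64 };

/* Address mask for the given addressing mode. */
static uint64_t amode_mask(enum amode mode)
{
	switch (mode) {
	case AMODE_24:
		return (UINT64_C(1) << 24) - 1;
	case AMODE_31:
		return (UINT64_C(1) << 31) - 1;
	default:
		return UINT64_MAX;
	}
}

/* Step back "ilc" bytes from "addr", wrapping at the address limit. */
static uint64_t rewind_addr(uint64_t addr, uint64_t ilc, enum amode mode)
{
	return (addr - ilc) & amode_mask(mode);
}
```

For example, stepping back 4 bytes from address 0x2 in 31 bit mode yields
0x7ffffffe rather than a negative or 64 bit value.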
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
This patch provides the architecture specific part of the s390 kdump
support.
Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Add access function for real memory needed by s390 kdump backend.
Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The rcu page table free code uses a couple of bits in the page table
pointer passed to tlb_remove_table to discern the different page table
types. __tlb_remove_table extracts the type with an incorrect mask which
leads to memory leaks. The correct mask is ((FRAG_MASK << 4) | FRAG_MASK).
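A sketch of how the type bits can be separated from the table address
with this mask. The value chosen for FRAG_MASK is an assumption for the
example, and table_type() and table_addr() are hypothetical helpers.

```c
#include <stdint.h>

/* Assumed value for this sketch only. */
#define FRAG_MASK 0x03UL

/* Mask covering both groups of type bits in the low pointer bits. */
#define TABLE_TYPE_MASK ((FRAG_MASK << 4) | FRAG_MASK)

/* Extract the type bits from a tagged page table pointer. */
static unsigned long table_type(uintptr_t tagged)
{
	return tagged & TABLE_TYPE_MASK;
}

/* Strip the type bits to get the page table address back. */
static uintptr_t table_addr(uintptr_t tagged)
{
	return tagged & ~(uintptr_t)TABLE_TYPE_MASK;
}
```

With a mask that misses one of the two bit groups, a tagged pointer can
be classified as the wrong table type, so the table is not freed the way
it was allocated.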
Cc: stable@kernel.org
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
If gmap_unmap_segment figures that the segment was not mapped in the
first place, it needs to up mmap_sem on exit.
Cc: <stable@kernel.org>
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
598841ca99 ([S390] use gmap address
spaces for kvm guest images) changed kvm to use a separate address
space for kvm guests. This address space was switched in __vcpu_run.
In some cases (preemption, page fault) there is the possibility that
this address space switch is lost.
The typical symptom was a huge amount of validity intercepts or
random guest addressing exceptions.
Fix this by doing the switch in sie_loop and sie_exit and saving the
address space in the gmap structure itself. Also use the preempt
notifier.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
With this patch a new S390 shutdown trigger "restart" is added. If under
z/VM "system restart" is entered or under the HMC the "PSW restart" button
is pressed, the PSW located at 0 (31 bit) or 0x1a0 (64 bit) is loaded.
Now we execute do_restart() that processes the restart action that is
defined under /sys/firmware/shutdown_actions/on_restart. Currently the
following actions are possible: reipl (default), stop, vmcmd, dump, and
dump_reipl.
Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Fix the following compile warning for !CONFIG_PGSTE:
CC arch/s390/mm/pgtable.o
arch/s390/mm/pgtable.c: In function ‘page_table_alloc_pgste’:
arch/s390/mm/pgtable.c:531:1: warning: no return statement in function returning non-void [-Wreturn-type]
Signed-off-by: Jan Glauber <jang@linux.vnet.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Add code that allows KVM to control the virtual memory layout that
is seen by a guest. The guest address space uses a second page table
that shares the last level pte-tables with the process page table.
If a page is unmapped from the process page table it is automatically
unmapped from the guest page table as well.
The guest address space mapping starts out empty. KVM can map any
individual 1MB segment from the process virtual memory to any 1MB
aligned location in the guest virtual memory. If a target segment in
the process virtual memory does not exist or is unmapped while a
guest mapping exists the desired target address is stored as an
invalid segment table entry in the guest page table.
The population of the guest page table is fault driven.
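Since mappings are done in units of 1MB segments, a caller would have to
pass segment aligned values. A minimal sketch of such a check, where
SEGMENT_SIZE and segment_args_aligned() are hypothetical names and not
taken from the patch:

```c
#include <stdbool.h>

/* One segment covers 1MB. */
#define SEGMENT_SIZE (1UL << 20)

/*
 * Hypothetical check: the source address, the target address and the
 * length of a mapping request must all be multiples of the segment size.
 */
static bool segment_args_aligned(unsigned long from, unsigned long to,
				 unsigned long len)
{
	return ((from | to | len) & (SEGMENT_SIZE - 1)) == 0;
}
```

OR-ing the three values first means a single mask test catches a
misaligned bit in any of them.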
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The nmi parameter indicated if we could do wakeups from the current
context, if not, we would set some state and self-IPI and let the
resulting interrupt do the wakeup.
For the various event classes:
- hardware: nmi=0; PMI is in fact an NMI or we run irq_work_run from
the PMI-tail (ARM etc.)
- tracepoint: nmi=0; since tracepoint could be from NMI context.
- software: nmi=[0,1]; some, like the schedule thing cannot
perform wakeups, and hence need 0.
As one can see, there is very little nmi=1 usage, and the down-side of
not using it is that on some platforms some software events can have a
jiffy delay in wakeup (when arch_irq_work_raise isn't implemented).
The up-side however is that we can remove the nmi parameter and save a
bunch of conditionals in fast paths.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Michael Cree <mcree@orcon.net.nz>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Deng-Cheng Zhu <dengcheng.zhu@gmail.com>
Cc: Anton Blanchard <anton@samba.org>
Cc: Eric B Munson <emunson@mgebm.net>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jason Wessel <jason.wessel@windriver.com>
Cc: Don Zickus <dzickus@redhat.com>
Link: http://lkml.kernel.org/n/tip-agjev8eu666tvknpb3iaj0fg@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Replace the s390 specific rcu page-table freeing code with the
generic variant. This requires duplicating the definition of
struct mmu_table_batch as s390 does not use the generic tlb flush
code.
While we are at it remove the restriction that page table fragments
can not be reused after a single fragment has been freed with rcu
and split out allocation and freeing of page tables with pgstes.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Quite a few functions that get called from the tlb gather code require that
preemption must be disabled. So disable preemption inside of the called
functions instead.
The only drawback is that rcu_table_freelist_finish() doesn't necessarily get
called on the cpu(s) that filled the free lists. So we may see a delay until
we finally see an rcu callback. However, over time this shouldn't matter.
So we get rid of lots of "BUG: using smp_processor_id() in preemptible"
messages.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (25 commits)
perf: Fix SIGIO handling
perf top: Don't stop if no kernel symtab is found
perf top: Handle kptr_restrict
perf top: Remove unused macro
perf events: initialize fd array to -1 instead of 0
perf tools: Make sure kptr_restrict warnings fit 80 col terms
perf tools: Fix build on older systems
perf symbols: Handle /proc/sys/kernel/kptr_restrict
perf: Remove duplicate headers
ftrace: Add internal recursive checks
tracing: Update btrfs's tracepoints to use u64 interface
tracing: Add __print_symbolic_u64 to avoid warnings on 32bit machine
ftrace: Set ops->flag to enabled even on static function tracing
tracing: Have event with function tracer check error return
ftrace: Have ftrace_startup() return failure code
jump_label: Check entries limit in __jump_label_update
ftrace/recordmcount: Avoid STT_FUNC symbols as base on ARM
scripts/tags.sh: Add magic for trace-events for etags too
scripts/tags.sh: Fix ctags for DEFINE_EVENT()
x86/ftrace: Fix compiler warning in ftrace.c
...