Commit Graph

205 Commits

Author SHA1 Message Date
Aneesh Kumar K.V a4b349540a powerpc/mm: Cleanup LPCR defines
This makes it easy to verify we are not overloading the bits.
No functionality change by this patch.

mpe: Cleanup more. Completely fixup whitespace, convert all UL values to
ASM_CONST(), and replace all occurrences of 63-x with the actual shift.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-19 20:12:28 +10:00
Aneesh Kumar K.V bf16cdf48a powerpc/mm/radix: Update LPCR HR bit as per ISA
PowerISA 3.0 requires the MMU mode (radix vs. hash) of the hypervisor
to be mirrored in the LPCR register, in addition to the partition table.
This is done to avoid fetching from the table when deciding, among other
things, how to perform transitions to HV mode on some interrupts.
So let's set it up appropriately

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-17 16:42:50 +10:00
Benjamin Herrenschmidt 9baaef0a22 powerpc/irq: Add support for HV virtualization interrupts
This will be delivering external interrupts from the XIVE to the
Hypervisor. We treat it as a normal external interrupt for the
lazy irq disable code (so it will be replayed as a 0x500) and
route it to do_IRQ.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-17 16:42:44 +10:00
Shreyas B. Prabhu bcef83a00d powerpc/powernv: Add platform support for stop instruction
POWER ISA v3 defines a new idle processor core mechanism. In summary,
 a) new instruction named stop is added. This instruction replaces
	instructions like nap, sleep, rvwinkle.
 b) new per thread SPR named Processor Stop Status and Control Register
	(PSSCR) is added which controls the behavior of stop instruction.

PSSCR layout:
----------------------------------------------------------
| PLS | /// | SD | ESL | EC | PSLL | /// | TR | MTL | RL |
----------------------------------------------------------
0      4     41   42    43   44     48    54   56    60

PSSCR key fields:
	Bits 0:3  - Power-Saving Level Status. This field indicates the lowest
	power-saving state the thread entered since stop instruction was last
	executed.

	Bit 42 - Enable State Loss
	0 - No state is lost irrespective of other fields
	1 - Allows state loss

	Bits 44:47 - Power-Saving Level Limit
	This limits the power-saving level that can be entered into.

	Bits 60:63 - Requested Level
	Used to specify which power-saving level must be entered on executing
	stop instruction

This patch adds support for stop instruction and PSSCR handling.

Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-15 20:18:41 +10:00
Michael Ellerman e0ddf7a245 powerpc/xmon: Dump ISA 2.07 SPRs
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-14 20:26:24 +10:00
Christophe Leroy c223c90386 powerpc32: provide VIRT_CPU_ACCOUNTING
This patch provides VIRT_CPU_ACCOUTING to PPC32 architecture.
PPC32 doesn't have the PACA structure, so we use the task_info
structure to store the accounting data.

In order to reuse on PPC32 the PPC64 functions, all u64 data has
been replaced by 'unsigned long' so that it is u32 on PPC32 and
u64 on PPC64

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <oss@buserror.net>
2016-07-09 01:43:50 -05:00
Oliver O'Halloran 7990102446 powerpc/timer: Large Decrementer support
Power ISAv3 adds a large decrementer (LD) mode which increases the size
of the decrementer register. The size of the enlarged decrementer
register is between 32 and 64 bits with the exact size being dependent
on the implementation. When in LD mode, reads are sign extended to 64
bits and a decrementer exception is raised when the high bit is set (i.e
the value goes below zero). Writes however are truncated to the physical
register width so some care needs to be taken to ensure that the high
bit is not set when reloading the decrementer. This patch adds support
for using the LD inside the host kernel on processors that support it.

When LD mode is supported firmware will supply the ibm,dec-bits property
for CPU nodes to allow the kernel to determine the maximum decrementer
value. Enabling LD mode is a hypervisor privileged operation so the kernel
can only enable it manually when running in hypervisor mode. Guests that
support LD mode can request it using the "ibm,client-architecture-support"
firmware call (not implemented in this patch) or some other platform
specific method. If this property is not supplied then the traditional
decrementer width of 32 bit is assumed and LD mode will not be enabled.

This patch was based on initial work by Jack Miller.

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Balbir Singh <bsingharora@gmail.com>
Acked-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-05 23:58:53 +10:00
Jack Miller bd3ea317fd powerpc: Load Monitor Register Support
This enables new registers, LMRR and LMSER, that can trigger an EBB in
userspace code when a monitored load (via the new ldmx instruction)
loads memory from a monitored space. This facility is controlled by a
new FSCR bit, LM.

This patch disables the FSCR LM control bit on task init and enables
that bit when a load monitor facility unavailable exception is taken
for using it. On context switch, this bit is then used to determine
whether the two relevant registers are saved and restored. This is
done lazily for performance reasons.

Signed-off-by: Jack Miller <jack@codezen.org>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-06-21 15:30:50 +10:00
Thomas Huth 8dd75ccb57 powerpc: Use privileged SPR number for MMCR2
We are already using the privileged versions of MMCR0, MMCR1
and MMCRA in the kernel, so for MMCR2, we should better use
the privileged versions, too, to be consistent.

Fixes: 240686c136 ("powerpc: Initialise PMU related regs on Power8")
Cc: stable@vger.kernel.org # v3.10+
Suggested-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Acked-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-05-31 20:08:32 +10:00
Thomas Huth d23fac2b27 powerpc: Fix definition of SIAR and SDAR registers
The SIAR and SDAR registers are available twice, one time as SPRs
780 / 781 (unprivileged, but read-only), and one time as the SPRs
796 / 797 (privileged, but read and write). The Linux kernel code
currently uses the unprivileged  SPRs - while this is OK for reading,
writing to that register of course does not work.
Since the KVM code tries to write to this register, too (see the mtspr
in book3s_hv_rmhandlers.S), the contents of this register sometimes get
lost for the guests, e.g. during migration of a VM.
To fix this issue, simply switch to the privileged SPR numbers instead.

Cc: stable@vger.kernel.org
Signed-off-by: Thomas Huth <thuth@redhat.com>
Acked-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-05-31 20:07:56 +10:00
Aneesh Kumar K.V 2bfd65e45e powerpc/mm/radix: Add radix callbacks for early init routines
This adds routines for early setup for radix. We use device tree
property "ibm,processor-radix-AP-encodings" to find supported page
sizes. If we don't find the above we consider 64K and 4K as supported
page sizes.

We do map vmemap using 2M page size if we can. The linear mapping is
done such that we use required page size for that range. For example
memory of 3.5G is mapped such that we use 1G mapping till 3G range and
use 2M mapping for the rest.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-05-01 18:33:00 +10:00
Aneesh Kumar K.V e99833448c powerpc/mm/radix: Add partition table format & callback
Add structs and #defines related to the radix MMU partition table
format. We also add a ppc_md callback for updating a partition table
entry.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-05-01 18:32:39 +10:00
Philippe Bergheaud 86c9ffcc1e powerpc: Define PVR value for POWER8NVL processor
Signed-off-by: Philippe Bergheaud <felix@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-04-11 20:30:46 +10:00
Christophe Leroy 1458dd951f powerpc/8xx: Handle CPU6 ERRATA directly in mtspr() macro
MPC8xx has an ERRATA on the use of mtspr() for some registers
This patch includes the ERRATA handling directly into mtspr() macro
so that mtspr() users don't need to bother about that errata

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <oss@buserror.net>
2016-03-11 17:18:02 -06:00
Cyril Bur de2a20aa72 powerpc: Prepare for splitting giveup_{fpu, altivec, vsx} in two
This prepares for the decoupling of saving {fpu,altivec,vsx} registers and
marking {fpu,altivec,vsx} as being unused by a thread.

Currently giveup_{fpu,altivec,vsx}() does both however optimisations to
task switching can be made if these two operations are decoupled.
save_all() will permit the saving of registers to thread structs and leave
threads MSR with bits enabled.

This patch introduces no functional change.

Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-03-02 23:34:48 +11:00
Adam Buchbinder 446957ba51 powerpc: Fix misspellings in comments.
Signed-off-by: Adam Buchbinder <adam.buchbinder@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-03-01 19:27:20 +11:00
Michael Ellerman 1901d8bb45 powerpc fixes for 4.4 #2
- tm: Block signal return from setting invalid MSR state from Michael Neuling
  - tm: Check for already reclaimed tasks from Michael Neuling
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJWVul+AAoJEFHr6jzI4aWAUZAQAK2s+E4WTtcExXSG1bqq05o6
 5wCLtIq6M91h5HDpgBfN7S5OXpRd72ZeIVdzC0HkFeLBF3y7NSHSEezUw4g/GGfz
 K2xGV1CCXC3Rb3qyHSdyi6+c1AnLVRPBVzVxPVmlXigrXeFiQ4613YW9rzf9b8fs
 oktUciwW9aHbrIv7g8f82gpuk9jwwhp/sF+1H/7fGOozT4CFsKo4wj4HOOCBwH4y
 ODEjs6Z+9Uwb6Kfvi/rn3k4XA1wC36WFq3ORI6KrmK/ZB1eR0Kwf0IELYpMj8cOX
 q5ZtCH7t68f9vmEK2B34AUijf/amm+2vLwvF6xAuZJFPUPZtgMBdRcqkLalbtPAO
 8hlyPPgoZcgR/Of+lEYxUobcL0SMNufXwmfwRO35ktkm9Z9Ee96C8NNbpybBSDXL
 YRa6is5MeO4GL8Gbcc0TA50hGjok7o3acGE6HSAReyzf0guQ4xqcif7+6lfWZPkk
 P3aM02ajp2qoqyjhT/Ei6JlMptAiuQY+HvELFneqn5s9nDbv6cGuYZNNap0c1fK+
 74W0p7MiZh7+IF5HpyUIeYV836inXMDIoKzjA6H3OWitk/1lbcrbF34Qpz9zAWZn
 YF3w786ZzzQLw0jcALaqZejm58MLGIakO4MNDB0/ZBh0nKKfEV8WvPOjev78OAp1
 +pDrJh0iQzrujN6OMKQ0
 =IL8A
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-4.4-3' into next

Merge the two TM fixes we merged in 4.4. We are about to merge selftests
for these, and without the fixes the selftests will oops.

powerpc fixes for 4.4 #2

 - tm: Block signal return from setting invalid MSR state from Michael Neuling
 - tm: Check for already reclaimed tasks from Michael Neuling
2015-12-14 20:40:32 +11:00
Anton Blanchard 3eb5d5888d powerpc: Add ppc_strict_facility_enable boot option
Add a boot option that strictly manages the MSR unavailable bits.
This catches kernel uses of FP/Altivec/SPE that would otherwise
corrupt user state.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-12-01 13:52:26 +11:00
Anton Blanchard 611b0e5c19 powerpc: Create mtmsrd_isync()
mtmsrd_isync() will do an mtmsrd followed by an isync on older
processors. On newer processors we avoid the isync via a feature fixup.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-12-01 13:52:25 +11:00
Michael Neuling d2b9d2a5ad powerpc/tm: Block signal return setting invalid MSR state
Currently we allow both the MSR T and S bits to be set by userspace on
a signal return.  Unfortunately this is a reserved configuration and
will cause a TM Bad Thing exception if attempted (via rfid).

This patch checks for this case in both the 32 and 64 bit signals
code.  If both T and S are set, we mark the context as invalid.

Found using a syscall fuzzer.

Fixes: 2b0a576d15 ("powerpc: Add new transactional memory state to the signal context")
Cc: stable@vger.kernel.org # v3.9+
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-11-23 20:06:31 +11:00
Paul Mackerras 23316316c1 powerpc: Revert "Use the POWER8 Micro Partition Prefetch Engine in KVM HV on POWER8"
This reverts commit 9678cdaae9 ("Use the POWER8 Micro Partition
Prefetch Engine in KVM HV on POWER8") because the original commit had
multiple, partly self-cancelling bugs, that could cause occasional
memory corruption.

In fact the logmpp instruction was incorrectly using register r0 as the
source of the buffer address and operation code, and depending on what
was in r0, it would either do nothing or corrupt the 64k page pointed to
by r0.

The logmpp instruction encoding and the operation code definitions could
be corrected, but then there is the problem that there is no clearly
defined way to know when the hardware has finished writing to the
buffer.

The original commit attempted to work around this by aborting the
write-out before starting the prefetch, but this is ineffective in the
case where the virtual core is now executing on a different physical
core from the one where the write-out was initiated.

These problems plus advice from the hardware designers not to use the
function (since the measured performance improvement from using the
feature was actually mostly negative), mean that reverting the code is
the best option.

Fixes: 9678cdaae9 ("Use the POWER8 Micro Partition Prefetch Engine in KVM HV on POWER8")
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-10-21 20:50:30 +11:00
Gautham R. Shenoy e63dbd16ab powerpc: Add an inline function to update POWER8 HID0
Section 3.7 of Version 1.2 of the Power8 Processor User's Manual
prescribes that updates to HID0 be preceded by a SYNC instruction and
followed by an ISYNC instruction (Page 91).

Create an inline function name update_power8_hid0() which follows this
recipe and invoke it from the static split core path.

Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Reviewed-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
Tested-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-08-14 15:58:28 +10:00
Anton Blanchard 1c53973172 powerpc: Remove mtmsrd(), use existing mtmsr()
mtmsr() does the right thing on 32bit and 64bit, so use it everywhere.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-07-13 15:47:28 +10:00
Paul Mackerras 755563bc79 powerpc/powernv: Fixes for hypervisor doorbell handling
Since we can now use hypervisor doorbells for host IPIs, this makes
sure we clear the host IPI flag when taking a doorbell interrupt, and
clears any pending doorbell IPI in pnv_smp_cpu_kill_self() (as we
already do for IPIs sent via the XICS interrupt controller).  Otherwise
if there did happen to be a leftover pending doorbell interrupt for
an offline CPU thread for any reason, it would prevent that thread from
going into a power-saving mode; it would instead keep waking up because
of the interrupt.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-03-20 14:51:53 +11:00
Shreyas B. Prabhu 77b54e9f21 powernv/powerpc: Add winkle support for offline cpus
Winkle is a deep idle state supported in power8 chips. A core enters
winkle when all the threads of the core enter winkle. In this state
power supply to the entire chiplet i.e core, private L2 and private L3
is turned off. As a result it gives higher powersavings compared to
sleep.

But entering winkle results in a total hypervisor state loss. Hence the
hypervisor context has to be preserved before entering winkle and
restored upon wake up.

Power-on Reset Engine (PORE) is a dedicated engine which is responsible
for powering on the chiplet during wake up. It can be programmed to
restore the register contests of a few specific registers. This patch
uses PORE to restore register state wherever possible and uses stack to
save and restore rest of the necessary registers.

With hypervisor state restore things fall under three categories-
per-core state, per-subcore state and per-thread state. To manage this,
extend the infrastructure introduced for sleep. Mainly we add a paca
variable subcore_sibling_mask. Using this and the core_idle_state we can
distingush first thread in core and subcore.

Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-12-15 10:46:41 +11:00
Paul Mackerras 8117ac6a6c powerpc/powernv: Switch off MMU before entering nap/sleep/rvwinkle mode
Currently, when going idle, we set the flag indicating that we are in
nap mode (paca->kvm_hstate.hwthread_state) and then execute the nap
(or sleep or rvwinkle) instruction, all with the MMU on.  This is bad
for two reasons: (a) the architecture specifies that those instructions
must be executed with the MMU off, and in fact with only the SF, HV, ME
and possibly RI bits set, and (b) this introduces a race, because as
soon as we set the flag, another thread can switch the MMU to a guest
context.  If the race is lost, this thread will typically start looping
on relocation-on ISIs at 0xc...4400.

This fixes it by setting the MSR as required by the architecture before
setting the flag or executing the nap/sleep/rvwinkle instruction.

Cc: stable@vger.kernel.org
[ shreyas@linux.vnet.ibm.com: Edited to handle LE ]
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-12-15 10:46:32 +11:00
Anton Blanchard acf620ecf5 powerpc: Rename __get_SP() to current_stack_pointer()
Michael points out that __get_SP() is a pretty horrible
function name. Let's give it a better name.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-10-15 11:23:20 +11:00
Anton Blanchard bfe9a2cfe9 powerpc: Reimplement __get_SP() as a function not a define
Li Zhong points out an issue with our current __get_SP()
implementation. If ftrace function tracing is enabled (ie -pg
profiling using _mcount) we spill a stack frame on 64bit all the
time.

If a function calls __get_SP() and later calls a function that is
tail call optimised, we will pop the stack frame and the value
returned by __get_SP() is no longer valid. An example from Li can
be found in save_stack_trace -> save_context_stack:

c0000000000432c0 <.save_stack_trace>:
c0000000000432c0:       mflr    r0
c0000000000432c4:       std     r0,16(r1)
c0000000000432c8:       stdu    r1,-128(r1) <-- stack frame for _mcount
c0000000000432cc:       std     r3,112(r1)
c0000000000432d0:       bl      <._mcount>
c0000000000432d4:       nop

c0000000000432d8:       mr      r4,r1 <-- __get_SP()

c0000000000432dc:       ld      r5,632(r13)
c0000000000432e0:       ld      r3,112(r1)
c0000000000432e4:       li      r6,1

c0000000000432e8:       addi    r1,r1,128 <-- pop stack frame

c0000000000432ec:       ld      r0,16(r1)
c0000000000432f0:       mtlr    r0
c0000000000432f4:       b       <.save_context_stack> <-- tail call optimized

save_context_stack ends up with a stack pointer below the current
one, and it is likely to be scribbled over.

Fix this by making __get_SP() a function which returns the
callers stack frame. Also replace inline assembly which grabs
the stack pointer in save_stack_trace and show_stack with
__get_SP().

This also fixes an issue with perf_arch_fetch_caller_regs().
It currently unwinds the stack once, which will skip a
valid stack frame on a leaf function. With the __get_SP() fixes
in this patch, we never need to unwind the stack frame to get
to the first interesting frame.

We have to export __get_SP() because perf_arch_fetch_caller_regs()
(which is used in modules) calls it from a header file.

Reported-by: Li Zhong <zhong@linux.vnet.ibm.com>
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-10-15 11:23:19 +11:00
Michael Ellerman 75d43b2d0a Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/scottwood/linux.git
Freescale updates from Scott (27 commits):

  "Highlights include DMA32 zone support (SATA, USB, etc now works on 64-bit
   FSL kernels), MSI changes, 8xx optimizations and cleanup, t104x board
   support, and PrPMC PCI enumeration."
2014-10-04 08:59:06 +10:00
LEROY Christophe ae466bde19 powerpc/8xx: Declare SPRG2 as a SCRATCH register
Since commit 469d62be92, SPRG2 is used as a
scratch register just like SPRG0 and SPRG1. So Declare it as such and fix
the comment which is not valid anymore since that commit.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <scottwood@freescale.com>
2014-09-04 19:02:11 -05:00
Nishanth Aravamudan 56758e3c3c powerpc: remove duplicate definition of TEXASR_FS
It appears that commits 7f06f21d40 ("powerpc/tm: Add checking to
treclaim/trechkpt") and e4e3812150 ("KVM: PPC: Book3S HV: Add
transactional memory support") both added definitions of TEXASR_FS.
Remove one of them. At the same time, fix the alignment of the remaining
definition (should be tab-separated like the rest of the #defines).

Signed-off-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-08-13 15:13:47 +10:00
Linus Torvalds 66bb0aa077 Here are the PPC and ARM changes for KVM, which I separated because
they had small conflicts (respectively within KVM documentation,
 and with 3.16-rc changes).  Since they were all within the subsystem,
 I took care of them.
 
 Stephen Rothwell reported some snags in PPC builds, but they are all
 fixed now; the latest linux-next report was clean.
 
 New features for ARM include:
 - KVM VGIC v2 emulation on GICv3 hardware
 - Big-Endian support for arm/arm64 (guest and host)
 - Debug Architecture support for arm64 (arm32 is on Christoffer's todo list)
 
 And for PPC:
 - Book3S: Good number of LE host fixes, enable HV on LE
 - Book3S HV: Add in-guest debug support
 
 This release drops support for KVM on the PPC440.  As a result, the
 PPC merge removes more lines than it adds. :)
 
 I also included an x86 change, since Davidlohr tied it to an independent
 bug report and the reporter quickly provided a Tested-by; there was no
 reason to wait for -rc2.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABAgAGBQJT4iIJAAoJEBvWZb6bTYbyZqoP/3Wxy8NWPFJ8HGt81NHlGnDS
 a9UbL7EibcOEG+aaKqmtBglTD5YDiGBDNCxxiSJaDHt+grLN4fsWIliJob1nJFoO
 90f89EWN2XjeCrJXA5nUoeg5tpc5OoYKsiP6pTgzIwkP8vvs/H1+zpcTS/UmYsr/
 qipVMMsM+zZeHWZcSbqjW88z7YqIn1sr5282wJ85cbyv4KGizb/G4dyPuDqLb6np
 hkAD8Ah6VV2suQ2FSy7G2fg20R0vglUi60hkEHLoCBPVqJCl7SmC8MvxNbjBnP8S
 J36R0R0u1wHYKzAGooLJGVOZ/o/gSiVqKX+++L2EvJBN+kuA6u/7fxLyBT+LwDAE
 IF/Aln5rpg1fe+eywvhz86WljTVEQ8bO1zVsIQUPY+/ZOPedZHMwyvXft8ogbjSp
 2m9OJ/3e8Aggh0OeHpCDoeow+QDUXvX0YdCw+2Yh0p+7VMXqkyp0QEiBu38jrusC
 rB3VNifJbDSWLKdG9LfCAPHnxZD2XYEwv2WFBo6KQOGMGHfx0GXpCOL/jQihrhA6
 HtEG5Bs3lvnHQemdpUZ58xojiABbMaUPdcnPXQQEp23WhZzrfLMLzqVG0VYnhSsC
 9pi7MJj8c31rqx5WU2oRM28i/BvNxN0NCtkDpineO5s3f89Ws1xnwxqlm38AKP0J
 irJQTYFEqec+GM9JK1rG
 =hyQP
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull second round of KVM changes from Paolo Bonzini:
 "Here are the PPC and ARM changes for KVM, which I separated because
  they had small conflicts (respectively within KVM documentation, and
  with 3.16-rc changes).  Since they were all within the subsystem, I
  took care of them.

  Stephen Rothwell reported some snags in PPC builds, but they are all
  fixed now; the latest linux-next report was clean.

  New features for ARM include:
   - KVM VGIC v2 emulation on GICv3 hardware
   - Big-Endian support for arm/arm64 (guest and host)
   - Debug Architecture support for arm64 (arm32 is on Christoffer's todo list)

  And for PPC:
   - Book3S: Good number of LE host fixes, enable HV on LE
   - Book3S HV: Add in-guest debug support

  This release drops support for KVM on the PPC440.  As a result, the
  PPC merge removes more lines than it adds.  :)

  I also included an x86 change, since Davidlohr tied it to an
  independent bug report and the reporter quickly provided a Tested-by;
  there was no reason to wait for -rc2"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (122 commits)
  KVM: Move more code under CONFIG_HAVE_KVM_IRQFD
  KVM: nVMX: fix "acknowledge interrupt on exit" when APICv is in use
  KVM: nVMX: Fix nested vmexit ack intr before load vmcs01
  KVM: PPC: Enable IRQFD support for the XICS interrupt controller
  KVM: Give IRQFD its own separate enabling Kconfig option
  KVM: Move irq notifier implementation into eventfd.c
  KVM: Move all accesses to kvm::irq_routing into irqchip.c
  KVM: irqchip: Provide and use accessors for irq routing table
  KVM: Don't keep reference to irq routing table in irqfd struct
  KVM: PPC: drop duplicate tracepoint
  arm64: KVM: fix 64bit CP15 VM access for 32bit guests
  KVM: arm64: GICv3: mandate page-aligned GICV region
  arm64: KVM: GICv3: move system register access to msr_s/mrs_s
  KVM: PPC: PR: Handle FSCR feature deselects
  KVM: PPC: HV: Remove generic instruction emulation
  KVM: PPC: BOOKEHV: rename e500hv_spr to bookehv_spr
  KVM: PPC: Remove DCR handling
  KVM: PPC: Expose helper functions for data/inst faults
  KVM: PPC: Separate loadstore emulation from priv emulation
  KVM: PPC: Handle magic page in kvmppc_ld/st
  ...
2014-08-07 11:35:30 -07:00
Stewart Smith 9678cdaae9 Use the POWER8 Micro Partition Prefetch Engine in KVM HV on POWER8
The POWER8 processor has a Micro Partition Prefetch Engine, which is
a fancy way of saying "has way to store and load contents of L2 or
L2+MRU way of L3 cache". We initiate the storing of the log (list of
addresses) using the logmpp instruction and start restore by writing
to a SPR.

The logmpp instruction takes parameters in a single 64bit register:
- starting address of the table to store log of L2/L2+L3 cache contents
  - 32kb for L2
  - 128kb for L2+L3
  - Aligned relative to maximum size of the table (32kb or 128kb)
- Log control (no-op, L2 only, L2 and L3, abort logout)

We should abort any ongoing logging before initiating one.

To initiate restore, we write to the MPPR SPR. The format of what to write
to the SPR is similar to the logmpp instruction parameter:
- starting address of the table to read from (same alignment requirements)
- table size (no data, until end of table)
- prefetch rate (from fastest possible to slower. about every 8, 16, 24 or
  32 cycles)

The idea behind loading and storing the contents of L2/L3 cache is to
reduce memory latency in a system that is frequently swapping vcores on
a physical CPU.

The best case scenario for doing this is when some vcores are doing very
cache heavy workloads. The worst case is when they have about 0 cache hits,
so we just generate needless memory operations.

This implementation just does L2 store/load. In my benchmarks this proves
to be useful.

Benchmark 1:
 - 16 core POWER8
 - 3x Ubuntu 14.04LTS guests (LE) with 8 VCPUs each
 - No split core/SMT
 - two guests running sysbench memory test.
   sysbench --test=memory --num-threads=8 run
 - one guest running apache bench (of default HTML page)
   ab -n 490000 -c 400 http://localhost/

This benchmark aims to measure performance of real world application (apache)
where other guests are cache hot with their own workloads. The sysbench memory
benchmark does pointer sized writes to a (small) memory buffer in a loop.

In this benchmark with this patch I can see an improvement both in requests
per second (~5%) and in mean and median response times (again, about 5%).
The spread of minimum and maximum response times were largely unchanged.

benchmark 2:
 - Same VM config as benchmark 1
 - all three guests running sysbench memory benchmark

This benchmark aims to see if there is a positive or negative affect to this
cache heavy benchmark. Although due to the nature of the benchmark (stores) we
may not see a difference in performance, but rather hopefully an improvement
in consistency of performance (when vcore switched in, don't have to wait
many times for cachelines to be pulled in)

The results of this benchmark are improvements in consistency of performance
rather than performance itself. With this patch, the few outliers in duration
go away and we get more consistent performance in each guest.

benchmark 3:
 - same 3 guests and CPU configuration as benchmark 1 and 2.
 - two idle guests
 - 1 guest running STREAM benchmark

This scenario also saw performance improvement with this patch. On Copy and
Scale workloads from STREAM, I got 5-6% improvement with this patch. For
Add and triad, it was around 10% (or more).

benchmark 4:
 - same 3 guests as previous benchmarks
 - two guests running sysbench --memory, distinctly different cache heavy
   workload
 - one guest running STREAM benchmark.

Similar improvements to benchmark 3.

benchmark 5:
 - 1 guest, 8 VCPUs, Ubuntu 14.04
 - Host configured with split core (SMT8, subcores-per-core=4)
 - STREAM benchmark

In this benchmark, we see a 10-20% performance improvement across the board
of STREAM benchmark results with this patch.

Based on preliminary investigation and microbenchmarks
by Prerna Saxena <prerna@linux.vnet.ibm.com>

Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2014-07-28 15:23:17 +02:00
Bharat Bhushan 8c95ead603 KVM: PPC: Remove comment saying SPRG1 is used for vcpu pointer
Scott Wood pointed out that We are no longer using SPRG1 for vcpu pointer,
but using SPRN_SPRG_THREAD <=> SPRG3 (thread->vcpu). So this comment
is not valid now.

Note: SPRN_SPRG3R is not supported (do not see any need as of now),
and if we want to support this in future then we have to shift to using
SPRG1 for VCPU pointer.

Signed-off-by: Bharat Bhushan <Bharat.Bhushan@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2014-07-28 15:23:15 +02:00
Aneesh Kumar K.V 8f42ab2749 KVM: PPC: BOOK3S: PR: Emulate virtual timebase register
virtual time base register is a per VM, per cpu register that needs
to be saved and restored on vm exit and entry. Writing to VTB is not
allowed in the privileged mode.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
[agraf: fix compile error]
Signed-off-by: Alexander Graf <agraf@suse.de>
2014-07-28 15:21:50 +02:00
Michael Ellerman 376af5947c powerpc: Remove STAB code
Old cpus didn't have a Segment Lookaside Buffer (SLB), instead they had
a Segment Table (STAB). Now that we've dropped support for those cpus,
we can remove the STAB support entirely.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-07-28 14:10:22 +10:00
Linus Torvalds c5aec4c76a Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
Pull powerpc updates from Ben Herrenschmidt:
 "Here is the bulk of the powerpc changes for this merge window.  It got
  a bit delayed in part because I wasn't paying attention, and in part
  because I discovered I had a core PCI change without a PCI maintainer
  ack in it.  Bjorn eventually agreed it was ok to merge it though we'll
  probably improve it later and I didn't want to rebase to add his ack.

  There is going to be a bit more next week, essentially fixes that I
  still want to sort through and test.

  The biggest item this time is the support to build the ppc64 LE kernel
  with our new v2 ABI.  We previously supported v2 userspace but the
  kernel itself was a tougher nut to crack.  This is now sorted mostly
  thanks to Anton and Rusty.

  We also have a fairly big series from Cedric that add support for
  64-bit LE zImage boot wrapper.  This was made harder by the fact that
  traditionally our zImage wrapper was always 32-bit, but our new LE
  toolchains don't really support 32-bit anymore (it's somewhat there
  but not really "supported") so we didn't want to rely on it.  This
  meant more churn that just endian fixes.

  This brings some more LE bits as well, such as the ability to run in
  LE mode without a hypervisor (ie. under OPAL firmware) by doing the
  right OPAL call to reinitialize the CPU to take HV interrupts in the
  right mode and the usual pile of endian fixes.

  There's another series from Gavin adding EEH improvements (one day we
  *will* have a release with less than 20 EEH patches, I promise!).

  Another highlight is the support for the "Split core" functionality on
  P8 by Michael.  This allows a P8 core to be split into "sub cores" of
  4 threads which allows the subcores to run different guests under KVM
  (the HW still doesn't support a partition per thread).

  And then the usual misc bits and fixes ..."

[ Further delayed by gmail deciding that BenH is a dirty spammer.
  Google knows.  ]

* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (155 commits)
  powerpc/powernv: Add missing include to LPC code
  selftests/powerpc: Test the THP bug we fixed in the previous commit
  powerpc/mm: Check paca psize is up to date for huge mappings
  powerpc/powernv: Pass buffer size to OPAL validate flash call
  powerpc/pseries: hcall functions are exported to modules, need _GLOBAL_TOC()
  powerpc: Exported functions __clear_user and copy_page use r2 so need _GLOBAL_TOC()
  powerpc/powernv: Set memory_block_size_bytes to 256MB
  powerpc: Allow ppc_md platform hook to override memory_block_size_bytes
  powerpc/powernv: Fix endian issues in memory error handling code
  powerpc/eeh: Skip eeh sysfs when eeh is disabled
  powerpc: 64bit sendfile is capped at 2GB
  powerpc/powernv: Provide debugfs access to the LPC bus via OPAL
  powerpc/serial: Use saner flags when creating legacy ports
  powerpc: Add cpu family documentation
  powerpc/xmon: Fix up xmon format strings
  powerpc/powernv: Add calls to support little endian host
  powerpc: Document sysfs DSCR interface
  powerpc: Fix regression of per-CPU DSCR setting
  powerpc: Split __SYSFS_SPRSETUP macro
  arch: powerpc/fadump: Cleaning up inconsistent NULL checks
  ...
2014-06-10 18:54:22 -07:00
Paul Mackerras 9bc01a9bc7 KVM: PPC: Book3S HV: Work around POWER8 performance monitor bugs
This adds workarounds for two hardware bugs in the POWER8 performance
monitor unit (PMU), both related to interrupt generation.  The effect
of these bugs is that PMU interrupts can get lost, leading to tools
such as perf reporting fewer counts and samples than they should.

The first bug relates to the PMAO (perf. mon. alert occurred) bit in
MMCR0; setting it should cause an interrupt, but doesn't.  The other
bug relates to the PMAE (perf. mon. alert enable) bit in MMCR0.
Setting PMAE when a counter is negative and counter negative
conditions are enabled to cause alerts should cause an alert, but
doesn't.

The workaround for the first bug is to create conditions where a
counter will overflow, whenever we are about to restore a MMCR0
value that has PMAO set (and PMAO_SYNC clear).  The workaround for
the second bug is to freeze all counters using MMCR2 before reading
MMCR0.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2014-05-30 14:26:29 +02:00
Michael Ellerman e2186023f2 powerpc/powernv: Add support for POWER8 split core on powernv
Upcoming POWER8 chips support a concept called split core. This is where the
core can be split into subcores that although not full cores, are able to
appear as full cores to a guest.

The splitting & unsplitting procedure is mildly complicated, and explained at
length in the comments within the patch.

One notable detail is that when splitting or unsplitting we need to pull
offline cpus out of their offline state to do work as part of the procedure.

The interface for changing the split mode is via a sysfs file, eg:

 $ echo 2 > /sys/devices/system/cpu/subcores_per_core

Currently supported values are '1', '2' and '4'. And indicate respectively that
the core should be unsplit, split in half, and split in quarters. These modes
correspond to threads_per_subcore of 8, 4 and 2.

We do not allow changing the split mode while KVM VMs are active. This is to
prevent the value changing while userspace is configuring the VM, and also to
prevent the mode being changed in such a way that existing guests are unable to
be run.

CPU hotplug fixes by Srivatsa.  max_cpus fixes by Mahesh.  cpuset fixes by
benh.  Fix for irq race by paulus.  The rest by mikey and mpe.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-05-28 13:35:37 +10:00
Michael Neuling 7f06f21d40 powerpc/tm: Add checking to treclaim/trechkpt
If we do a treclaim and we are not in TM suspend mode, it results in a TM bad
thing (ie. a 0x700 program check).  Similarly if we do a trechkpt and we have
an active transaction or TEXASR Failure Summary (FS) is not set, we also take a
TM bad thing.

This should never happen, but if it does (ie. a kernel bug), the cause is
almost impossible to debug as the GPR state is mostly userspace and hence we
don't get a call chain.

This adds some checks in these cases case a BUG_ON() (in asm) in case we ever
hit these cases.  It moves the register saving around to preserve r1 till later
also.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-04-28 17:36:51 +10:00
Rafael J. Wysocki fe10739284 Merge branch 'pm-cpufreq'
* pm-cpufreq:
  cpufreq: ppc: Remove duplicate inclusion of fsl_soc.h
  cpufreq: create another field .flags in cpufreq_frequency_table
  cpufreq: use kzalloc() to allocate memory for cpufreq_frequency_table
  cpufreq: don't print value of .driver_data from core
  cpufreq: ia64: don't set .driver_data to index
  cpufreq: powernv: Select CPUFreq related Kconfig options for powernv
  cpufreq: powernv: Use cpufreq_frequency_table.driver_data to store pstate ids
  cpufreq: powernv: cpufreq driver for powernv platform
  cpufreq: at32ap: don't declare local variable as static
  cpufreq: loongson2_cpufreq: don't declare local variable as static
  cpufreq: unicore32: fix typo issue for 'clk'
  cpufreq: exynos: Disable on multiplatform build
2014-04-08 13:28:02 +02:00
Vaidyanathan Srinivasan b3d627a5f2 cpufreq: powernv: cpufreq driver for powernv platform
Backend driver to dynamically set voltage and frequency on
IBM POWER non-virtualized platforms.  Power management SPRs
are used to set the required PState.

This driver works in conjunction with cpufreq governors
like 'ondemand' to provide a demand based frequency and
voltage setting on IBM POWER non-virtualized platforms.

PState table is obtained from OPAL v3 firmware through device
tree.

powernv_cpufreq back-end driver would parse the relevant device-tree
nodes and initialise the cpufreq subsystem on powernv platform.

The code was originally written by svaidy@linux.vnet.ibm.com. Over
time it was modified to accomodate bug-fixes as well as updates to the
the cpu-freq core. Relevant portions of the change logs corresponding
to those modifications are noted below:

 * The policy->cpus needs to be populated in a hotplug-invariant
   manner instead of using cpu_sibling_mask() which varies with
   cpu-hotplug. This is because the cpufreq core code copies this
   content into policy->related_cpus mask which should not vary on
   cpu-hotplug. [Authored by srivatsa.bhat@linux.vnet.ibm.com]

 * Create a helper routine that can return the cpu-frequency for the
   corresponding pstate_id. Also, cache the values of the pstate_max,
   pstate_min and pstate_nominal and nr_pstates in a static structure
   so that they can be reused in the future to perform any
   validations. [Authored by ego@linux.vnet.ibm.com]

 * Create a driver attribute named cpuinfo_nominal_freq which creates
   a sysfs read-only file named cpuinfo_nominal_freq. Export the
   frequency corresponding to the nominal_pstate through this
   interface.

     Nominal frequency is the highest non-turbo frequency for the
   platform.  This is generally used for setting governor policies
   from user space for optimal energy efficiency. [Authored by
   ego@linux.vnet.ibm.com]

 * Implement a powernv_cpufreq_get(unsigned int cpu) method which will
   return the current operating frequency. Export this via the sysfs
   interface cpuinfo_cur_freq by setting powernv_cpufreq_driver.get to
   powernv_cpufreq_get(). [Authored by ego@linux.vnet.ibm.com]

[Change log updated by ego@linux.vnet.ibm.com]

Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-04-07 14:35:27 +02:00
Linus Torvalds 7cbb39d4d4 Merge tag 'kvm-3.15-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm updates from Paolo Bonzini:
 "PPC and ARM do not have much going on this time.  Most of the cool
  stuff, instead, is in s390 and (after a few releases) x86.

  ARM has some caching fixes and PPC has transactional memory support in
  guests.  MIPS has some fixes, with more probably coming in 3.16 as
  QEMU will soon get support for MIPS KVM.

  For x86 there are optimizations for debug registers, which trigger on
  some Windows games, and other important fixes for Windows guests.  We
  now expose to the guest Broadwell instruction set extensions and also
  Intel MPX.  There's also a fix/workaround for OS X guests, nested
  virtualization features (preemption timer), and a couple kvmclock
  refinements.

  For s390, the main news is asynchronous page faults, together with
  improvements to IRQs (floating irqs and adapter irqs) that speed up
  virtio devices"

* tag 'kvm-3.15-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (96 commits)
  KVM: PPC: Book3S HV: Save/restore host PMU registers that are new in POWER8
  KVM: PPC: Book3S HV: Fix decrementer timeouts with non-zero TB offset
  KVM: PPC: Book3S HV: Don't use kvm_memslots() in real mode
  KVM: PPC: Book3S HV: Return ENODEV error rather than EIO
  KVM: PPC: Book3S: Trim top 4 bits of physical address in RTAS code
  KVM: PPC: Book3S HV: Add get/set_one_reg for new TM state
  KVM: PPC: Book3S HV: Add transactional memory support
  KVM: Specify byte order for KVM_EXIT_MMIO
  KVM: vmx: fix MPX detection
  KVM: PPC: Book3S HV: Fix KVM hang with CONFIG_KVM_XICS=n
  KVM: PPC: Book3S: Introduce hypervisor call H_GET_TCE
  KVM: PPC: Book3S HV: Fix incorrect userspace exit on ioeventfd write
  KVM: s390: clear local interrupts at cpu initial reset
  KVM: s390: Fix possible memory leak in SIGP functions
  KVM: s390: fix calculation of idle_mask array size
  KVM: s390: randomize sca address
  KVM: ioapic: reinject pending interrupts on KVM_SET_IRQCHIP
  KVM: Bump KVM_MAX_IRQ_ROUTES for s390
  KVM: s390: irq routing for adapter interrupts.
  KVM: s390: adapter interrupt sources
  ...
2014-04-02 14:50:10 -07:00
Paolo Bonzini 7227fc0666 Merge branch 'kvm-ppchv-next' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc into kvm-next 2014-03-29 15:44:05 +01:00
Michael Neuling e4e3812150 KVM: PPC: Book3S HV: Add transactional memory support
This adds saving of the transactional memory (TM) checkpointed state
on guest entry and exit.  We only do this if we see that the guest has
an active transaction.

It also adds emulation of the TM state changes when delivering IRQs
into the guest.  According to the architecture, if we are
transactional when an IRQ occurs, the TM state is changed to
suspended, otherwise it's left unchanged.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Acked-by: Scott Wood <scottwood@freescale.com>
2014-03-29 19:58:02 +11:00
Benjamin Herrenschmidt cd42748535 Merge remote-tracking branch 'scott/next' into next
Freescale updates from Scott. Mostly support for critical
and machine check exceptions on 64-bit BookE, some new
PCI suspend/resume work and misc bits.
2014-03-24 10:26:10 +11:00
Michael Ellerman 76cb8a783a powerpc/perf: Enable BHRB access for EBB events
The previous commit added constraint and register handling to allow
processes using EBB (Event Based Branches) to request access to the BHRB
(Branch History Rolling Buffer).

With that in place we can allow processes using EBB to access the BHRB.
This is achieved by setting BHRBA in MMCR0 when we enable EBB access. We
must also clear BHRBA when we are disabling.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-03-24 09:48:27 +11:00
Michael Ellerman c2e37a2626 powerpc/perf: Add lost exception workaround
Some power8 revisions have a hardware bug where we can lose a PMU
exception, this commit adds a workaround to detect the bad condition and
rectify the situation.

See the comment in the commit for a full description.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-03-24 09:48:25 +11:00
Scott Wood 9d378dfac8 powerpc/booke64: Use SPRG7 for VDSO
Previously SPRG3 was marked for use by both VDSO and critical
interrupts (though critical interrupts were not fully implemented).

In commit 8b64a9dfb0 ("powerpc/booke64:
Use SPRG0/3 scratch for bolted TLB miss & crit int"), Mihai Caraman
made an attempt to resolve this conflict by restoring the VDSO value
early in the critical interrupt, but this has some issues:

 - It's incompatible with EXCEPTION_COMMON which restores r13 from the
   by-then-overwritten scratch (this cost me some debugging time).
 - It forces critical exceptions to be a special case handled
   differently from even machine check and debug level exceptions.
 - It didn't occur to me that it was possible to make this work at all
   (by doing a final "ld r13, PACA_EXCRIT+EX_R13(r13)") until after
   I made (most of) this patch. :-)

It might be worth investigating using a load rather than SPRG on return
from all exceptions (except TLB misses where the scratch never leaves
the SPRG) -- it could save a few cycles.  Until then, let's stick with
SPRG for all exceptions.

Since we cannot use SPRG4-7 for scratch without corrupting the state of
a KVM guest, move VDSO to SPRG7 on book3e.  Since neither SPRG4-7 nor
critical interrupts exist on book3s, SPRG3 is still used for VDSO
there.

Signed-off-by: Scott Wood <scottwood@freescale.com>
Cc: Mihai Caraman <mihai.caraman@freescale.com>
Cc: Anton Blanchard <anton@samba.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: kvm-ppc@vger.kernel.org
2014-03-19 19:57:14 -05:00
Wang Dongsheng b0b7dcbdf3 powerpc/fsl: add PVR definition for E500MC and E5500
Signed-off-by: Wang Dongsheng <dongsheng.wang@freescale.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
2014-03-19 19:39:25 -05:00
Linus Torvalds e2a0f813e0 Second batch of KVM updates. Some minor x86 fixes,
two s390 guest features that need some handling in the host,
 and all the PPC changes.  The PPC changes include support for
 little-endian guests and enablement for new POWER8 features.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABAgAGBQJS6UF5AAoJEBvWZb6bTYby55kP/AgTJnyu7avN653/2aSHkjkx
 KgYSMYhZPIFoY5LyZuNetXaoXFRvCykux1VYSZ6V6s35h2PZ+hdJNbHGjFYKPGTq
 FQ92xQVNuWCAPxmFCjDNuDV/0BauG5y08/Orh/jpjz+GAfH43LruUQGbtXUuyJ8u
 vf+yTHniU5gguqsAmodqjHUgbf+GoPJ1j7hmRoWwt8IWm7Ns3v/IK4l0p6G0h26a
 RjE6aK+Tm208Yr5hD/dRAqeTbBNt3c4xub+QPsKoiEMaZBSuAOiux7D3Kx+If1gp
 WsmqEQxoymihVtkZhUFO9ONLJepvmG2QwJVVyMSUW9iqxX9rraXsvVyVMwcQAhog
 JuOAYxBftH07xu6Fs4eym5KvCFghM+EaJvxxt+kgnvdD4htK1+eK5trntc2zygSi
 /qGiIrkqjXpkskW8kujLayF0eAU3CrZvFWveEPBfFgYiOGX/2wzJCtSm/bt9Jo0M
 v60qgNFK3LNqAyeEfnm9VtlwGr6ZgsAB6DHNPX4fM5s2IBjL+qloXk/e/+aVKkW0
 I3yeRdy/ExhLAab6w81JtMeR7G3YS0UNuAEVvcoxzNb5wIBY8qnpfUzTKyMxQR94
 64EVpxWEYO1s55eCCyMujWrSvc+YAwhJcWHGKgC4K7mxxLD3FVyQXX6YZvgRozMX
 HjQju+DToj9CskyrFlRL
 =yd0Z
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull more KVM updates from Paolo Bonzini:
 "Second batch of KVM updates.  Some minor x86 fixes, two s390 guest
  features that need some handling in the host, and all the PPC changes.

  The PPC changes include support for little-endian guests and
  enablement for new POWER8 features"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (45 commits)
  x86, kvm: correctly access the KVM_CPUID_FEATURES leaf at 0x40000101
  x86, kvm: cache the base of the KVM cpuid leaves
  kvm: x86: move KVM_CAP_HYPERV_TIME outside #ifdef
  KVM: PPC: Book3S PR: Cope with doorbell interrupts
  KVM: PPC: Book3S HV: Add software abort codes for transactional memory
  KVM: PPC: Book3S HV: Add new state for transactional memory
  powerpc/Kconfig: Make TM select VSX and VMX
  KVM: PPC: Book3S HV: Basic little-endian guest support
  KVM: PPC: Book3S HV: Add support for DABRX register on POWER7
  KVM: PPC: Book3S HV: Prepare for host using hypervisor doorbells
  KVM: PPC: Book3S HV: Handle new LPCR bits on POWER8
  KVM: PPC: Book3S HV: Handle guest using doorbells for IPIs
  KVM: PPC: Book3S HV: Consolidate code that checks reason for wake from nap
  KVM: PPC: Book3S HV: Implement architecture compatibility modes for POWER8
  KVM: PPC: Book3S HV: Add handler for HV facility unavailable
  KVM: PPC: Book3S HV: Flush the correct number of TLB sets on POWER8
  KVM: PPC: Book3S HV: Context-switch new POWER8 SPRs
  KVM: PPC: Book3S HV: Align physical and virtual CPU thread numbers
  KVM: PPC: Book3S HV: Don't set DABR on POWER8
  kvm/ppc: IRQ disabling cleanup
  ...
2014-01-31 08:37:32 -08:00
Paolo Bonzini b73117c493 Merge branch 'kvm-ppc-next' of git://github.com/agraf/linux-2.6 into kvm-queue
Conflicts:
	arch/powerpc/kvm/book3s_hv_rmhandlers.S
	arch/powerpc/kvm/booke.c
2014-01-29 18:29:01 +01:00
Paul Mackerras 8563bf52d5 KVM: PPC: Book3S HV: Add support for DABRX register on POWER7
The DABRX (DABR extension) register on POWER7 processors provides finer
control over which accesses cause a data breakpoint interrupt.  It
contains 3 bits which indicate whether to enable accesses in user,
kernel and hypervisor modes respectively to cause data breakpoint
interrupts, plus one bit that enables both real mode and virtual mode
accesses to cause interrupts.  Currently, KVM sets DABRX to allow
both kernel and user accesses to cause interrupts while in the guest.

This adds support for the guest to specify other values for DABRX.
PAPR defines a H_SET_XDABR hcall to allow the guest to set both DABR
and DABRX with one call.  This adds a real-mode implementation of
H_SET_XDABR, which shares most of its code with the existing H_SET_DABR
implementation.  To support this, we add a per-vcpu field to store the
DABRX value plus code to get and set it via the ONE_REG interface.

For Linux guests to use this new hcall, userspace needs to add
"hcall-xdabr" to the set of strings in the /chosen/hypertas-functions
property in the device tree.  If userspace does this and then migrates
the guest to a host where the kernel doesn't include this patch, then
userspace will need to implement H_SET_XDABR by writing the specified
DABR value to the DABR using the ONE_REG interface.  In that case, the
old kernel will set DABRX to DABRX_USER | DABRX_KERNEL.  That should
still work correctly, at least for Linux guests, since Linux guests
cope with getting data breakpoint interrupts in modes that weren't
requested by just ignoring the interrupt, and Linux guests never set
DABRX_BTI.

The other thing this does is to make H_SET_DABR and H_SET_XDABR work
on POWER8, which has the DAWR and DAWRX instead of DABR/X.  Guests that
know about POWER8 should use H_SET_MODE rather than H_SET_[X]DABR, but
guests running in POWER7 compatibility mode will still use H_SET_[X]DABR.
For them, this adds the logic to convert DABR/X values into DAWR/X values
on POWER8.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2014-01-27 16:01:15 +01:00
Paul Mackerras e0622bd9f2 KVM: PPC: Book3S HV: Handle new LPCR bits on POWER8
POWER8 has a bit in the LPCR to enable or disable the PURR and SPURR
registers to count when in the guest.  Set this bit.

POWER8 has a field in the LPCR called AIL (Alternate Interrupt Location)
which is used to enable relocation-on interrupts.  Allow userspace to
set this field.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2014-01-27 16:01:11 +01:00
Paul Mackerras aa31e84322 KVM: PPC: Book3S HV: Handle guest using doorbells for IPIs
* SRR1 wake reason field for system reset interrupt on wakeup from nap
  is now a 4-bit field on P8, compared to 3 bits on P7.

* Set PECEDP in LPCR when napping because of H_CEDE so guest doorbells
  will wake us up.

* Waking up from nap because of a guest doorbell interrupt is not a
  reason to exit the guest.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2014-01-27 16:01:10 +01:00
Paul Mackerras 5557ae0ec7 KVM: PPC: Book3S HV: Implement architecture compatibility modes for POWER8
This allows us to select architecture 2.05 (POWER6) or 2.06 (POWER7)
compatibility modes on a POWER8 processor.  (Note that transactional
memory is disabled for usermode if either or both of the PCR_TM_DIS
and PCR_ARCH_206 bits are set.)

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2014-01-27 16:01:06 +01:00
Michael Neuling b005255e12 KVM: PPC: Book3S HV: Context-switch new POWER8 SPRs
This adds fields to the struct kvm_vcpu_arch to store the new
guest-accessible SPRs on POWER8, adds code to the get/set_one_reg
functions to allow userspace to access this state, and adds code to
the guest entry and exit to context-switch these SPRs between host
and guest.

Note that DPDES (Directed Privileged Doorbell Exception State) is
shared between threads on a core; hence we store it in struct
kvmppc_vcore and have the master thread save and restore it.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2014-01-27 16:01:00 +01:00
Wang Dongsheng 71a6fa17e1 powerpc/fsl: add E6500 PVR and SPRN_PWRMGTCR0 define
E6500 PVR and SPRN_PWRMGTCR0 will be used in subsequent pw20/altivec
idle patches.

Signed-off-by: Wang Dongsheng <dongsheng.wang@freescale.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
2014-01-07 19:29:23 -06:00
LEROY Christophe ae2163be10 powerpc/8xx: mfspr SPRN_TBRx in lieu of mftb/mftbu is not supported
Commit beb2dc0a7a breaks the MPC8xx which
seems to not support using mfspr SPRN_TBRx instead of mftb/mftbu
despite what is written in the reference manual.

This patch reverts to the use of mftb/mftbu when CONFIG_8xx is
selected.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <scottwood@freescale.com>
2013-11-22 16:56:48 -06:00
Linus Torvalds f080480488 Here are the 3.13 KVM changes. There was a lot of work on the PPC
side: the HV and emulation flavors can now coexist in a single kernel
 is probably the most interesting change from a user point of view.
 On the x86 side there are nested virtualization improvements and a
 few bugfixes.  ARM got transparent huge page support, improved
 overcommit, and support for big endian guests.
 
 Finally, there is a new interface to connect KVM with VFIO.  This
 helps with devices that use NoSnoop PCI transactions, letting the
 driver in the guest execute WBINVD instructions.  This includes
 some nVidia cards on Windows, that fail to start without these
 patches and the corresponding userspace changes.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABAgAGBQJShPAhAAoJEBvWZb6bTYbyl48P/297GgmELHAGBgjvb6q7yyGu
 L8+eHjKbh4XBAkPwyzbvUjuww5z2hM0N3JQ0BDV9oeXlO+zwwCEns/sg2Q5/NJXq
 XxnTeShaKnp9lqVBnE6G9rAOUWKoyLJ2wItlvUL8JlaO9xJ0Vmk0ta4n2Nv5GqDp
 db6UD7vju6rHtIAhNpvvAO51kAOwc01xxRixCVb7KUYOnmO9nvpixzoI/S0Rp1gu
 w/OWMfCosDzBoT+cOe79Yx1OKcpaVW94X6CH1s+ShCw3wcbCL2f13Ka8/E3FIcuq
 vkZaLBxio7vjUAHRjPObw0XBW4InXEbhI1DjzIvm8dmc4VsgmtLQkTCG8fj+jINc
 dlHQUq6Do+1F4zy6WMBUj8tNeP1Z9DsABp98rQwR8+BwHoQpGQBpAxW0TE0ZMngC
 t1caqyvjZ5pPpFUxSrAV+8Kg4AvobXPYOim0vqV7Qea07KhFcBXLCfF7BWdwq/Jc
 0CAOlsLL4mHGIQWZJuVGw0YGP7oATDCyewlBuDObx+szYCoV4fQGZVBEL0KwJx/1
 7lrLN7JWzRyw6xTgJ5VVwgYE1tUY4IFQcHu7/5N+dw8/xg9KWA3f4PeMavIKSf+R
 qteewbtmQsxUnvuQIBHLs8NRWPnBPy+F3Sc2ckeOLIe4pmfTte6shtTXcLDL+LqH
 NTmT/cfmYp2BRkiCfCiS
 =rWNf
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM changes from Paolo Bonzini:
 "Here are the 3.13 KVM changes.  There was a lot of work on the PPC
  side: the HV and emulation flavors can now coexist in a single kernel
  is probably the most interesting change from a user point of view.

  On the x86 side there are nested virtualization improvements and a few
  bugfixes.

  ARM got transparent huge page support, improved overcommit, and
  support for big endian guests.

  Finally, there is a new interface to connect KVM with VFIO.  This
  helps with devices that use NoSnoop PCI transactions, letting the
  driver in the guest execute WBINVD instructions.  This includes some
  nVidia cards on Windows, that fail to start without these patches and
  the corresponding userspace changes"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (146 commits)
  kvm, vmx: Fix lazy FPU on nested guest
  arm/arm64: KVM: PSCI: propagate caller endianness to the incoming vcpu
  arm/arm64: KVM: MMIO support for BE guest
  kvm, cpuid: Fix sparse warning
  kvm: Delete prototype for non-existent function kvm_check_iopl
  kvm: Delete prototype for non-existent function complete_pio
  hung_task: add method to reset detector
  pvclock: detect watchdog reset at pvclock read
  kvm: optimize out smp_mb after srcu_read_unlock
  srcu: API for barrier after srcu read unlock
  KVM: remove vm mmap method
  KVM: IOMMU: hva align mapping page size
  KVM: x86: trace cpuid emulation when called from emulator
  KVM: emulator: cleanup decode_register_operand() a bit
  KVM: emulator: check rex prefix inside decode_register()
  KVM: x86: fix emulation of "movzbl %bpl, %eax"
  kvm_host: typo fix
  KVM: x86: emulate SAHF instruction
  MAINTAINERS: add tree for kvm.git
  Documentation/kvm: add a 00-INDEX file
  ...
2013-11-15 13:51:36 +09:00
Paul Mackerras 388cc6e133 KVM: PPC: Book3S HV: Support POWER6 compatibility mode on POWER7
This enables us to use the Processor Compatibility Register (PCR) on
POWER7 to put the processor into architecture 2.05 compatibility mode
when running a guest.  In this mode the new instructions and registers
that were introduced on POWER7 are disabled in user mode.  This
includes all the VSX facilities plus several other instructions such
as ldbrx, stdbrx, popcntw, popcntd, etc.

To select this mode, we have a new register accessible through the
set/get_one_reg interface, called KVM_REG_PPC_ARCH_COMPAT.  Setting
this to zero gives the full set of capabilities of the processor.
Setting it to one of the "logical" PVR values defined in PAPR puts
the vcpu into the compatibility mode for the corresponding
architecture level.  The supported values are:

0x0f000002	Architecture 2.05 (POWER6)
0x0f000003	Architecture 2.06 (POWER7)
0x0f100003	Architecture 2.06+ (POWER7+)

Since the PCR is per-core, the architecture compatibility level and
the corresponding PCR value are stored in the struct kvmppc_vcore, and
are therefore shared between all vcpus in a virtual core.

Signed-off-by: Paul Mackerras <paulus@samba.org>
[agraf: squash in fix to add missing break statements and documentation]
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-10-17 14:45:02 +02:00
Paul Mackerras a0144e2a6b KVM: PPC: Book3S HV: Store LPCR value for each virtual core
This adds the ability to have a separate LPCR (Logical Partitioning
Control Register) value relating to a guest for each virtual core,
rather than only having a single value for the whole VM.  This
corresponds to what real POWER hardware does, where there is a LPCR
per CPU thread but most of the fields are required to have the same
value on all active threads in a core.

The per-virtual-core LPCR can be read and written using the
GET/SET_ONE_REG interface.  Userspace can can only modify the
following fields of the LPCR value:

DPFD	Default prefetch depth
ILE	Interrupt little-endian
TC	Translation control (secondary HPT hash group search disable)

We still maintain a per-VM default LPCR value in kvm->arch.lpcr, which
contains bits relating to memory management, i.e. the Virtualized
Partition Memory (VPM) bits and the bits relating to guest real mode.
When this default value is updated, the update needs to be propagated
to the per-vcore values, so we add a kvmppc_update_lpcr() helper to do
that.

Signed-off-by: Paul Mackerras <paulus@samba.org>
[agraf: fix whitespace]
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-10-17 14:45:01 +02:00
Paul Mackerras 93b0f4dc29 KVM: PPC: Book3S HV: Implement timebase offset for guests
This allows guests to have a different timebase origin from the host.
This is needed for migration, where a guest can migrate from one host
to another and the two hosts might have a different timebase origin.
However, the timebase seen by the guest must not go backwards, and
should go forwards only by a small amount corresponding to the time
taken for the migration.

Therefore this provides a new per-vcpu value accessed via the one_reg
interface using the new KVM_REG_PPC_TB_OFFSET identifier.  This value
defaults to 0 and is not modified by KVM.  On entering the guest, this
value is added onto the timebase, and on exiting the guest, it is
subtracted from the timebase.

This is only supported for recent POWER hardware which has the TBU40
(timebase upper 40 bits) register.  Writing to the TBU40 register only
alters the upper 40 bits of the timebase, leaving the lower 24 bits
unchanged.  This provides a way to modify the timebase for guest
migration without disturbing the synchronization of the timebase
registers across CPU cores.  The kernel rounds up the value given
to a multiple of 2^24.

Timebase values stored in KVM structures (struct kvm_vcpu, struct
kvmppc_vcore, etc.) are stored as host timebase values.  The timebase
values in the dispatch trace log need to be guest timebase values,
however, since that is read directly by the guest.  This moves the
setting of vcpu->arch.dec_expires on guest exit to a point after we
have restored the host timebase so that vcpu->arch.dec_expires is a
host timebase value.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-10-17 14:44:59 +02:00
Anton Blanchard ef1967ff87 powerpc: Set MSR_LE bit on little endian builds
We need to set MSR_LE in kernel and userspace for little endian builds

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-10-11 16:48:31 +11:00
Paul Mackerras 9f24b0c9ef powerpc: Correct FSCR bit definitions
Commit 74e400cee6 ("powerpc: Rework setting up H/FSCR bit definitions")
ended up with incorrect bit numbers for FSCR_PM_LG and FSCR_BHRB_LG.
This fixes them.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Acked-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-09-05 17:29:20 +10:00
Benjamin Herrenschmidt 3f1f431188 Merge branch 'merge' into next
Merge stuff that already went into Linus via "merge" which
are pre-reqs for subsequent patches
2013-08-27 15:03:30 +10:00
Scott Wood beb2dc0a7a powerpc: Convert some mftb/mftbu into mfspr
Some CPUs (such as e500v1/v2) don't implement mftb and will take a
trap.  mfspr should work on everything that has a timebase, and is the
preferred instruction according to ISA v2.06.

Currently we get away with mftb on 85xx because the assembler converts
it to mfspr due to -Wa,-me500.  However, that flag has other effects
that are undesireable for certain targets (e.g.  lwsync is converted to
sync), and is hostile to multiplatform kernels.  Thus we would like to
stop setting it for all e500-family builds.

mftb/mftbu instances which are in 85xx code or common code are
converted.  Instances which will never run on 85xx are left alone.

Signed-off-by: Scott Wood <scottwood@freescale.com>
2013-08-20 19:33:12 -05:00
Scott Wood d52459ca30 powerpc/fsl-booke: Work around erratum A-006958
Erratum A-006598 says that 64-bit mftb is not atomic -- it's subject
to a similar race condition as doing mftbu/mftbl on 32-bit.  The lower
half of timebase is updated before the upper half; thus, we can share
the workaround for a similar bug on Cell.  This workaround involves
looping if the lower half of timebase is zero, thus avoiding the need
for a scratch register (other than CR0).  This workaround must be
avoided when the timebase is frozen, such as during the timebase sync
code.

This deals with kernel and vdso accesses, but other userspace accesses
will of course need to be fixed elsewhere.

Signed-off-by: Scott Wood <scottwood@freescale.com>
2013-08-20 15:45:49 -05:00
Anton Blanchard 594bd38303 powerpc: Wrap MSR macros with parentheses
Not having parentheses around a macro is asking for trouble.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-08-14 11:50:21 +10:00
Michael Neuling 74e400cee6 powerpc: Rework setting up H/FSCR bit definitions
This reworks the Facility Status and Control Regsiter (FSCR) config bit
definitions so that we can access the bit numbers.  This is needed for a
subsequent patch to fix the userspace DSCR handling.

HFSCR and FSCR bit definitions are the same, so reuse them.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Cc: <stable@vger.kernel.org> [v3.10]
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-08-09 18:07:01 +10:00
Michael Neuling 33959f88fc powerpc: Add second POWER8 PVR entry
POWER8 comes with two different PVRs.  This patch enables the additional
PVR in the cputable.

The existing entry (PVR=0x4b) is renamed to POWER8E and the new entry
(PVR=0x4d) is given POWER8.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-07-24 14:18:43 +10:00
Michael Ellerman 330a1eb777 powerpc/perf: Core EBB support for 64-bit book3s
Add support for EBB (Event Based Branches) on 64-bit book3s. See the
included documentation for more details.

EBBs are a feature which allows the hardware to branch directly to a
specified user space address when a PMU event overflows. This can be
used by programs for self-monitoring with no kernel involvement in the
inner loop.

Most of the logic is in the generic book3s code, primarily to avoid a
proliferation of PMU callbacks.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-07-01 11:50:10 +10:00
Michael Ellerman 7a7a41f9d5 powerpc/perf: Freeze PMC5/6 if we're not using them
On Power8 we can freeze PMC5 and 6 if we're not using them. Normally they
run all the time.

As noticed by Anshuman, we should unfreeze them when we disable the PMU
as there are legacy tools which expect them to run all the time.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
CC: <stable@vger.kernel.org> [v3.10]
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-07-01 11:49:57 +10:00
Michael Neuling b75c100ef2 powerpc/tm: Move TM abort cause codes to uapi
These cause codes are usable by userspace, so let's export to uapi.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Cc: <stable@vger.kernel.org> # v3.9
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-01 08:29:23 +10:00
Michael Neuling 6ce6c629fd powerpc/tm: Abort on emulation and alignment faults
If we are emulating an instruction inside an active user transaction that
touches memory, the kernel can't emulate it as it operates in transactional
suspend context.  We need to abort these transactions and send them back to
userspace for the hardware to rollback.

We can service these if the user transaction is in suspend mode, since the
kernel will operate in the same suspend context.

This adds a check to all alignment faults and to specific instruction
emulations (only string instructions for now).  If the user process is in an
active (non-suspended) transaction, we abort the transaction go back to
userspace allowing the HW to roll back the transaction and tell the user of the
failure.  This also adds new tm abort cause codes to report the reason of the
persistent error to the user.

Crappy test case here http://neuling.org/devel/junkcode/aligntm.c

Signed-off-by: Michael Neuling <mikey@neuling.org>
Cc: <stable@vger.kernel.org> # v3.9
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-01 08:29:22 +10:00
Michael Neuling 35f7097fce powerpc/tm: Make room for hypervisor in abort cause codes
PAPR carves out 0xff-0xe0 for hypervisor use of transactional memory software
abort cause codes.  Unfortunately we don't respect this currently.

Below fixes this to move our cause codes to below this region.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Cc: <stable@vger.kernel.org> # 3.9 only
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-01 08:29:22 +10:00
Linus Torvalds 01227a889e Merge tag 'kvm-3.10-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm updates from Gleb Natapov:
 "Highlights of the updates are:

  general:
   - new emulated device API
   - legacy device assignment is now optional
   - irqfd interface is more generic and can be shared between arches

  x86:
   - VMCS shadow support and other nested VMX improvements
   - APIC virtualization and Posted Interrupt hardware support
   - Optimize mmio spte zapping

  ppc:
    - BookE: in-kernel MPIC emulation with irqfd support
    - Book3S: in-kernel XICS emulation (incomplete)
    - Book3S: HV: migration fixes
    - BookE: more debug support preparation
    - BookE: e6500 support

  ARM:
   - reworking of Hyp idmaps

  s390:
   - ioeventfd for virtio-ccw

  And many other bug fixes, cleanups and improvements"

* tag 'kvm-3.10-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (204 commits)
  kvm: Add compat_ioctl for device control API
  KVM: x86: Account for failing enable_irq_window for NMI window request
  KVM: PPC: Book3S: Add API for in-kernel XICS emulation
  kvm/ppc/mpic: fix missing unlock in set_base_addr()
  kvm/ppc: Hold srcu lock when calling kvm_io_bus_read/write
  kvm/ppc/mpic: remove users
  kvm/ppc/mpic: fix mmio region lists when multiple guests used
  kvm/ppc/mpic: remove default routes from documentation
  kvm: KVM_CAP_IOMMU only available with device assignment
  ARM: KVM: iterate over all CPUs for CPU compatibility check
  KVM: ARM: Fix spelling in error message
  ARM: KVM: define KVM_ARM_MAX_VCPUS unconditionally
  KVM: ARM: Fix API documentation for ONE_REG encoding
  ARM: KVM: promote vfp_host pointer to generic host cpu context
  ARM: KVM: add architecture specific hook for capabilities
  ARM: KVM: perform HYP initilization for hotplugged CPUs
  ARM: KVM: switch to a dual-step HYP init code
  ARM: KVM: rework HYP page table freeing
  ARM: KVM: enforce maximum size for identity mapped code
  ARM: KVM: move to a KVM provided HYP idmap
  ...
2013-05-05 14:47:31 -07:00
Michael Ellerman 9353374b8e powerpc: Context switch the new EBB SPRs
This context switches the new Event Based Branching (EBB) SPRs.  The three new
SPRs are:
  - Event Based Branch Handler Register (EBBHR)
  - Event Based Branch Return Register (EBBRR)
  - Branch Event Status and Control Register (BESCR)

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Matt Evans <matt@ozlabs.org>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-05-02 10:37:36 +10:00
Michael Neuling 1ddf499e1a powerpc: Turn on the EBB H/FSCR bits
This turns Event Based Branching (EBB) on in the Hypervisor Facility Status and
Control Register (HFSCR) and Facility Status and Control Register (FSCR).

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-05-02 10:36:55 +10:00
Anshuman Khandual 53b56ca019 powerpc: Setup BHRB instructions facility in HFSCR for POWER8
Make BHRB instructions available in problem and privileged states.

Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-05-02 10:35:15 +10:00
Paul Mackerras 4619ac88b7 KVM: PPC: Book3S HV: Improve real-mode handling of external interrupts
This streamlines our handling of external interrupts that come in
while we're in the guest.  First, when waking up a hardware thread
that was napping, we split off the "napping due to H_CEDE" case
earlier, and use the code that handles an external interrupt (0x500)
in the guest to handle that too.  Secondly, the code that handles
those external interrupts now checks if any other thread is exiting
to the host before bouncing an external interrupt to the guest, and
also checks that there is actually an external interrupt pending for
the guest before setting the LPCR MER bit (mediated external request).

This also makes sure that we clear the "ceded" flag when we handle a
wakeup from cede in real mode, and fixes a potential infinite loop
in kvmppc_run_vcpu() which can occur if we ever end up with the ceded
flag set but MSR[EE] off.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-04-26 20:27:32 +02:00
Michael Ellerman 8f61aa325f powerpc/perf: Add support for SIER
On power8 we have a new SIER (Sampled Instruction Event Register), which
captures information about instructions when we have random sampling
enabled.

Add support for loading the SIER into pt_regs, overloading regs->dar.
Also set the new NO_SIPR flag in regs->result if we don't have SIPR.

Update regs_sihv/sipr() to look for SIPR/SIHV in SIER.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-26 16:11:10 +10:00
Michael Ellerman 240686c136 powerpc: Initialise PMU related regs on Power8
For both HV and guest kernels, intialise PMU regs to something sane.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-26 16:11:06 +10:00
Michael Neuling 04b418c97f powerpc: Add HFSCR SPR definitions
Add SPR number and bit definitions for the HFSCR (Hypervisor Facility Status
and Control Register).

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
2013-04-18 13:03:58 +10:00
Michael Neuling fa759e9b09 powerpc: Add DSCR FSCR register bit definition
This sets the DSCR (Data Stream Control Register) in the FSCR (Facility Status
& Control Register).

Also harmonise TAR (Target Address Register) FSCR bit definition too.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-03-05 16:56:30 +11:00
Linus Torvalds 89f883372f Merge tag 'kvm-3.9-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM updates from Marcelo Tosatti:
 "KVM updates for the 3.9 merge window, including x86 real mode
  emulation fixes, stronger memory slot interface restrictions, mmu_lock
  spinlock hold time reduction, improved handling of large page faults
  on shadow, initial APICv HW acceleration support, s390 channel IO
  based virtio, amongst others"

* tag 'kvm-3.9-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (143 commits)
  Revert "KVM: MMU: lazily drop large spte"
  x86: pvclock kvm: align allocation size to page size
  KVM: nVMX: Remove redundant get_vmcs12 from nested_vmx_exit_handled_msr
  x86 emulator: fix parity calculation for AAD instruction
  KVM: PPC: BookE: Handle alignment interrupts
  booke: Added DBCR4 SPR number
  KVM: PPC: booke: Allow multiple exception types
  KVM: PPC: booke: use vcpu reference from thread_struct
  KVM: Remove user_alloc from struct kvm_memory_slot
  KVM: VMX: disable apicv by default
  KVM: s390: Fix handling of iscs.
  KVM: MMU: cleanup __direct_map
  KVM: MMU: remove pt_access in mmu_set_spte
  KVM: MMU: cleanup mapping-level
  KVM: MMU: lazily drop large spte
  KVM: VMX: cleanup vmx_set_cr0().
  KVM: VMX: add missing exit names to VMX_EXIT_REASONS array
  KVM: VMX: disable SMEP feature when guest is in non-paging mode
  KVM: Remove duplicate text in api.txt
  Revert "KVM: MMU: split kvm_mmu_free_page"
  ...
2013-02-24 13:07:18 -08:00
Michael Neuling 2b0a576d15 powerpc: Add new transactional memory state to the signal context
This adds the new transactional memory archtected state to the signal context
in both 32 and 64 bit.

Signed-off-by: Matt Evans <matt@ozlabs.org>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-02-15 17:02:23 +11:00
Michael Neuling 98ae22e15b powerpc: Add helper functions for transactional memory context switching
Here we add the helper functions to be used when context switching.  These
allow us to fully reclaim and recheckpoint a transaction.

We introduce a new paca field called tm_scratch to help us store away register
values when doing the low level tm reclaim register save.

Signed-off-by: Matt Evans <matt@ozlabs.org>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-02-15 16:58:52 +11:00
Michael Neuling 97a0aac9b8 powerpc: Register defines for various transactional memory registers
Defines for MSR bits and transactional memory related SPRs TFIAR, TEXASR and
TEXASRU.

Signed-off-by: Matt Evans <matt@ozlabs.org>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-02-15 16:58:51 +11:00
Bharat Bhushan ffe129ecd7 KVM: PPC: booke: use vcpu reference from thread_struct
Like other places, use thread_struct to get vcpu reference.

Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-02-13 12:56:39 +01:00
Ian Munsie 2468dcf641 powerpc: Add support for context switching the TAR register
This patch adds support for enabling and context switching the Target
Address Register in Power8. The TAR is a new special purpose register
that can be used for computed branches with the bctar[l] (branch
conditional to TAR) instruction in the same manner as the count and link
registers.

Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Matt Evans <matt@ozlabs.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-02-08 14:05:50 +11:00
Michael Neuling 9422de3e95 powerpc: Hardware breakpoints rewrite to handle non DABR breakpoint registers
This is a rewrite so that we don't assume we are using the DABR throughout the
code.  We now use the arch_hw_breakpoint to store the breakpoint in a generic
manner in the thread_struct, rather than storing the raw DABR value.

The ptrace GET/SET_DEBUGREG interface currently passes the raw DABR in from
userspace.  We keep this functionality, so that future changes (like the POWER8
DAWR), will still fake the DABR to userspace.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-01-10 17:01:44 +11:00
Michael Neuling a8190a59e7 powerpc: Add DAWR/X SPR number definitions
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-01-10 17:01:41 +11:00
Haren Myneni 13e7a8e846 powerpc: Macros for saving/restore PPR
[PATCH 5/6] powerpc: Macros for saving/restore PPR

Several macros are defined for saving and restore user defined PPR value.

Signed-off-by: Haren Myneni <haren@us.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-01-10 17:01:10 +11:00
Ian Munsie 42d02b81f2 powerpc: Define differences between doorbells on book3e and book3s
There are a few key differences between doorbells on server compared
with embedded that we care about on Linux, namely:

- We have a new msgsndp instruction for directed privileged doorbells.
  msgsnd is used for directed hypervisor doorbells.
- The tag we use in the instruction is the Thread Identification
  Register of the recipient thread (since server doorbells can only
  occur between threads within a single core), and is only 7 bits wide.
- A new message type is introduced for server doorbells (none of the
  existing book3e message types are currently supported on book3s).

Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Tested-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-01-10 15:09:05 +11:00
Linus Torvalds 16e024f30c Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
Pull powerpc update from Benjamin Herrenschmidt:
 "The main highlight is probably some base POWER8 support.  There's more
  to come such as transactional memory support but that will wait for
  the next one.

  Overall it's pretty quiet, or rather I've been pretty poor at picking
  things up from patchwork and reviewing them this time around and Kumar
  no better on the FSL side it seems..."

* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (73 commits)
  powerpc+of: Rename and fix OF reconfig notifier error inject module
  powerpc: mpc5200: Add a3m071 board support
  powerpc/512x: don't compile any platform DIU code if the DIU is not enabled
  powerpc/mpc52xx: use module_platform_driver macro
  powerpc+of: Export of_reconfig_notifier_[register,unregister]
  powerpc/dma/raidengine: add raidengine device
  powerpc/iommu/fsl: Add PAMU bypass enable register to ccsr_guts struct
  powerpc/mpc85xx: Change spin table to cached memory
  powerpc/fsl-pci: Add PCI controller ATMU PM support
  powerpc/86xx: fsl_pcibios_fixup_bus requires CONFIG_PCI
  drivers/virt: the Freescale hypervisor driver doesn't need to check MSR[GS]
  powerpc/85xx: p1022ds: Use NULL instead of 0 for pointers
  powerpc: Disable relocation on exceptions when kexecing
  powerpc: Enable relocation on during exceptions at boot
  powerpc: Move get_longbusy_msecs into hvcall.h and remove duplicate function
  powerpc: Add wrappers to enable/disable relocation on exceptions
  powerpc: Add set_mode hcall
  powerpc: Setup relocation on exceptions for bare metal systems
  powerpc: Move initial mfspr LPCR out of __init_LPCR
  powerpc: Add relocation on exception vector handlers
  ...
2012-12-18 09:58:09 -08:00
Paul Mackerras 28c483b62f KVM: PPC: Book3S PR: Fix VSX handling
This fixes various issues in how we were handling the VSX registers
that exist on POWER7 machines.  First, we were running off the end
of the current->thread.fpr[] array.  Ultimately this was because the
vcpu->arch.vsr[] array is sized to be able to store both the FP
registers and the extra VSX registers (i.e. 64 entries), but PR KVM
only uses it for the extra VSX registers (i.e. 32 entries).

Secondly, calling load_up_vsx() from C code is a really bad idea,
because it jumps to fast_exception_return at the end, rather than
returning with a blr instruction.  This was causing it to jump off
to a random location with random register contents, since it was using
the largely uninitialized stack frame created by kvmppc_load_up_vsx.

In fact, it isn't necessary to call either __giveup_vsx or load_up_vsx,
since giveup_fpu and load_up_fpu handle the extra VSX registers as well
as the standard FP registers on machines with VSX.  Also, since VSX
instructions can access the VMX registers and the FP registers as well
as the extra VSX registers, we have to load up the FP and VMX registers
before we can turn on the MSR_VSX bit for the guest.  Conversely, if
we save away any of the VSX or FP registers, we have to turn off MSR_VSX
for the guest.

To handle all this, it is more convenient for a single call to
kvmppc_giveup_ext() to handle all the state saving that needs to be done,
so we make it take a set of MSR bits rather than just one, and the switch
statement becomes a series of if statements.  Similarly kvmppc_handle_ext
needs to be able to load up more than one set of registers.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2012-12-06 01:34:02 +01:00
Michael Neuling b0302722ee powerpc: Setup relocation on exceptions for bare metal systems
This turns on MMU on execptions via AIL field in the LPCR.

Signed-off-by: Matt Evans <matt@ozlabs.org>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-11-15 15:08:06 +11:00
Michael Neuling 71e1849724 powerpc: POWER8 cputable entry
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-11-15 13:00:45 +11:00
sukadev@linux.vnet.ibm.com e6878835ac powerpc/perf: Sample only if SIAR-Valid bit is set in P7+
powerpc/perf: Sample only if SIAR-Valid bit is set in P7+

On POWER7+ two new bits (mmcra[35] and mmcra[36]) indicate whether the
contents of SIAR and SDAR are valid.

For marked instructions on P7+, we must save the contents of SIAR and
SDAR registers only if these new bits are set.

This code/check for the SIAR-Valid bit is specific to P7+, so rather than
waste a CPU-feature bit use the PVR flag.

Note that Carl Love proposed a similar change for oprofile:

        https://lkml.org/lkml/2012/6/22/309

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-09-27 12:51:05 +10:00