/*
 * This file contains idle entry/exit functions for POWER7,
 * POWER8 and POWER9 CPUs.
 *
 * This program is free software; you can redistribute it and/or
 * modify it under the terms of the GNU General Public License
 * as published by the Free Software Foundation; either version
 * 2 of the License, or (at your option) any later version.
 */

#include <linux/threads.h>
#include <asm/processor.h>
#include <asm/page.h>
#include <asm/cputable.h>
#include <asm/thread_info.h>
#include <asm/ppc_asm.h>
#include <asm/asm-offsets.h>
#include <asm/ppc-opcode.h>
#include <asm/hw_irq.h>
#include <asm/kvm_book3s_asm.h>
#include <asm/opal.h>
#include <asm/cpuidle.h>
#include <asm/exception-64s.h>
#include <asm/book3s/64/mmu-hash.h>
#include <asm/mmu.h>

#undef DEBUG

/*
 * Use unused space in the interrupt stack to save and restore
 * registers for winkle support.
 */
#define _MMCR0	GPR0
#define _SDR1	GPR3
#define _PTCR	GPR3
#define _RPR	GPR4
#define _SPURR	GPR5
#define _PURR	GPR6
#define _TSCR	GPR7
#define _DSCR	GPR8
#define _AMOR	GPR9
#define _WORT	GPR10
#define _WORC	GPR11
#define _LPCR	GPR12

#define PSSCR_EC_ESL_MASK_SHIFTED (PSSCR_EC | PSSCR_ESL) >> 16
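/*
 * The EC/ESL mask is pre-shifted right by 16 bits because power_enter_stop
 * tests it with andis., whose immediate operand is shifted left by 16 by
 * the hardware, so the test lands back on the PSSCR EC and ESL bits.
 */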
.text

/*
 * Used by threads before entering deep idle states. Saves SPRs
 * in the interrupt stack frame.
 */
save_sprs_to_stack:
	/*
	 * Note: all registers, i.e. per-core, per-subcore or per-thread, are
	 * saved here, since any thread in the core might wake up first.
	 */
BEGIN_FTR_SECTION
	/*
	 * Note - SDR1 is dropped in Power ISA v3, so it is not saved/restored
	 * here; PTCR is saved instead.
	 */
	mfspr	r3,SPRN_PTCR
	std	r3,_PTCR(r1)
	mfspr	r3,SPRN_LPCR
	std	r3,_LPCR(r1)
FTR_SECTION_ELSE
	mfspr	r3,SPRN_SDR1
	std	r3,_SDR1(r1)
ALT_FTR_SECTION_END_IFSET(CPU_FTR_ARCH_300)
	mfspr	r3,SPRN_RPR
	std	r3,_RPR(r1)
	mfspr	r3,SPRN_SPURR
	std	r3,_SPURR(r1)
	mfspr	r3,SPRN_PURR
	std	r3,_PURR(r1)
	mfspr	r3,SPRN_TSCR
	std	r3,_TSCR(r1)
	mfspr	r3,SPRN_DSCR
	std	r3,_DSCR(r1)
	mfspr	r3,SPRN_AMOR
	std	r3,_AMOR(r1)
	mfspr	r3,SPRN_WORT
	std	r3,_WORT(r1)
	mfspr	r3,SPRN_WORC
	std	r3,_WORC(r1)

/*
 * On POWER9, there are idle states such as stop4, invoked via cpuidle,
 * that lose hypervisor resources. In such cases, we need to save
 * additional SPRs before entering those idle states so that they can
 * be restored to their older values on wakeup from the idle state.
 *
 * On POWER8, the only such deep idle state is winkle, which is used
 * only in the context of CPU-Hotplug, where these additional SPRs are
 * reinitialized to a sane value. Hence there is no need to save/restore
 * these SPRs.
 */
BEGIN_FTR_SECTION
	blr
END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_300)

power9_save_additional_sprs:
	mfspr	r3, SPRN_PID
	mfspr	r4, SPRN_LDBAR
	std	r3, STOP_PID(r13)
	std	r4, STOP_LDBAR(r13)

	mfspr	r3, SPRN_FSCR
	mfspr	r4, SPRN_HFSCR
	std	r3, STOP_FSCR(r13)
	std	r4, STOP_HFSCR(r13)

	mfspr	r3, SPRN_MMCRA
	mfspr	r4, SPRN_MMCR0
	std	r3, STOP_MMCRA(r13)
	std	r4, _MMCR0(r1)

	mfspr	r3, SPRN_MMCR1
	mfspr	r4, SPRN_MMCR2
	std	r3, STOP_MMCR1(r13)
	std	r4, STOP_MMCR2(r13)
	blr
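
/*
 * Counterpart to power9_save_additional_sprs above: LPCR and MMCR0 are
 * reloaded from the stack frame, the remaining SPRs from the STOP_* save
 * area reached through the PACA (r13).
 */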

power9_restore_additional_sprs:
	ld	r3,_LPCR(r1)
	ld	r4, STOP_PID(r13)
	mtspr	SPRN_LPCR,r3
	mtspr	SPRN_PID, r4

	ld	r3, STOP_LDBAR(r13)
	ld	r4, STOP_FSCR(r13)
	mtspr	SPRN_LDBAR, r3
	mtspr	SPRN_FSCR, r4

	ld	r3, STOP_HFSCR(r13)
	ld	r4, STOP_MMCRA(r13)
	mtspr	SPRN_HFSCR, r3
	mtspr	SPRN_MMCRA, r4

	ld	r3, _MMCR0(r1)
	ld	r4, STOP_MMCR1(r13)
	mtspr	SPRN_MMCR0, r3
	mtspr	SPRN_MMCR1, r4

	ld	r3, STOP_MMCR2(r13)
	mtspr	SPRN_MMCR2, r3
	blr

/*
 * Used by threads when the lock bit of core_idle_state is set.
 * Threads will spin in HMT_LOW until the lock bit is cleared.
 * r14 - pointer to core_idle_state
 * r15 - used to load contents of core_idle_state
 * r9  - used as a temporary variable
 */
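/*
 * Layout of core_idle_state (shared by all threads of the core): each thread
 * owns one bit, identified by its PACA_THREAD_MASK. The bit is cleared when
 * that thread enters sleep/winkle, so a zero PNV_CORE_IDLE_THREAD_BITS field
 * means the whole core is in a deep idle state. PNV_CORE_IDLE_LOCK_BIT
 * serializes updates such as the fastsleep workaround and the winkle count.
 */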
core_idle_lock_held:
	HMT_LOW
3:	lwz	r15,0(r14)
	andis.	r15,r15,PNV_CORE_IDLE_LOCK_BIT@h
	bne	3b
	HMT_MEDIUM
	lwarx	r15,0,r14
	andis.	r9,r15,PNV_CORE_IDLE_LOCK_BIT@h
	bne-	core_idle_lock_held
	blr

/*
 * Pass requested state in r3:
 *	r3 - PNV_THREAD_NAP/SLEEP/WINKLE in POWER8
 *	   - Requested PSSCR value in POWER9
 *
 * Address of idle handler to branch to in realmode in r4
 */
pnv_powersave_common:
	/* Use r3 to pass state nap/sleep/winkle */
	/* NAP is a state loss, we create a regs frame on the
	 * stack, fill it up with the state we care about and
	 * stick a pointer to it in PACAR1. We really only
	 * need to save PC, some CR bits and the NV GPRs,
	 * but for now an interrupt frame will do.
	 */
	mtctr	r4

	mflr	r0
	std	r0,16(r1)
	stdu	r1,-INT_FRAME_SIZE(r1)
	std	r0,_LINK(r1)
	std	r0,_NIP(r1)

	/* We haven't lost state ... yet */
	li	r0,0
	stb	r0,PACA_NAPSTATELOST(r13)

	/* Continue saving state */
	SAVE_GPR(2, r1)
	SAVE_NVGPRS(r1)
	mfcr	r5
	std	r5,_CCR(r1)
	std	r1,PACAR1(r13)
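	/*
	 * From this point the wakeup code can find the frame we just built
	 * through PACAR1, which is how our state is located again after a
	 * state-losing idle.
	 */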

BEGIN_FTR_SECTION
	/*
	 * POWER9 does not require real mode to stop, and presently does not
	 * set hwthread_state for KVM (threads don't share MMU context), so
	 * we can remain in virtual mode for this.
	 */
	bctr
END_FTR_SECTION_IFSET(CPU_FTR_ARCH_300)
	/*
	 * POWER8
	 * Go to real mode to do the nap, as required by the architecture.
	 * Also, we need to be in real mode before setting hwthread_state,
	 * because as soon as we do that, another thread can switch
	 * the MMU context to the guest.
	 */
	LOAD_REG_IMMEDIATE(r7, MSR_IDLE)
	mtmsrd	r7,0
	bctr

/*
 * This is the sequence required to execute idle instructions, as
 * specified in ISA v2.07 (and earlier). MSR[IR] and MSR[DR] must be 0.
 */
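/*
 * The store/ptesync/load/compare loop in the sequence is intended to ensure
 * that the thread's outstanding stores have been performed before the idle
 * instruction itself is executed.
 */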
#define IDLE_STATE_ENTER_SEQ_NORET(IDLE_INST)			\
	/* Magic NAP/SLEEP/WINKLE mode enter sequence */	\
	std	r0,0(r1);					\
	ptesync;						\
	ld	r0,0(r1);					\
236:	cmpd	cr0,r0,r0;					\
	bne	236b;						\
	IDLE_INST;


	.globl pnv_enter_arch207_idle_mode
pnv_enter_arch207_idle_mode:
#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
	/* Tell KVM we're entering idle */
	li	r4,KVM_HWTHREAD_IN_IDLE
	/******************************************************/
	/*  N O T E   W E L L    ! ! !    N O T E   W E L L   */
	/* The following store to HSTATE_HWTHREAD_STATE(r13)  */
	/* MUST occur in real mode, i.e. with the MMU off,    */
	/* and the MMU must stay off until we clear this flag */
	/* and test HSTATE_HWTHREAD_REQ(r13) in               */
	/* pnv_powersave_wakeup in this file.                 */
	/* The reason is that another thread can switch the   */
	/* MMU to a guest context whenever this flag is set   */
	/* to KVM_HWTHREAD_IN_IDLE, and if the MMU was on,    */
	/* that would potentially cause this thread to start  */
	/* executing instructions from guest memory in        */
	/* hypervisor mode, leading to a host crash or data   */
	/* corruption, or worse.                              */
	/******************************************************/
	stb	r4,HSTATE_HWTHREAD_STATE(r13)
#endif
	stb	r3,PACA_THREAD_IDLE_STATE(r13)
	cmpwi	cr3,r3,PNV_THREAD_SLEEP
	bge	cr3,2f
	IDLE_STATE_ENTER_SEQ_NORET(PPC_NAP)
	/* No return */
2:
	/* Sleep or winkle */
	lbz	r7,PACA_THREAD_MASK(r13)
	ld	r14,PACA_CORE_IDLE_STATE_PTR(r13)
	li	r5,0
	beq	cr3,3f
	lis	r5,PNV_CORE_IDLE_WINKLE_COUNT@h
3:
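	/*
	 * Atomically clear this thread's bit in core_idle_state (and, for
	 * winkle, add PNV_CORE_IDLE_WINKLE_COUNT), waiting in
	 * core_idle_lock_held while the lock bit is set and retrying if the
	 * stwcx. fails.
	 */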
lwarx_loop1:
	lwarx	r15,0,r14

	andis.	r9,r15,PNV_CORE_IDLE_LOCK_BIT@h
	bnel-	core_idle_lock_held

	add	r15,r15,r5			/* Add if winkle */
	andc	r15,r15,r7			/* Clear thread bit */

	andi.	r9,r15,PNV_CORE_IDLE_THREAD_BITS

/*
 * If cr0 = 0, then current thread is the last thread of the core entering
 * sleep. Last thread needs to execute the hardware bug workaround code if
 * required by the platform.
 * Make the workaround call unconditionally here. The below branch call is
 * patched out when the idle states are discovered if the platform does not
 * require it.
 */
.global pnv_fastsleep_workaround_at_entry
pnv_fastsleep_workaround_at_entry:
	beq	fastsleep_workaround_at_entry

	stwcx.	r15,0,r14
	bne-	lwarx_loop1
	isync

common_enter: /* common code for all the threads entering sleep or winkle */
	bgt	cr3,enter_winkle
	IDLE_STATE_ENTER_SEQ_NORET(PPC_SLEEP)

fastsleep_workaround_at_entry:
	oris	r15,r15,PNV_CORE_IDLE_LOCK_BIT@h
	stwcx.	r15,0,r14
	bne-	lwarx_loop1
	isync

	/* Fast sleep workaround */
	li	r3,1
	li	r4,1
	bl	opal_config_cpu_idle_state
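
	/*
	 * The core_idle_state lock bit is held across the OPAL call above, so
	 * any sibling thread waking up meanwhile spins in core_idle_lock_held
	 * until the fastsleep workaround is in place and we unlock below.
	 */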

	/* Unlock */
	xoris	r15,r15,PNV_CORE_IDLE_LOCK_BIT@h
	lwsync
	stw	r15,0(r14)
	b	common_enter

enter_winkle:
	bl	save_sprs_to_stack

	IDLE_STATE_ENTER_SEQ_NORET(PPC_WINKLE)

/*
 * r3 - PSSCR value corresponding to the requested stop state.
 */
power_enter_stop:
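	/*
	 * Only two fields of the PSSCR value in r3 are examined here: the
	 * EC/ESL bits select the lite (no state loss) versus the full path,
	 * and the RL field in bits 60:63 decides whether this is a deep stop.
	 */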
#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
	/* Tell KVM we're entering idle */
	li	r4,KVM_HWTHREAD_IN_IDLE
	/* DO THIS IN REAL MODE!  See comment above. */
	stb	r4,HSTATE_HWTHREAD_STATE(r13)
#endif
	/*
	 * Check if we are executing the lite variant with ESL=EC=0
	 */
	andis.	r4,r3,PSSCR_EC_ESL_MASK_SHIFTED
	clrldi	r3,r3,60 /* r3 = Bits[60:63] = Requested Level (RL) */
	bne	.Lhandle_esl_ec_set
PPC_STOP
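	/*
	 * With EC=ESL=0 the stop does not lose state: on wakeup, execution
	 * resumes here at the instruction following PPC_STOP rather than at
	 * the system reset vector.
	 */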
	li	r3,0  /* Since we didn't lose state, return 0 */
	std	r3, PACA_REQ_PSSCR(r13)

	/*
	 * pnv_wakeup_noloss() expects r12 to contain the SRR1 value so
	 * it can determine if the wakeup reason is an HMI in
	 * CHECK_HMI_INTERRUPT.
	 *
	 * However, when we wakeup with ESL=0, SRR1 will not contain the wakeup
	 * reason, so there is no point setting r12 to SRR1.
	 *
	 * Further, we clear r12 here, so that we don't accidentally enter the
	 * HMI in pnv_wakeup_noloss() if the value of r12[42:45] == WAKE_HMI.
	 */
	li	r12, 0
	b	pnv_wakeup_noloss

.Lhandle_esl_ec_set:
BEGIN_FTR_SECTION
	/*
	 * POWER9 DD2.0 or earlier can incorrectly set PMAO when waking up
	 * after a state-loss idle. Saving and restoring MMCR0 over idle is a
	 * workaround.
	 */
	mfspr	r4,SPRN_MMCR0
	std	r4,_MMCR0(r1)
END_FTR_SECTION_IFCLR(CPU_FTR_POWER9_DD2_1)

	/*
	 * Check if the requested state is a deep idle state.
	 */
	LOAD_REG_ADDRBASE(r5,pnv_first_deep_stop_state)
	ld	r4,ADDROFF(pnv_first_deep_stop_state)(r5)
	cmpd	r3,r4
	bge	.Lhandle_deep_stop
	PPC_STOP	/* Does not return (system reset interrupt) */

.Lhandle_deep_stop:
	/*
	 * Entering deep idle state.
	 * Clear thread bit in PACA_CORE_IDLE_STATE, save SPRs to
	 * stack and enter stop
	 */
	lbz	r7,PACA_THREAD_MASK(r13)
	ld	r14,PACA_CORE_IDLE_STATE_PTR(r13)

lwarx_loop_stop:
	lwarx	r15,0,r14
	andis.	r9,r15,PNV_CORE_IDLE_LOCK_BIT@h
	bnel-	core_idle_lock_held
	andc	r15,r15,r7			/* Clear thread bit */

	stwcx.	r15,0,r14
	bne-	lwarx_loop_stop
	isync

	bl	save_sprs_to_stack

	PPC_STOP	/* Does not return (system reset interrupt) */

/*
 * Entered with MSR[EE]=0 and no soft-masked interrupts pending.
 * r3 contains desired idle state (PNV_THREAD_NAP/SLEEP/WINKLE).
 */
_GLOBAL(power7_idle_insn)
	/* Now check if user or arch enabled NAP mode */
	LOAD_REG_ADDR(r4, pnv_enter_arch207_idle_mode)
	b	pnv_powersave_common
|
2014-12-10 02:56:53 +08:00
|
|
|
|
2014-07-29 21:10:13 +08:00
|
|
|
#define CHECK_HMI_INTERRUPT \
|
|
|
|
BEGIN_FTR_SECTION_NESTED(66); \
|
2017-06-13 21:05:51 +08:00
|
|
|
rlwinm r0,r12,45-31,0xf; /* extract wake reason field (P8) */ \
|
2014-07-29 21:10:13 +08:00
|
|
|
FTR_SECTION_ELSE_NESTED(66); \
|
2017-06-13 21:05:51 +08:00
|
|
|
rlwinm r0,r12,45-31,0xe; /* P7 wake reason field is 3 bits */ \
|
2014-07-29 21:10:13 +08:00
|
|
|
ALT_FTR_SECTION_END_NESTED_IFSET(CPU_FTR_ARCH_207S, 66); \
|
|
|
|
cmpwi r0,0xa; /* Hypervisor maintenance ? */ \
|
2017-06-13 21:05:52 +08:00
|
|
|
bne+ 20f; \
|
2014-07-29 21:10:13 +08:00
|
|
|
/* Invoke opal call to handle hmi */ \
|
|
|
|
ld r2,PACATOC(r13); \
|
|
|
|
ld r1,PACAR1(r13); \
|
|
|
|
std r3,ORIG_GPR3(r1); /* Save original r3 */ \
|
KVM: PPC: Book3S HV: Fix TB corruption in guest exit path on HMI interrupt
When a guest is assigned to a core it converts the host Timebase (TB)
into guest TB by adding guest timebase offset before entering into
guest. During guest exit it restores the guest TB to host TB. This means
under certain conditions (Guest migration) host TB and guest TB can differ.
When we get an HMI for TB related issues the opal HMI handler would
try fixing errors and restore the correct host TB value. With no guest
running, we don't have any issues. But with guest running on the core
we run into TB corruption issues.
If we get an HMI while in the guest, the current HMI handler invokes opal
hmi handler before forcing guest to exit. The guest exit path subtracts
the guest TB offset from the current TB value which may have already
been restored with host value by opal hmi handler. This leads to incorrect
host and guest TB values.
With split-core, things become more complex. With split-core, TB also gets
split and each subcore gets its own TB register. When a hmi handler fixes
a TB error and restores the TB value, it affects all the TB values of
sibling subcores on the same core. On TB errors all the threads in the core
get an HMI. With the existing code, the individual threads call the opal hmi handler
independently, which can easily throw the TB out of sync if we have a guest
running on subcores. Hence we will need to co-ordinate with all the
threads before making opal hmi handler call followed by TB resync.
This patch introduces a sibling subcore state structure (shared by all
threads in the core) in paca which holds information about whether sibling
subcores are in Guest mode or host mode. An array in_guest[] of size
MAX_SUBCORE_PER_CORE=4 is used to maintain the state of each subcore.
The subcore id is used as index into in_guest[] array. Only primary
thread entering/exiting the guest is responsible to set/unset its
designated array element.
On TB error, we get HMI interrupt on every thread on the core. Upon HMI,
this patch will now force guest to vacate the core/subcore. Primary
thread from each subcore will then turn off its respective bit
from the above bitmap during the guest exit path just after the
guest->host partition switch is complete.
All other threads that have just exited the guest OR were already in the host
will wait until all other subcores clear their respective bit.
Once all the subcores turn off their respective bit, all threads
will make the call to the opal hmi handler.
It is not necessary that the opal hmi handler would resync the TB value for
every HMI interrupt. It would do so only for HMIs caused by
TB errors. For the rest, it would not touch the TB value. Hence to make things
simpler, primary thread would call TB resync explicitly once for each
core immediately after opal hmi handler instead of subtracting guest
offset from TB. TB resync call will restore the TB with host value.
Thus we can be sure about the TB state.
One of the primary threads exiting the guest will take up the
responsibility of calling TB resync. It will use one of the top bits
(bit 63) from subcore state flags bitmap to make the decision. The first
primary thread (among the subcores) that is able to set the bit will
have to call the TB resync. Rest all other threads will wait until TB
resync is complete. Once TB resync is complete all threads will then
proceed.
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
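
A minimal, stand-alone C model of the coordination described above is sketched below,
assuming a single HMI episode. The structure and names (sibling_core_state, hmi_coordinate,
the opal_hmi/tb_resync callbacks) are invented for the sketch; the kernel keeps the real
state in the paca and spins at low priority rather than busy-waiting like this.

#include <stdatomic.h>
#include <stdint.h>

#define MAX_SUBCORE_PER_CORE 4

struct sibling_core_state {
	_Atomic uint64_t in_guest;       /* one bit per subcore            */
	_Atomic int      resync_claimed; /* set by the thread doing resync */
	_Atomic int      resync_done;
};

/* Run by the primary thread of each subcore on an HMI, once the
 * guest->host partition switch has completed. */
static void hmi_coordinate(struct sibling_core_state *s, int subcore_id,
			   void (*opal_hmi)(void), void (*tb_resync)(void))
{
	/* 1. Announce that this subcore has left guest mode. */
	atomic_fetch_and(&s->in_guest, ~(1ULL << subcore_id));

	/* 2. Wait until no sibling subcore is still running a guest. */
	while (atomic_load(&s->in_guest))
		;	/* busy-wait in this model only */

	/* 3. Every thread may now invoke the opal HMI handler. */
	opal_hmi();

	/* 4. Exactly one thread claims the TB resync; the rest wait. */
	if (!atomic_exchange(&s->resync_claimed, 1)) {
		tb_resync();			/* restores the host TB */
		atomic_store(&s->resync_done, 1);
	} else {
		while (!atomic_load(&s->resync_done))
			;
	}
}
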
2016-05-15 12:14:26 +08:00
|
|
|
li r3,0; /* NULL argument */ \
|
|
|
|
bl hmi_exception_realmode; \
|
|
|
|
nop; \
|
2014-07-29 21:10:13 +08:00
|
|
|
ld r3,ORIG_GPR3(r1); /* Restore original r3 */ \
|
|
|
|
20: nop;
|
|
|
|
|
2016-07-08 14:20:49 +08:00
|
|
|
/*
|
2017-06-13 21:05:45 +08:00
|
|
|
* Entered with MSR[EE]=0 and no soft-masked interrupts pending.
|
|
|
|
* r3 contains desired PSSCR register value.
|
2016-07-08 14:20:49 +08:00
|
|
|
*/
|
|
|
|
_GLOBAL(power9_idle_stop)
|
powerpc/powernv: Provide a way to force a core into SMT4 mode
POWER9 processors up to and including "Nimbus" v2.2 have hardware
bugs relating to transactional memory and thread reconfiguration.
One of these bugs has a workaround which is to get the core into
SMT4 state temporarily. This workaround is only needed when
running bare-metal.
This patch provides a function which gets the core into SMT4 mode
by preventing threads from going to a stop state, and waking up
those which are already in a stop state. Once at least 3 threads
are not in a stop state, the core will be in SMT4 and we can
continue.
To do this, we add a "dont_stop" flag to the paca to tell the
thread not to go into a stop state. If this flag is set,
power9_idle_stop() just returns immediately with a return value
of 0. The pnv_power9_force_smt4_catch() function does the following:
1. Set the dont_stop flag for each thread in the core, except
ourselves (in fact we use an atomic_inc() in case more than
one thread is calling this function concurrently).
2. See how many threads are awake, indicated by their
requested_psscr field in the paca being 0. If this is at
least 3, skip to step 5.
3. Send a doorbell interrupt to each thread that was seen as
being in a stop state in step 2.
4. Until at least 3 threads are awake, scan the threads to which
we sent a doorbell interrupt and check if they are awake now.
This relies on the following properties:
- Once dont_stop is non-zero, requested_psscr can't go from zero to
non-zero, except transiently (and without the thread doing stop).
- requested_psscr being zero guarantees that the thread isn't in
a state-losing stop state where thread reconfiguration could occur.
- Doing stop with a PSSCR value of 0 won't be a state-losing stop
and thus won't allow thread reconfiguration.
- Once threads_per_core/2 + 1 (i.e. 3) threads are awake, the core
must be in SMT4 mode, since SMT modes are powers of 2.
This does add a sync to power9_idle_stop(), which is necessary to
provide the correct ordering between setting requested_psscr and
checking dont_stop. The overhead of the sync should be unnoticeable
compared to the latency of going into and out of a stop state.
Because some objected to incurring this extra latency on systems where
the XER[SO] bug is not relevant, I have put the test in
power9_idle_stop inside a feature section. This means that
pnv_power9_force_smt4_catch() WILL NOT WORK correctly on systems
without the CPU_FTR_P9_TM_XER_SO_BUG feature bit set, and will
probably hang the system.
In order to cater for uses where the caller has an operation that
has to be done while the core is in SMT4, the core continues to be
kept in SMT4 after pnv_power9_force_smt4_catch() function returns,
until the pnv_power9_force_smt4_release() function is called.
It undoes the effect of step 1 above and allows the other threads
to go into a stop state.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
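
The catch/release protocol above can be modelled with a short C sketch using an array
of per-thread state in place of the paca. This is illustrative only: the doorbell()
callback, the helper names and the exact polling structure are assumptions, and the
kernel version uses feature sections and hardware doorbells rather than plain loops.

#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define THREADS_PER_CORE 4

struct thread_state {
	_Atomic int      dont_stop;        /* models paca->dont_stop        */
	_Atomic uint64_t requested_psscr;  /* 0 means "awake"               */
};

/* power9_idle_stop()-style early exit: refuse to stop while dont_stop is set. */
static bool try_enter_stop(struct thread_state *t, uint64_t psscr)
{
	atomic_store(&t->requested_psscr, psscr);
	/* the kernel has a sync here to order the store vs. the load below */
	if (atomic_load(&t->dont_stop)) {
		atomic_store(&t->requested_psscr, 0);
		return false;		/* caller sees a "wakeup" with 0 */
	}
	/* ... would execute the stop instruction here ... */
	return true;
}

static void force_smt4_catch(struct thread_state core[THREADS_PER_CORE],
			     int self, void (*doorbell)(int thread))
{
	bool kicked[THREADS_PER_CORE] = { false };
	int i, awake = 0;

	/* Step 1: forbid every sibling thread from entering stop. */
	for (i = 0; i < THREADS_PER_CORE; i++)
		if (i != self)
			atomic_fetch_add(&core[i].dont_stop, 1);

	/* Step 2: count threads already awake (requested_psscr == 0). */
	for (i = 0; i < THREADS_PER_CORE; i++)
		if (atomic_load(&core[i].requested_psscr) == 0)
			awake++;

	/* Step 3: doorbell the threads that looked stopped. */
	if (awake < THREADS_PER_CORE / 2 + 1)
		for (i = 0; i < THREADS_PER_CORE; i++)
			if (atomic_load(&core[i].requested_psscr) != 0) {
				doorbell(i);
				kicked[i] = true;
			}

	/* Step 4: poll the kicked threads until at least 3 are awake,
	 * which guarantees SMT4 since SMT modes are powers of 2. */
	while (awake < THREADS_PER_CORE / 2 + 1)
		for (i = 0; i < THREADS_PER_CORE; i++)
			if (kicked[i] &&
			    atomic_load(&core[i].requested_psscr) == 0) {
				kicked[i] = false;
				awake++;
			}
}

static void force_smt4_release(struct thread_state core[THREADS_PER_CORE],
			       int self)
{
	int i;

	/* Undo step 1 so the other threads may stop again. */
	for (i = 0; i < THREADS_PER_CORE; i++)
		if (i != self)
			atomic_fetch_sub(&core[i].dont_stop, 1);
}
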
2018-03-21 18:32:00 +08:00
|
|
|
BEGIN_FTR_SECTION
|
|
|
|
lwz r5, PACA_DONT_STOP(r13)
|
|
|
|
cmpwi r5, 0
|
|
|
|
bne 1f
|
2017-05-16 16:49:47 +08:00
|
|
|
std r3, PACA_REQ_PSSCR(r13)
|
2018-03-21 18:32:00 +08:00
|
|
|
sync
|
|
|
|
lwz r5, PACA_DONT_STOP(r13)
|
|
|
|
cmpwi r5, 0
|
|
|
|
bne 1f
|
|
|
|
END_FTR_SECTION_IFSET(CPU_FTR_P9_TM_XER_SO_BUG)
|
2017-01-25 16:36:28 +08:00
|
|
|
mtspr SPRN_PSSCR,r3
|
2017-06-13 21:05:45 +08:00
|
|
|
LOAD_REG_ADDR(r4,power_enter_stop)
|
2016-07-08 14:20:49 +08:00
|
|
|
b pnv_powersave_common
|
|
|
|
/* No return */
|
2018-03-21 18:32:00 +08:00
|
|
|
1:
|
|
|
|
/*
|
|
|
|
* We get here when TM / thread reconfiguration bug workaround
|
|
|
|
* code wants to get the CPU into SMT4 mode, and therefore
|
|
|
|
* we are being asked not to stop.
|
|
|
|
*/
|
|
|
|
li r3, 0
|
|
|
|
std r3, PACA_REQ_PSSCR(r13)
|
|
|
|
blr /* return 0 for wakeup cause / SRR1 value */
|
2017-03-22 23:04:17 +08:00
|
|
|
|
2016-07-08 14:20:44 +08:00
|
|
|
/*
|
2017-03-22 23:04:17 +08:00
|
|
|
* On waking up from stop 0,1,2 with ESL=1 on POWER9 DD1,
|
|
|
|
* HSPRG0 will be set to the HSPRG0 value of one of the
|
|
|
|
* threads in this core. Thus the value we have in r13
|
|
|
|
* may not be this thread's paca pointer.
|
|
|
|
*
|
|
|
|
* Fortunately, the TIR remains invariant. Since this thread's
|
|
|
|
* paca pointer is recorded in all its siblings' pacas, we can
|
|
|
|
* correctly recover this thread's paca pointer if we
|
|
|
|
* know the index of this thread in the core.
|
|
|
|
*
|
|
|
|
* This index can be obtained from the TIR.
|
2016-07-08 14:20:44 +08:00
|
|
|
*
|
2017-03-22 23:04:17 +08:00
|
|
|
* i.e., thread's position in the core = TIR.
|
|
|
|
* If this value is i, then this thread's paca is
|
|
|
|
* paca->thread_sibling_pacas[i].
|
|
|
|
*/
|
|
|
|
power9_dd1_recover_paca:
|
|
|
|
mfspr r4, SPRN_TIR
|
|
|
|
/*
|
|
|
|
* Since each entry in thread_sibling_pacas is 8 bytes
|
|
|
|
* we need to left-shift by 3 bits. Thus r4 = i * 8
|
|
|
|
*/
|
|
|
|
sldi r4, r4, 3
|
|
|
|
/* Get &paca->thread_sibling_pacas[0] in r5 */
|
|
|
|
ld r5, PACA_SIBLING_PACA_PTRS(r13)
|
|
|
|
/* Load paca->thread_sibling_pacas[i] into r13 */
|
|
|
|
ldx r13, r4, r5
|
|
|
|
SET_PACA(r13)
|
|
|
|
/*
|
|
|
|
* Indicate that we have lost NVGPR state
|
|
|
|
* which needs to be restored from the stack.
|
|
|
|
*/
|
|
|
|
li r3, 1
|
2017-05-12 17:22:06 +08:00
|
|
|
stb r3,PACA_NAPSTATELOST(r13)
|
2017-03-22 23:04:17 +08:00
|
|
|
blr
|
|
|
|
|
2017-04-19 21:05:47 +08:00
|
|
|
/*
|
|
|
|
* Called from machine check handler for powersave wakeups.
|
|
|
|
* Low level machine check processing has already been done. Now just
|
|
|
|
* go through the wake up path to get everything in order.
|
|
|
|
*
|
|
|
|
* r3 - The original SRR1 value.
|
|
|
|
* Original SRR[01] have been clobbered.
|
|
|
|
* MSR_RI is clear.
|
|
|
|
*/
|
|
|
|
.global pnv_powersave_wakeup_mce
|
|
|
|
pnv_powersave_wakeup_mce:
|
|
|
|
/* Set cr3 for pnv_powersave_wakeup */
|
|
|
|
rlwinm r11,r3,47-31,30,31
|
|
|
|
cmpwi cr3,r11,2
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Now put the original SRR1 with SRR1_WAKEMCE_RESVD as the wake
|
2017-06-13 21:05:51 +08:00
|
|
|
* reason into r12, which allows reuse of the system reset wakeup
|
2017-04-19 21:05:47 +08:00
|
|
|
* code without being mistaken for another type of wakeup.
|
|
|
|
*/
|
2017-06-13 21:05:51 +08:00
|
|
|
oris r12,r3,SRR1_WAKEMCE_RESVD@h
|
2017-04-19 21:05:47 +08:00
|
|
|
|
|
|
|
b pnv_powersave_wakeup
|
|
|
|
|
2017-04-19 21:05:45 +08:00
|
|
|
/*
|
|
|
|
* Called from reset vector for powersave wakeups.
|
2016-07-08 14:20:44 +08:00
|
|
|
* cr3 - set to gt if waking up with partial/complete hypervisor state loss
|
2017-06-13 21:05:51 +08:00
|
|
|
* r12 - SRR1
|
2016-07-08 14:20:44 +08:00
|
|
|
*/
|
2017-04-19 21:05:44 +08:00
|
|
|
.global pnv_powersave_wakeup
|
|
|
|
pnv_powersave_wakeup:
|
2017-04-19 21:05:51 +08:00
|
|
|
ld r2, PACATOC(r13)
|
|
|
|
|
2016-07-08 14:20:49 +08:00
|
|
|
BEGIN_FTR_SECTION
|
2017-04-19 21:05:51 +08:00
|
|
|
BEGIN_FTR_SECTION_NESTED(70)
|
|
|
|
bl power9_dd1_recover_paca
|
|
|
|
END_FTR_SECTION_NESTED_IFSET(CPU_FTR_POWER9_DD1, 70)
|
2017-04-19 21:05:46 +08:00
|
|
|
bl pnv_restore_hyp_resource_arch300
|
|
|
|
FTR_SECTION_ELSE
|
|
|
|
bl pnv_restore_hyp_resource_arch207
|
|
|
|
ALT_FTR_SECTION_END_IFSET(CPU_FTR_ARCH_300)
|
2017-04-19 21:05:44 +08:00
|
|
|
|
|
|
|
li r0,PNV_THREAD_RUNNING
|
|
|
|
stb r0,PACA_THREAD_IDLE_STATE(r13) /* Clear thread state */
|
|
|
|
|
2017-06-13 21:05:51 +08:00
|
|
|
mr r3,r12
|
|
|
|
|
2017-04-19 21:05:44 +08:00
|
|
|
#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
|
2017-10-19 12:14:20 +08:00
|
|
|
li r0,KVM_HWTHREAD_IN_KERNEL
|
|
|
|
stb r0,HSTATE_HWTHREAD_STATE(r13)
|
|
|
|
/* Order setting hwthread_state vs. testing hwthread_req */
|
|
|
|
sync
|
|
|
|
lbz r0,HSTATE_HWTHREAD_REQ(r13)
|
|
|
|
cmpwi r0,0
|
|
|
|
beq 1f
|
|
|
|
b kvm_start_guest
|
|
|
|
1:
|
2017-04-19 21:05:44 +08:00
|
|
|
#endif
|
|
|
|
|
|
|
|
/* Return SRR1 from power7_nap() */
|
|
|
|
blt cr3,pnv_wakeup_noloss
|
|
|
|
b pnv_wakeup_loss
|
|
|
|
|
2016-07-08 14:20:44 +08:00
|
|
|
/*
|
2017-04-19 21:05:44 +08:00
|
|
|
* Check whether we have woken up with hypervisor state loss.
|
|
|
|
* If yes, restore hypervisor state and return back to link.
|
2016-07-08 14:20:44 +08:00
|
|
|
*
|
|
|
|
* cr3 - set to gt if waking up with partial/complete hypervisor state loss
|
|
|
|
*/
|
2017-04-19 21:05:46 +08:00
|
|
|
pnv_restore_hyp_resource_arch300:
|
2017-06-25 01:29:01 +08:00
|
|
|
/*
|
|
|
|
* Workaround for POWER9, if we lost resources, the ERAT
|
2017-07-10 14:19:38 +08:00
|
|
|
* might have been mixed up and needs flushing. We also need
|
2017-07-20 09:53:22 +08:00
|
|
|
* to reload MMCR0 (see comment above). We also need to set
|
|
|
|
* then clear bit 60 in MMCRA to ensure the PMU starts running.
|
2017-06-25 01:29:01 +08:00
|
|
|
*/
|
|
|
|
blt cr3,1f
|
2017-11-03 12:13:20 +08:00
|
|
|
BEGIN_FTR_SECTION
|
2017-06-25 01:29:01 +08:00
|
|
|
PPC_INVALIDATE_ERAT
|
2017-07-10 14:19:38 +08:00
|
|
|
ld r1,PACAR1(r13)
|
2017-11-03 12:13:21 +08:00
|
|
|
ld r4,_MMCR0(r1)
|
|
|
|
mtspr SPRN_MMCR0,r4
|
2017-11-15 11:25:42 +08:00
|
|
|
END_FTR_SECTION_IFCLR(CPU_FTR_POWER9_DD2_1)
|
2017-07-20 09:53:22 +08:00
|
|
|
mfspr r4,SPRN_MMCRA
|
|
|
|
ori r4,r4,(1 << (63-60))
|
|
|
|
mtspr SPRN_MMCRA,r4
|
|
|
|
xori r4,r4,(1 << (63-60))
|
|
|
|
mtspr SPRN_MMCRA,r4
|
2017-06-25 01:29:01 +08:00
|
|
|
1:
|
2016-07-08 14:20:49 +08:00
|
|
|
/*
|
|
|
|
* POWER ISA 3. Use PSSCR to determine if we
|
|
|
|
* are waking up from deep idle state
|
|
|
|
*/
|
|
|
|
LOAD_REG_ADDRBASE(r5,pnv_first_deep_stop_state)
|
|
|
|
ld r4,ADDROFF(pnv_first_deep_stop_state)(r5)
|
|
|
|
|
2017-05-16 16:49:47 +08:00
|
|
|
BEGIN_FTR_SECTION_NESTED(71)
|
|
|
|
/*
|
|
|
|
* Assume that we are waking up from the same state
|
|
|
|
* as the Requested Level (RL) in the PSSCR,
|
|
|
|
* which is bits 60-63
|
|
|
|
*/
|
|
|
|
ld r5,PACA_REQ_PSSCR(r13)
|
|
|
|
rldicl r5,r5,0,60
|
|
|
|
FTR_SECTION_ELSE_NESTED(71)
|
2016-07-08 14:20:44 +08:00
|
|
|
/*
|
2016-07-08 14:20:49 +08:00
|
|
|
* Bits 0-3 correspond to the Power-Saving Level Status,
|
|
|
|
* which indicates the idle state we are waking up from
|
|
|
|
*/
|
2017-05-16 16:49:47 +08:00
|
|
|
mfspr r5, SPRN_PSSCR
|
2016-07-08 14:20:49 +08:00
|
|
|
rldicl r5,r5,4,60
|
2017-05-16 16:49:47 +08:00
|
|
|
ALT_FTR_SECTION_END_NESTED_IFSET(CPU_FTR_POWER9_DD1, 71)
|
2018-03-21 18:32:00 +08:00
|
|
|
li r0, 0 /* clear requested_psscr to say we're awake */
|
|
|
|
std r0, PACA_REQ_PSSCR(r13)
|
2016-07-08 14:20:49 +08:00
|
|
|
cmpd cr4,r5,r4
|
2017-04-19 21:05:44 +08:00
|
|
|
bge cr4,pnv_wakeup_tb_loss /* returns to caller */
|
2016-07-08 14:20:49 +08:00
|
|
|
|
2017-04-19 21:05:44 +08:00
|
|
|
blr /* Waking up without hypervisor state loss. */
|
2016-07-08 14:20:49 +08:00
|
|
|
|
2017-04-19 21:05:46 +08:00
|
|
|
/* Same calling convention as arch300 */
|
|
|
|
pnv_restore_hyp_resource_arch207:
|
2016-07-08 14:20:49 +08:00
|
|
|
/*
|
|
|
|
* POWER ISA 2.07 or less.
|
2017-04-19 21:05:50 +08:00
|
|
|
* Check if we slept with sleep or winkle.
|
2016-07-08 14:20:44 +08:00
|
|
|
*/
|
2017-04-19 21:05:50 +08:00
|
|
|
lbz r4,PACA_THREAD_IDLE_STATE(r13)
|
|
|
|
cmpwi cr2,r4,PNV_THREAD_NAP
|
|
|
|
bgt cr2,pnv_wakeup_tb_loss /* Either sleep or Winkle */
|
2016-07-08 14:20:44 +08:00
|
|
|
|
|
|
|
/*
|
|
|
|
* We fall through here if PACA_THREAD_IDLE_STATE shows we are waking
|
|
|
|
* up from nap. At this stage CR3 shouldn't contain 'gt' since that
|
|
|
|
* indicates we are waking with hypervisor state loss from nap.
|
|
|
|
*/
|
|
|
|
bgt cr3,.
|
|
|
|
|
2017-04-19 21:05:44 +08:00
|
|
|
blr /* Waking up without hypervisor state loss */
|
2016-07-08 14:20:44 +08:00
|
|
|
|
2016-07-08 14:20:49 +08:00
|
|
|
/*
|
|
|
|
* Called if waking up from idle state which can cause either partial or
|
|
|
|
* complete hyp state loss.
|
|
|
|
* In POWER8, called if waking up from fastsleep or winkle
|
|
|
|
* In POWER9, called if waking up from stop state >= pnv_first_deep_stop_state
|
|
|
|
*
|
|
|
|
* r13 - PACA
|
|
|
|
* cr3 - gt if waking up with partial/complete hypervisor state loss
|
2017-04-19 21:05:50 +08:00
|
|
|
*
|
|
|
|
* If ISA300:
|
2016-09-07 13:16:30 +08:00
|
|
|
* cr4 - gt or eq if waking up from complete hypervisor state loss.
|
2017-04-19 21:05:50 +08:00
|
|
|
*
|
|
|
|
* If ISA207:
|
|
|
|
* r4 - PACA_THREAD_IDLE_STATE
|
2016-07-08 14:20:49 +08:00
|
|
|
*/
|
2017-04-19 21:05:44 +08:00
|
|
|
pnv_wakeup_tb_loss:
|
2014-02-26 08:08:43 +08:00
|
|
|
ld r1,PACAR1(r13)
|
2014-12-10 02:56:52 +08:00
|
|
|
/*
|
2017-03-17 13:13:20 +08:00
|
|
|
* Before entering any idle state, the NVGPRs are saved in the stack.
|
|
|
|
* If there was a state loss, or PACA_NAPSTATELOST was set, then the
|
|
|
|
* NVGPRs are restored. If we are here, it is likely that state is lost,
|
|
|
|
* but not guaranteed -- neither ISA207 nor ISA300 tests to reach
|
|
|
|
* here are the same as the test to restore NVGPRS:
|
|
|
|
* PACA_THREAD_IDLE_STATE test for ISA207, PSSCR test for ISA300,
|
|
|
|
* and SRR1 test for restoring NVGPRs.
|
|
|
|
*
|
|
|
|
* We are about to clobber NVGPRs now, so set NAPSTATELOST to
|
|
|
|
* guarantee they will always be restored. This might be tightened
|
|
|
|
* with careful reading of specs (particularly for ISA300) but this
|
|
|
|
* is already a slow wakeup path and it's simpler to be safe.
|
|
|
|
*/
|
|
|
|
li r0,1
|
|
|
|
stb r0,PACA_NAPSTATELOST(r13)
|
|
|
|
|
|
|
|
/*
|
2014-12-10 02:56:52 +08:00
|
|
|
*
|
2016-07-08 14:20:44 +08:00
|
|
|
* Save SRR1 and LR in NVGPRs as they might be clobbered in
|
2016-07-08 14:37:11 +08:00
|
|
|
* opal_call() (called in CHECK_HMI_INTERRUPT). SRR1 is required
|
2016-07-08 14:20:44 +08:00
|
|
|
* to determine the wakeup reason if we branch to kvm_start_guest. LR
|
|
|
|
* is required to return back to reset vector after hypervisor state
|
|
|
|
* restore is complete.
|
2014-12-10 02:56:52 +08:00
|
|
|
*/
|
2017-06-13 21:05:51 +08:00
|
|
|
mr r19,r12
|
2017-04-19 21:05:50 +08:00
|
|
|
mr r18,r4
|
2016-07-08 14:20:44 +08:00
|
|
|
mflr r17
|
2014-07-29 21:10:13 +08:00
|
|
|
BEGIN_FTR_SECTION
|
|
|
|
CHECK_HMI_INTERRUPT
|
|
|
|
END_FTR_SECTION_IFSET(CPU_FTR_HVMODE)
|
2014-12-10 02:56:52 +08:00
|
|
|
|
|
|
|
ld r14,PACA_CORE_IDLE_STATE_PTR(r13)
|
2017-04-19 21:05:49 +08:00
|
|
|
lbz r7,PACA_THREAD_MASK(r13)
|
|
|
|
|
2014-12-10 02:56:52 +08:00
|
|
|
/*
|
2017-04-19 21:05:49 +08:00
|
|
|
* Take the core lock to synchronize against other threads.
|
|
|
|
*
|
2014-12-10 02:56:52 +08:00
|
|
|
* Lock bit is set in one of the 2 cases-
|
|
|
|
* a. In the sleep/winkle enter path, the last thread is executing
|
|
|
|
* fastsleep workaround code.
|
|
|
|
* b. In the wake up path, another thread is executing fastsleep
|
|
|
|
* workaround undo code or resyncing timebase or restoring context
|
|
|
|
* In either case loop until the lock bit is cleared.
|
|
|
|
*/
|
2017-04-19 21:05:49 +08:00
|
|
|
1:
|
|
|
|
lwarx r15,0,r14
|
|
|
|
andis. r9,r15,PNV_CORE_IDLE_LOCK_BIT@h
|
2017-04-19 21:05:48 +08:00
|
|
|
bnel- core_idle_lock_held
|
2017-04-19 21:05:49 +08:00
|
|
|
oris r15,r15,PNV_CORE_IDLE_LOCK_BIT@h
|
|
|
|
stwcx. r15,0,r14
|
|
|
|
bne- 1b
|
|
|
|
isync
|
2014-12-10 02:56:52 +08:00
|
|
|
|
2017-04-19 21:05:48 +08:00
|
|
|
andi. r9,r15,PNV_CORE_IDLE_THREAD_BITS
|
|
|
|
cmpwi cr2,r9,0
|
2014-12-10 02:56:53 +08:00
|
|
|
|
|
|
|
/*
|
|
|
|
* At this stage
|
2016-07-08 14:20:49 +08:00
|
|
|
* cr2 - eq if first thread to wakeup in core
|
|
|
|
* cr3- gt if waking up with partial/complete hypervisor state loss
|
2017-04-19 21:05:50 +08:00
|
|
|
* ISA300:
|
2016-09-07 13:16:30 +08:00
|
|
|
* cr4 - gt or eq if waking up from complete hypervisor state loss.
|
2014-12-10 02:56:53 +08:00
|
|
|
*/
|
|
|
|
|
2016-07-08 14:20:49 +08:00
|
|
|
BEGIN_FTR_SECTION
|
2017-04-19 21:05:50 +08:00
|
|
|
/*
|
|
|
|
* Were we in winkle?
|
|
|
|
* If yes, check if all threads were in winkle, decrement our
|
|
|
|
* winkle count, set all thread winkle bits if all were in winkle.
|
|
|
|
* Check if our thread has a winkle bit set, and set cr4 accordingly
|
|
|
|
* (to match ISA300, above). Pseudo-code for core idle state
|
|
|
|
* transitions for ISA207 is as follows (everything happens atomically
|
|
|
|
* due to store conditional and/or lock bit):
|
|
|
|
*
|
|
|
|
* nap_idle() { }
|
|
|
|
* nap_wake() { }
|
|
|
|
*
|
|
|
|
* sleep_idle()
|
|
|
|
* {
|
|
|
|
* core_idle_state &= ~thread_in_core
|
|
|
|
* }
|
|
|
|
*
|
|
|
|
* sleep_wake()
|
|
|
|
* {
|
|
|
|
* bool first_in_core, first_in_subcore;
|
|
|
|
*
|
|
|
|
* first_in_core = (core_idle_state & IDLE_THREAD_BITS) == 0;
|
|
|
|
* first_in_subcore = (core_idle_state & SUBCORE_SIBLING_MASK) == 0;
|
|
|
|
*
|
|
|
|
* core_idle_state |= thread_in_core;
|
|
|
|
* }
|
|
|
|
*
|
|
|
|
* winkle_idle()
|
|
|
|
* {
|
|
|
|
* core_idle_state &= ~thread_in_core;
|
|
|
|
* core_idle_state += 1 << WINKLE_COUNT_SHIFT;
|
|
|
|
* }
|
|
|
|
*
|
|
|
|
* winkle_wake()
|
|
|
|
* {
|
|
|
|
* bool first_in_core, first_in_subcore, winkle_state_lost;
|
|
|
|
*
|
|
|
|
* first_in_core = (core_idle_state & IDLE_THREAD_BITS) == 0;
|
|
|
|
* first_in_subcore = (core_idle_state & SUBCORE_SIBLING_MASK) == 0;
|
|
|
|
*
|
|
|
|
* core_idle_state |= thread_in_core;
|
|
|
|
*
|
|
|
|
* if ((core_idle_state & WINKLE_MASK) == (8 << WINKLE_COUNT_SHIFT))
|
|
|
|
* core_idle_state |= THREAD_WINKLE_BITS;
|
|
|
|
* core_idle_state -= 1 << WINKLE_COUNT_SHIFT;
|
|
|
|
*
|
|
|
|
* winkle_state_lost = core_idle_state &
|
|
|
|
* (thread_in_core << WINKLE_THREAD_SHIFT);
|
|
|
|
* core_idle_state &= ~(thread_in_core << WINKLE_THREAD_SHIFT);
|
|
|
|
* }
|
|
|
|
*
|
|
|
|
*/
|
|
|
|
cmpwi r18,PNV_THREAD_WINKLE
|
|
|
|
bne 2f
|
|
|
|
andis. r9,r15,PNV_CORE_IDLE_WINKLE_COUNT_ALL_BIT@h
|
|
|
|
subis r15,r15,PNV_CORE_IDLE_WINKLE_COUNT@h
|
|
|
|
beq 2f
|
|
|
|
ori r15,r15,PNV_CORE_IDLE_THREAD_WINKLE_BITS /* all were winkle */
|
|
|
|
2:
|
|
|
|
/* Shift thread bit to winkle mask, then test if this thread is set,
|
|
|
|
* and remove it from the winkle bits */
|
|
|
|
slwi r8,r7,8
|
|
|
|
and r8,r8,r15
|
|
|
|
andc r15,r15,r8
|
|
|
|
cmpwi cr4,r8,1 /* cr4 will be gt if our bit is set, lt if not */
|
|
|
|
|
2016-07-08 14:20:49 +08:00
|
|
|
lbz r4,PACA_SUBCORE_SIBLING_MASK(r13)
|
|
|
|
and r4,r4,r15
|
|
|
|
cmpwi r4,0 /* Check if first in subcore */
|
|
|
|
|
|
|
|
or r15,r15,r7 /* Set thread bit */
|
|
|
|
beq first_thread_in_subcore
|
|
|
|
END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_300)
|
|
|
|
|
|
|
|
or r15,r15,r7 /* Set thread bit */
|
|
|
|
beq cr2,first_thread_in_core
|
|
|
|
|
|
|
|
/* Not first thread in core or subcore to wake up */
|
|
|
|
b clear_lock
|
|
|
|
|
|
|
|
first_thread_in_subcore:
|
2014-12-10 02:56:53 +08:00
|
|
|
/*
|
|
|
|
* If waking up from sleep, subcore state is not lost. Hence
|
|
|
|
* skip subcore state restore
|
|
|
|
*/
|
2016-09-07 13:16:30 +08:00
|
|
|
blt cr4,subcore_state_restored
|
2014-12-10 02:56:53 +08:00
|
|
|
|
|
|
|
/* Restore per-subcore state */
|
|
|
|
ld r4,_SDR1(r1)
|
|
|
|
mtspr SPRN_SDR1,r4
|
2016-07-08 14:20:49 +08:00
|
|
|
|
2014-12-10 02:56:53 +08:00
|
|
|
ld r4,_RPR(r1)
|
|
|
|
mtspr SPRN_RPR,r4
|
|
|
|
ld r4,_AMOR(r1)
|
|
|
|
mtspr SPRN_AMOR,r4
|
|
|
|
|
|
|
|
subcore_state_restored:
|
|
|
|
/*
|
|
|
|
* Check if the thread is also the first thread in the core. If not,
|
|
|
|
* skip to clear_lock.
|
|
|
|
*/
|
|
|
|
bne cr2,clear_lock
|
|
|
|
|
|
|
|
first_thread_in_core:
|
|
|
|
|
2014-12-10 02:56:52 +08:00
|
|
|
/*
|
2016-07-08 14:20:49 +08:00
|
|
|
* First thread in the core waking up from any state which can cause
|
|
|
|
* partial or complete hypervisor state loss. It needs to
|
2014-12-10 02:56:52 +08:00
|
|
|
* call the fastsleep workaround code if the platform requires it.
|
|
|
|
* Call it unconditionally here. The below branch instruction will
|
2016-07-08 14:20:49 +08:00
|
|
|
* be patched out if the platform does not have fastsleep or does not
|
|
|
|
* require the workaround. Patching will be performed during the
|
|
|
|
* discovery of idle-states.
|
2014-12-10 02:56:52 +08:00
|
|
|
*/
|
|
|
|
.global pnv_fastsleep_workaround_at_exit
|
|
|
|
pnv_fastsleep_workaround_at_exit:
|
|
|
|
b fastsleep_workaround_at_exit
|
|
|
|
|
|
|
|
timebase_resync:
|
2016-07-08 14:20:49 +08:00
|
|
|
/*
|
|
|
|
* Use cr3 which indicates that we are waking up with at least partial
|
|
|
|
* hypervisor state loss to determine if TIMEBASE RESYNC is needed.
|
|
|
|
*/
|
2017-05-16 16:49:44 +08:00
|
|
|
ble cr3,.Ltb_resynced
|
2014-02-26 08:08:43 +08:00
|
|
|
/* Time base re-sync */
|
2017-02-07 13:03:17 +08:00
|
|
|
bl opal_resync_timebase;
|
2014-12-10 02:56:53 +08:00
|
|
|
/*
|
2017-05-16 16:49:44 +08:00
|
|
|
* If waking up from sleep (POWER8), per core state
|
|
|
|
* is not lost, skip to clear_lock.
|
2014-12-10 02:56:53 +08:00
|
|
|
*/
|
2017-05-16 16:49:44 +08:00
|
|
|
.Ltb_resynced:
|
2016-09-07 13:16:30 +08:00
|
|
|
blt cr4,clear_lock
|
2014-12-10 02:56:53 +08:00
|
|
|
|
2016-07-08 14:20:49 +08:00
|
|
|
/*
|
|
|
|
* First thread in the core to wake up, and it is waking up with
|
|
|
|
* complete hypervisor state loss. Restore per core hypervisor
|
|
|
|
* state.
|
|
|
|
*/
|
|
|
|
BEGIN_FTR_SECTION
|
|
|
|
ld r4,_PTCR(r1)
|
|
|
|
mtspr SPRN_PTCR,r4
|
|
|
|
ld r4,_RPR(r1)
|
|
|
|
mtspr SPRN_RPR,r4
|
|
|
|
END_FTR_SECTION_IFSET(CPU_FTR_ARCH_300)
|
|
|
|
|
2014-12-10 02:56:53 +08:00
|
|
|
ld r4,_TSCR(r1)
|
|
|
|
mtspr SPRN_TSCR,r4
|
|
|
|
ld r4,_WORC(r1)
|
|
|
|
mtspr SPRN_WORC,r4
|
|
|
|
|
2014-12-10 02:56:52 +08:00
|
|
|
clear_lock:
|
2017-04-19 21:05:48 +08:00
|
|
|
xoris r15,r15,PNV_CORE_IDLE_LOCK_BIT@h
|
2014-12-10 02:56:52 +08:00
|
|
|
lwsync
|
|
|
|
stw r15,0(r14)
|
|
|
|
|
|
|
|
common_exit:
|
2014-12-10 02:56:53 +08:00
|
|
|
/*
|
|
|
|
* Common to all threads.
|
|
|
|
*
|
|
|
|
* If waking up from sleep, hypervisor state is not lost. Hence
|
|
|
|
* skip hypervisor state restore.
|
|
|
|
*/
|
2016-09-07 13:16:30 +08:00
|
|
|
blt cr4,hypervisor_state_restored
|
2014-12-10 02:56:53 +08:00
|
|
|
|
|
|
|
/* Waking up from winkle */
|
|
|
|
|
2016-07-08 14:20:49 +08:00
|
|
|
BEGIN_MMU_FTR_SECTION
|
|
|
|
b no_segments
|
2016-07-27 11:19:01 +08:00
|
|
|
END_MMU_FTR_SECTION_IFSET(MMU_FTR_TYPE_RADIX)
|
2014-12-10 02:56:53 +08:00
|
|
|
/* Restore SLB from PACA */
|
|
|
|
ld r8,PACA_SLBSHADOWPTR(r13)
|
|
|
|
|
|
|
|
.rept SLB_NUM_BOLTED
|
|
|
|
li r3, SLBSHADOW_SAVEAREA
|
|
|
|
LDX_BE r5, r8, r3
|
|
|
|
addi r3, r3, 8
|
|
|
|
LDX_BE r6, r8, r3
|
|
|
|
andis. r7,r5,SLB_ESID_V@h
|
|
|
|
beq 1f
|
|
|
|
slbmte r6,r5
|
|
|
|
1: addi r8,r8,16
|
|
|
|
.endr
|
2016-07-08 14:20:49 +08:00
|
|
|
no_segments:
|
|
|
|
|
|
|
|
/* Restore per thread state */
|
2014-12-10 02:56:53 +08:00
|
|
|
|
|
|
|
ld r4,_SPURR(r1)
|
|
|
|
mtspr SPRN_SPURR,r4
|
|
|
|
ld r4,_PURR(r1)
|
|
|
|
mtspr SPRN_PURR,r4
|
|
|
|
ld r4,_DSCR(r1)
|
|
|
|
mtspr SPRN_DSCR,r4
|
|
|
|
ld r4,_WORT(r1)
|
|
|
|
mtspr SPRN_WORT,r4
|
|
|
|
|
2016-07-08 14:20:49 +08:00
|
|
|
/* Call cur_cpu_spec->cpu_restore() */
|
|
|
|
LOAD_REG_ADDR(r4, cur_cpu_spec)
|
|
|
|
ld r4,0(r4)
|
|
|
|
ld r12,CPU_SPEC_RESTORE(r4)
|
|
|
|
#ifdef PPC64_ELF_ABI_v1
|
|
|
|
ld r12,0(r12)
|
|
|
|
#endif
|
|
|
|
mtctr r12
|
|
|
|
bctrl
|
|
|
|
|
powerpc/powernv: Save/Restore additional SPRs for stop4 cpuidle
The stop4 idle state on POWER9 is a deep idle state which loses
hypervisor resources, but whose latency is low enough that it can be
exposed via cpuidle.
Until now, the deep idle states which lose hypervisor resources (eg:
winkle) were only exposed via CPU-Hotplug. Hence currently on wakeup
from such states, barring a few SPRs which need to be restored to
their older value, rest of the SPRS are reinitialized to their values
corresponding to that at boot time.
When stop4 is used in the context of cpuidle, we want these additional
SPRs to be restored to their older value, to ensure that the context
on the CPU coming back from idle is same as it was before going idle.
In this patch, we define a SPR save area in PACA (since we have used
up the volatile register space in the stack) and on POWER9, we restore
SPRN_PID, SPRN_LDBAR, SPRN_FSCR, SPRN_HFSCR, SPRN_MMCRA, SPRN_MMCR1,
SPRN_MMCR2 to the values they had before entering stop.
Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
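
A rough C sketch of the save-area idea follows; it is not the kernel code. The field
list loosely mirrors the SPRs named above, while the read_spr()/write_spr() accessors
and the SPR_* enum are hypothetical stand-ins for mfspr()/mtspr().

#include <stdint.h>

struct stop_sprs {		/* lives in the paca in the real kernel */
	uint64_t pid;
	uint64_t ldbar;
	uint64_t fscr;
	uint64_t hfscr;
	uint64_t mmcr1;
	uint64_t mmcr2;
	uint64_t mmcra;
};

/* Hypothetical accessors standing in for mfspr()/mtspr(). */
extern uint64_t read_spr(int spr);
extern void write_spr(int spr, uint64_t val);
enum { SPR_PID, SPR_LDBAR, SPR_FSCR, SPR_HFSCR, SPR_MMCR1, SPR_MMCR2, SPR_MMCRA };

static void save_additional_sprs(struct stop_sprs *s)	/* before a deep stop */
{
	s->pid   = read_spr(SPR_PID);
	s->ldbar = read_spr(SPR_LDBAR);
	s->fscr  = read_spr(SPR_FSCR);
	s->hfscr = read_spr(SPR_HFSCR);
	s->mmcr1 = read_spr(SPR_MMCR1);
	s->mmcr2 = read_spr(SPR_MMCR2);
	s->mmcra = read_spr(SPR_MMCRA);
}

static void restore_additional_sprs(const struct stop_sprs *s)	/* on wakeup */
{
	write_spr(SPR_PID,   s->pid);
	write_spr(SPR_LDBAR, s->ldbar);
	write_spr(SPR_FSCR,  s->fscr);
	write_spr(SPR_HFSCR, s->hfscr);
	write_spr(SPR_MMCR1, s->mmcr1);
	write_spr(SPR_MMCR2, s->mmcr2);
	write_spr(SPR_MMCRA, s->mmcra);
}

The save area sits in the paca rather than on the stack because the volatile-register
space in the stack frame is already used by the existing idle entry/exit code.
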
2017-07-21 18:41:37 +08:00
|
|
|
/*
|
|
|
|
* On POWER9, we can come here on wakeup from a cpuidle stop state.
|
|
|
|
* Hence restore the additional SPRs to the saved value.
|
|
|
|
*
|
|
|
|
* On POWER8, we come here only on winkle. Since winkle is used
|
|
|
|
* only in the case of CPU-Hotplug, we don't need to restore
|
|
|
|
* the additional SPRs.
|
|
|
|
*/
|
2017-05-16 16:49:45 +08:00
|
|
|
BEGIN_FTR_SECTION
|
2017-07-21 18:41:37 +08:00
|
|
|
bl power9_restore_additional_sprs
|
2017-05-16 16:49:45 +08:00
|
|
|
END_FTR_SECTION_IFSET(CPU_FTR_ARCH_300)
|
2014-12-10 02:56:53 +08:00
|
|
|
hypervisor_state_restored:
|
|
|
|
|
2017-06-13 21:05:51 +08:00
|
|
|
mr r12,r19
|
2016-07-08 14:20:44 +08:00
|
|
|
mtlr r17
|
2017-04-19 21:05:44 +08:00
|
|
|
blr /* return to pnv_powersave_wakeup */
|
2014-02-26 08:08:43 +08:00
|
|
|
|
2014-12-10 02:56:52 +08:00
|
|
|
fastsleep_workaround_at_exit:
|
|
|
|
li r3,1
|
|
|
|
li r4,0
|
2017-02-07 13:03:17 +08:00
|
|
|
bl opal_config_cpu_idle_state
|
2014-12-10 02:56:52 +08:00
|
|
|
b timebase_resync
|
|
|
|
|
powerpc/powernv: Return to cpu offline loop when finished in KVM guest
When a secondary hardware thread has finished running a KVM guest, we
currently put that thread into nap mode using a nap instruction in
the KVM code. This changes the code so that instead of doing a nap
instruction directly, we instead cause the call to power7_nap() that
put the thread into nap mode to return. The reason for doing this is
to avoid having the KVM code having to know what low-power mode to
put the thread into.
In the case of a secondary thread used to run a KVM guest, the thread
will be offline from the point of view of the host kernel, and the
relevant power7_nap() call is the one in pnv_smp_cpu_disable().
In this case we don't want to clear pending IPIs in the offline loop
in that function, since that might cause us to miss the wakeup for
the next time the thread needs to run a guest. To tell whether or
not to clear the interrupt, we use the SRR1 value returned from
power7_nap(), and check if it indicates an external interrupt. We
arrange that the return from power7_nap() when we have finished running
a guest returns 0, so pending interrupts don't get flushed in that
case.
Note that it is important that a secondary thread that has finished
executing in the guest, or that didn't have a guest to run, should
not return to power7_nap's caller while the kvm_hstate.hwthread_req
flag in the PACA is non-zero, because the return from power7_nap
will reenable the MMU, and the MMU might still be in guest context.
In this situation we spin at low priority in real mode waiting for
hwthread_req to become zero.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
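
The decision described above can be summarised in a short C sketch of the offline
loop. All names here (nap_and_return_srr1, srr1_to_wake_reason, clear_pending_ipi,
cpu_should_come_online, the WAKE_REASON_EXTERNAL encoding) are stand-ins assumed for
the example, not the kernel's actual helpers.

#include <stdint.h>
#include <stdbool.h>

#define WAKE_REASON_EXTERNAL  0x4	/* assumed encoding for the sketch */

extern uint64_t nap_and_return_srr1(void);	/* 0 after a guest run */
extern unsigned int srr1_to_wake_reason(uint64_t srr1);
extern void clear_pending_ipi(void);
extern bool cpu_should_come_online(void);

static void offline_idle_loop(void)
{
	while (!cpu_should_come_online()) {
		uint64_t srr1 = nap_and_return_srr1();

		/*
		 * A zero return means we were woken to run (or have just
		 * finished running) a KVM guest: leave the IPI pending so
		 * the next guest entry is not missed. Only acknowledge the
		 * IPI when SRR1 really reports an external interrupt.
		 */
		if (srr1 && srr1_to_wake_reason(srr1) == WAKE_REASON_EXTERNAL)
			clear_pending_ipi();
	}
}
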
2014-12-03 11:48:40 +08:00
|
|
|
/*
|
|
|
|
* R3 here contains the value that will be returned to the caller
|
|
|
|
* of power7_nap.
|
2017-06-13 21:05:51 +08:00
|
|
|
* R12 contains SRR1 for CHECK_HMI_INTERRUPT.
|
2014-12-03 11:48:40 +08:00
|
|
|
*/
|
2017-04-19 21:05:44 +08:00
|
|
|
.global pnv_wakeup_loss
|
|
|
|
pnv_wakeup_loss:
|
2011-01-24 15:42:41 +08:00
|
|
|
ld r1,PACAR1(r13)
|
2014-07-29 21:10:13 +08:00
|
|
|
BEGIN_FTR_SECTION
|
|
|
|
CHECK_HMI_INTERRUPT
|
|
|
|
END_FTR_SECTION_IFSET(CPU_FTR_HVMODE)
|
2011-01-24 15:42:41 +08:00
|
|
|
REST_NVGPRS(r1)
|
|
|
|
REST_GPR(2, r1)
|
2017-06-13 21:05:51 +08:00
|
|
|
ld r4,PACAKMSR(r13)
|
|
|
|
ld r5,_LINK(r1)
|
2014-12-03 11:48:40 +08:00
|
|
|
ld r6,_CCR(r1)
|
2011-01-24 15:42:41 +08:00
|
|
|
addi r1,r1,INT_FRAME_SIZE
|
2017-06-13 21:05:51 +08:00
|
|
|
mtlr r5
|
2014-12-03 11:48:40 +08:00
|
|
|
mtcr r6
|
2017-06-13 21:05:51 +08:00
|
|
|
mtmsrd r4
|
|
|
|
blr
|
2011-01-24 15:42:41 +08:00
|
|
|
|
2014-12-03 11:48:40 +08:00
|
|
|
/*
|
|
|
|
* R3 here contains the value that will be returned to the caller
|
|
|
|
* of power7_nap.
|
2017-06-13 21:05:51 +08:00
|
|
|
* R12 contains SRR1 for CHECK_HMI_INTERRUPT.
|
2014-12-03 11:48:40 +08:00
|
|
|
*/
|
2017-04-19 21:05:44 +08:00
|
|
|
pnv_wakeup_noloss:
|
2011-12-06 03:47:26 +08:00
|
|
|
lbz r0,PACA_NAPSTATELOST(r13)
|
|
|
|
cmpwi r0,0
|
2016-07-08 14:20:46 +08:00
|
|
|
bne pnv_wakeup_loss
|
2017-06-13 21:05:51 +08:00
|
|
|
ld r1,PACAR1(r13)
|
2014-07-29 21:10:13 +08:00
|
|
|
BEGIN_FTR_SECTION
|
|
|
|
CHECK_HMI_INTERRUPT
|
|
|
|
END_FTR_SECTION_IFSET(CPU_FTR_HVMODE)
|
2017-06-13 21:05:51 +08:00
|
|
|
ld r4,PACAKMSR(r13)
|
2011-01-24 15:42:41 +08:00
|
|
|
ld r5,_NIP(r1)
|
2017-06-13 21:05:51 +08:00
|
|
|
ld r6,_CCR(r1)
|
2011-01-24 15:42:41 +08:00
|
|
|
addi r1,r1,INT_FRAME_SIZE
|
2017-06-13 21:05:51 +08:00
|
|
|
mtlr r5
|
powerpc/powernv: Restore non-volatile CRs after nap
Patches 7cba160ad "powernv/cpuidle: Redesign idle states management"
and 77b54e9f2 "powernv/powerpc: Add winkle support for offline cpus"
use non-volatile condition registers (cr2, cr3 and cr4) early in the system
reset interrupt handler (system_reset_pSeries()) before it has been determined
if state loss has occurred. If state loss has not occurred, control returns via
the power7_wakeup_noloss() path which does not restore those condition
registers, leaving them corrupted.
Fix this by restoring the condition registers in the power7_wakeup_noloss()
case.
This is apparent when running a KVM guest on hardware that does not
support winkle or sleep and the guest makes use of secondary threads. In
practice this means Power7 machines, though some early unreleased Power8
machines may also be susceptible.
The secondary CPUs are taken off line before the guest is started and
they call pnv_smp_cpu_kill_self(). This checks support for sleep
states (in this case there is no support) and power7_nap() is called.
When the CPU is woken, power7_nap() returns and because the CPU is
still off line, the main while loop executes again. The sleep states
support test is executed again, but because the tested values cannot
have changed, the compiler has optimized the test away and instead we
rely on the result of the first test, which has been left in cr3
and/or cr4. With the result overwritten, the wrong branch is taken and
power7_winkle() is called on a CPU that does not support it, leading
to it stalling.
Fixes: 7cba160ad789 ("powernv/cpuidle: Redesign idle states management")
Fixes: 77b54e9f213f ("powernv/powerpc: Add winkle support for offline cpus")
[mpe: Massage change log a bit more]
Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-05-01 14:50:34 +08:00
|
|
|
mtcr r6
|
2017-06-13 21:05:51 +08:00
|
|
|
mtmsrd r4
|
|
|
|
blr
|