Commit Graph

933190 Commits

Author SHA1 Message Date
Wei Yongjun 92fe8483b1 cpuidle/pseries: Make symbol 'pseries_idle_driver' static
The sparse tool complains as follows:

drivers/cpuidle/cpuidle-pseries.c:25:23: warning:
 symbol 'pseries_idle_driver' was not declared. Should it be static?

'pseries_idle_driver' is not used outside of this file, so marks
it static.

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200714142424.66648-1-weiyongjun1@huawei.com
2020-07-16 13:12:45 +10:00
Anton Blanchard 89c140bbae pseries: Fix 64 bit logical memory block panic
Booting with a 4GB LMB size causes us to panic:

  qemu-system-ppc64: OS terminated: OS panic:
      Memory block size not suitable: 0x0

Fix pseries_memory_block_size() to handle 64 bit LMBs.

Cc: stable@vger.kernel.org
Signed-off-by: Anton Blanchard <anton@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200715000820.1255764-1-anton@ozlabs.org
2020-07-16 13:12:45 +10:00
YueHaibing 29d9407e10 powerpc/xive: Remove unused inline function xive_kexec_teardown_cpu()
commit e27e0a9465 ("powerpc/xive: Remove xive_kexec_teardown_cpu()")
left behind this, remove it.

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200715025040.33952-1-yuehaibing@huawei.com
2020-07-16 13:12:44 +10:00
Sourabh Jain ba608c4fa1 powerpc/fadump: fix race between pstore write and fadump crash trigger
When we enter into fadump crash path via system reset we fail to update
the pstore.

On the system reset path we first update the pstore then we go for fadump
crash. But the problem here is when all the CPUs try to get the pstore
lock to initiate the pstore write, only one CPUs will acquire the lock
and proceed with the pstore write. Since it in NMI context CPUs that fail
to get lock do not wait for their turn to write to the pstore and simply
proceed with the next operation which is fadump crash. One of the CPU who
proceeded with fadump crash path triggers the crash and does not wait for
the CPU who gets the pstore lock to complete the pstore update.

Timeline diagram to depicts the sequence of events that leads to an
unsuccessful pstore update when we hit fadump crash path via system reset.

                 1    2     3    ...      n   CPU Threads
                 |    |     |             |
                 |    |     |             |
 Reached to   -->|--->|---->| ----------->|
 system reset    |    |     |             |
 path            |    |     |             |
                 |    |     |             |
 Try to       -->|--->|---->|------------>|
 acquire the     |    |     |             |
 pstore lock     |    |     |             |
                 |    |     |             |
                 |    |     |             |
 Got the      -->| +->|     |             |<-+
 pstore lock     | |  |     |             |  |-->  Didn't get the
                 | --------------------------+     lock and moving
                 |    |     |             |        ahead on fadump
                 |    |     |             |        crash path
                 |    |     |             |
  Begins the  -->|    |     |             |
  process to     |    |     |             |<-- Got the chance to
  update the     |    |     |             |    trigger the crash
  pstore         | -> |     |    ... <-   |
                 | |  |     |         |   |
                 | |  |     |         |   |<-- Triggers the
                 | |  |     |         |   |    crash
                 | |  |     |         |   |      ^
                 | |  |     |         |   |      |
  Writing to  -->| |  |     |         |   |      |
  pstore         | |  |     |         |   |      |
                   |                  |          |
       ^           |__________________|          |
       |               CPU Relax                 |
       |                                         |
       +-----------------------------------------+
                          |
                          v
            Race: crash triggered before pstore
                  update completes

To avoid this race condition a barrier is added on crash_fadump path, it
prevents the CPU to trigger the crash until all the online CPUs completes
their task.

A barrier is added to make sure all the secondary CPUs hit the
crash_fadump function before we initiates the crash. A timeout is kept to
ensure the primary CPU (one who initiates the crash) do not wait for
secondary CPUs indefinitely.

Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200713052435.183750-1-sourabhjain@linux.ibm.com
2020-07-16 13:12:44 +10:00
Anton Blanchard ade7667a98 powerpc: Add cputime_to_nsecs()
Generic code has a wrapper to implement cputime_to_nsecs() on top of
cputime_to_usecs() but we can easily return the full nanosecond
resolution directly.

Signed-off-by: Anton Blanchard <anton@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200713083601.1103978-1-anton@ozlabs.org
2020-07-16 13:12:43 +10:00
Balamuruhan S e4208f1399 powerpc/ppc-opcode: Fold PPC_INST_* macros into PPC_RAW_* macros
Lots of PPC_INST_* macros are used only ever in PPC_* macros, fold
those PPC_INST_* into PPC_RAW_* to avoid using PPC_INST_*
accidentally.

Signed-off-by: Balamuruhan S <bala24@linux.ibm.com>
Tested-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
[mpe: Deal with PHWSYNC, PLWSYNC]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200624113038.908074-7-bala24@linux.ibm.com
2020-07-16 13:12:43 +10:00
Balamuruhan S 357c572948 powerpc/ppc-opcode: Reuse raw instruction macros to stringify
Wrap existing stringify macros to reuse raw instruction encoding
macros that are newly added.

Signed-off-by: Balamuruhan S <bala24@linux.ibm.com>
Tested-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
[mpe: Add DCBFPS, DCBSTPS, PHWSYNC, PLWSYNC]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200624113038.908074-6-bala24@linux.ibm.com
2020-07-16 13:12:43 +10:00
Balamuruhan S 3a18123791 powerpc/ppc-opcode: Consolidate powerpc instructions from bpf_jit.h
Move macro definitions of powerpc instructions from bpf_jit.h to
ppc-opcode.h and adopt the users of the macros accordingly. `PPC_MR()`
is defined twice in bpf_jit.h, remove the duplicate one.

Signed-off-by: Balamuruhan S <bala24@linux.ibm.com>
Tested-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Acked-by: Sandipan Das <sandipan@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200624113038.908074-5-bala24@linux.ibm.com
2020-07-16 13:12:42 +10:00
Balamuruhan S 0654186510 powerpc/bpf_jit: Reuse instruction macros from ppc-opcode.h
Remove duplicate macro definitions from bpf_jit.h and reuse the macros
from ppc-opcode.h

Signed-off-by: Balamuruhan S <bala24@linux.ibm.com>
Tested-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Acked-by: Sandipan Das <sandipan@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200624113038.908074-4-bala24@linux.ibm.com
2020-07-16 13:12:42 +10:00
Balamuruhan S 1d33dd8408 powerpc/ppc-opcode: Move ppc instruction encoding from test_emulate_step
Few ppc instructions are encoded in test_emulate_step.c, consolidate
them and use it from ppc-opcode.h

Signed-off-by: Balamuruhan S <bala24@linux.ibm.com>
Tested-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Acked-by: Sandipan Das <sandipan@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200624113038.908074-3-bala24@linux.ibm.com
2020-07-16 13:12:42 +10:00
Balamuruhan S db551f8cc6 powerpc/ppc-opcode: Introduce PPC_RAW_* macros for base instruction encoding
Introduce PPC_RAW_* macros to have all the bare encoding of ppc
instructions. Move `VSX_XX*()` and `TMRN()` macros up to reuse it.

Signed-off-by: Balamuruhan S <bala24@linux.ibm.com>
Tested-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
[mpe: Add DCBFPS, DCBSTPS, PHWSYNC, PLWSYNC]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200624113038.908074-2-bala24@linux.ibm.com
2020-07-16 13:12:41 +10:00
Kajol Jain 792f73f747 powerpc/hv-24x7: Add sysfs files inside hv-24x7 device to show cpumask
Patch here adds a cpumask attr to hv_24x7 pmu along with ABI documentation.

Primary use to expose the cpumask is for the perf tool which has the
capability to parse the driver sysfs folder and understand the
cpumask file. Having cpumask file will reduce the number of perf command
line parameters (will avoid "-C" option in the perf tool
command line). It can also notify the user which is
the current cpu used to retrieve the counter data.

command:# cat /sys/devices/hv_24x7/interface/cpumask
0

Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
Reviewed-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200709051836.723765-3-kjain@linux.ibm.com
2020-07-16 13:12:41 +10:00
Kajol Jain 1a8f0886a6 powerpc/perf/hv-24x7: Add cpu hotplug support
Patch here adds cpu hotplug functions to hv_24x7 pmu.
A new cpuhp_state "CPUHP_AP_PERF_POWERPC_HV_24x7_ONLINE" enum
is added.

The online callback function updates the cpumask only if its
empty. As the primary intention of adding hotplug support
is to designate a CPU to make HCALL to collect the
counter data.

The offline function test and clear corresponding cpu in a cpumask
and update cpumask to any other active cpu.

Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200709051836.723765-2-kjain@linux.ibm.com
2020-07-16 13:12:41 +10:00
Nathan Lynch e978a3ccaa powerpc/pseries: remove obsolete memory hotplug DT notifier code
pseries_update_drconf_memory() runs from a DT notifier in response to
an update to the ibm,dynamic-memory property of the
/ibm,dynamic-reconfiguration-memory node. This property is an older
less compact format than the ibm,dynamic-memory-v2 property used in
most currently supported firmwares. There has never been an equivalent
function for the v2 property.

pseries_update_drconf_memory() compares the 'assigned' flag for each
LMB in the old vs new properties and adds or removes the block
accordingly. However it appears to be of no actual utility:

* Partition suspension and PRRNs are specified only to change LMBs'
  NUMA affinity information. This notifier should be a no-op for those
  scenarios since the assigned flags should not change.

* The memory hotplug/DLPAR path has a hack which short-circuits
  execution of the notifier:
     dlpar_memory()
        ...
        rtas_hp_event = true;
        drmem_update_dt()
           of_update_property()
              pseries_memory_notifier()
                 pseries_update_drconf_memory()
                    if (rtas_hp_event) return;

So this code only makes sense as a relic of the time when more of the
DLPAR workflow took place in user space. I don't see a purpose for it
now.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200612051238.1007764-19-nathanl@linux.ibm.com
2020-07-16 13:12:41 +10:00
Nathan Lynch 38c392cef1 powerpc/pseries: remove dlpar_cpu_readd()
dlpar_cpu_readd() is unused now.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200612051238.1007764-18-nathanl@linux.ibm.com
2020-07-16 13:12:40 +10:00
Nathan Lynch 4abe60c644 powerpc/pseries: remove memory "re-add" implementation
dlpar_memory() no longer has any callers which pass
PSERIES_HP_ELOG_ACTION_READD. Remove this case and the corresponding
unreachable code.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200612051238.1007764-17-nathanl@linux.ibm.com
2020-07-16 13:12:40 +10:00
Nathan Lynch bb7c3d36e3 powerpc/pseries: remove prrn special case from DT update path
pseries_devicetree_update() is no longer called with PRRN_SCOPE. The
purpose of prrn_update_node() was to remove and then add back a LMB
whose NUMA assignment had changed. This has never been reliable, and
this codepath has been default-disabled for several releases. Remove
prrn_update_node().

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200612051238.1007764-16-nathanl@linux.ibm.com
2020-07-16 13:12:39 +10:00
Nathan Lynch cdf082c457 powerpc/numa: remove arch_update_cpu_topology
Since arch_update_cpu_topology() doesn't do anything on powerpc now,
remove it and associated dead code.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200612051238.1007764-15-nathanl@linux.ibm.com
2020-07-16 13:12:39 +10:00
Nathan Lynch 042ef7cc43 powerpc/numa: remove prrn_is_enabled()
All users of this prrn_is_enabled() are gone; remove it.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200612051238.1007764-14-nathanl@linux.ibm.com
2020-07-16 13:12:39 +10:00
Nathan Lynch 91713ac377 powerpc/rtasd: simplify handle_rtas_event(), emit message on events
prrn_is_enabled() always returns false/0, so handle_rtas_event() can
be simplified and some dead code can be removed. Use machine_is()
instead of #ifdef to run this code only on pseries, and add an
informational ratelimited message that we are ignoring the
events. PRRN events are relatively rare in normal operation and
usually arise from operator-initiated actions such as a DPO (Dynamic
Platform Optimizer) run.

Eventually we do want to consume these events and update the device
tree, but that needs more care to be safe vs LPM and DLPAR.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200612051238.1007764-13-nathanl@linux.ibm.com
2020-07-16 13:12:38 +10:00
Nathan Lynch 1835303e56 powerpc/numa: remove start/stop_topology_update()
These APIs have become no-ops, so remove them and all call sites.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200612051238.1007764-12-nathanl@linux.ibm.com
2020-07-16 13:12:38 +10:00
Nathan Lynch b1815aeac7 powerpc/numa: remove timed_topology_update()
timed_topology_update is a no-op now, so remove it and all call sites.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200612051238.1007764-11-nathanl@linux.ibm.com
2020-07-16 13:12:37 +10:00
Nathan Lynch 893ec6461f powerpc/numa: stub out numa_update_cpu_topology()
Previous changes have removed the code which sets bits in
cpu_associativity_changes_mask and thus it is never modifed at
runtime. From this we can reason that numa_update_cpu_topology()
always returns 0 without doing anything. Remove the body of
numa_update_cpu_topology() and remove all code which becomes
unreachable as a result.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200612051238.1007764-10-nathanl@linux.ibm.com
2020-07-16 13:12:37 +10:00
Nathan Lynch 9fb8b5fd1b powerpc/numa: remove vphn_enabled and prrn_enabled internal flags
These flags are always zero now; remove them and suitably adjust the
remaining references to them.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200612051238.1007764-9-nathanl@linux.ibm.com
2020-07-16 13:12:37 +10:00
Nathan Lynch 6325cb4a4e powerpc/numa: remove unreachable topology workqueue code
Since vphn_enabled is always 0, we can remove the call to
topology_schedule_update() and remove the code which becomes
unreachable as a result.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200612051238.1007764-8-nathanl@linux.ibm.com
2020-07-16 13:12:36 +10:00
Nathan Lynch 50e0cf3742 powerpc/numa: remove unreachable topology timer code
Since vphn_enabled is always 0, we can stub out
timed_topology_update() and remove the code which becomes unreachable.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200612051238.1007764-7-nathanl@linux.ibm.com
2020-07-16 13:12:36 +10:00
Nathan Lynch e6eacf8eb4 powerpc/numa: make vphn_enabled, prrn_enabled flags const
Previous changes have made it so these flags are never changed;
enforce this by making them const.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200612051238.1007764-6-nathanl@linux.ibm.com
2020-07-16 13:12:36 +10:00
Nathan Lynch 7d35bef96a powerpc/numa: remove unreachable topology update code
Since the topology_updates_enabled flag is now always false, remove it
and the code which has become unreachable. This is the minimum change
that prevents 'defined but unused' warnings emitted by the compiler
after stubbing out the start/stop_topology_updates() functions.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200612051238.1007764-5-nathanl@linux.ibm.com
2020-07-16 13:12:35 +10:00
Nathan Lynch c30f931e89 powerpc/numa: remove ability to enable topology updates
Remove the /proc/powerpc/topology_updates interface and the
topology_updates=on/off command line argument. The internal
topology_updates_enabled flag remains for now, but always false.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200612051238.1007764-4-nathanl@linux.ibm.com
2020-07-16 13:12:35 +10:00
Nathan Lynch ec2fc2a9e9 powerpc/rtas: don't online CPUs for partition suspend
Partition suspension, used for hibernation and migration, requires
that the OS place all but one of the LPAR's processor threads into one
of two states prior to calling the ibm,suspend-me RTAS function:

  * the architected offline state (via RTAS stop-self); or
  * the H_JOIN hcall, which does not return until the partition
    resumes execution

Using H_CEDE as the offline mode, introduced by
commit 3aa565f53c ("powerpc/pseries: Add hooks to put the CPU into
an appropriate offline state"), means that any threads which are
offline from Linux's point of view must be moved to one of those two
states before a partition suspension can proceed.

This was eventually addressed in commit 120496ac2d ("powerpc: Bring
all threads online prior to migration/hibernation"), which added code
to temporarily bring up any offline processor threads so they can call
H_JOIN. Conceptually this is fine, but the implementation has had
multiple races with cpu hotplug operations initiated from user
space[1][2][3], the error handling is fragile, and it generates
user-visible cpu hotplug events which is a lot of noise for a platform
feature that's supposed to minimize disruption to workloads.

With commit 3aa565f53c ("powerpc/pseries: Add hooks to put the CPU
into an appropriate offline state") reverted, this code becomes
unnecessary, so remove it. Since any offline CPUs now are truly
offline from the platform's point of view, it is no longer necessary
to bring up CPUs only to have them call H_JOIN and then go offline
again upon resuming. Only active threads are required to call H_JOIN;
stopped threads can be left alone.

[1] commit a6717c01dd ("powerpc/rtas: use device model APIs and
    serialization during LPM")
[2] commit 9fb603050f ("powerpc/rtas: retry when cpu offline races
    with suspend/migration")
[3] commit dfd718a2ed ("powerpc/rtas: Fix a potential race between
    CPU-Offline & Migration")

Fixes: 120496ac2d ("powerpc: Bring all threads online prior to migration/hibernation")
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200612051238.1007764-3-nathanl@linux.ibm.com
2020-07-16 13:12:35 +10:00
Nathan Lynch 48f6e7f6d9 powerpc/pseries: remove cede offline state for CPUs
This effectively reverts commit 3aa565f53c ("powerpc/pseries: Add
hooks to put the CPU into an appropriate offline state"), which added
an offline mode for CPUs which uses the H_CEDE hcall instead of the
architected stop-self RTAS function in order to facilitate "folding"
of dedicated mode processors on PowerVM platforms to achieve energy
savings. This has been the default offline mode since its
introduction.

There's nothing about stop-self that would prevent the hypervisor from
achieving the energy savings available via H_CEDE, so the original
premise of this change appears to be flawed.

I also have encountered the claim that the transition to and from
ceded state is much faster than stop-self/start-cpu. Certainly we
would not want to use stop-self as an *idle* mode. That is what H_CEDE
is for. However, this difference is insignificant in the context of
Linux CPU hotplug, where the latency of an offline or online operation
on current systems is on the order of 100ms, mainly attributable to
all the various subsystems' cpuhp callbacks.

The cede offline mode also prevents accurate accounting, as discussed
before:
https://lore.kernel.org/linuxppc-dev/1571740391-3251-1-git-send-email-ego@linux.vnet.ibm.com/

Unconditionally use stop-self to offline processor threads. This is
the architected method for offlining CPUs on PAPR systems.

The "cede_offline" boot parameter is rendered obsolete.

Removing this code enables the removal of the partition suspend code
which temporarily onlines all present CPUs.

Fixes: 3aa565f53c ("powerpc/pseries: Add hooks to put the CPU into an appropriate offline state")
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200612051238.1007764-2-nathanl@linux.ibm.com
2020-07-16 13:12:34 +10:00
Nicholas Piggin 4d24e21cc6 powerpc/security: Allow for processors that flush the link stack using the special bcctr
If both count cache and link stack are to be flushed, and can be flushed
with the special bcctr, patch that in directly to the flush/branch nop
site.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200609070610.846703-7-npiggin@gmail.com
2020-07-16 13:12:32 +10:00
Nicholas Piggin 70d7cdaf05 powerpc/64s: Move branch cache flushing bcctr variant to ppc-ops.h
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200609070610.846703-6-npiggin@gmail.com
2020-07-16 13:12:32 +10:00
Nicholas Piggin c0036549a9 powerpc/security: split branch cache flush toggle from code patching
Branch cache flushing code patching has inter-dependencies on both the
link stack and the count cache flushing state.

To make the code clearer and to separate the link stack and count
cache handling, split the "toggle" (setting up variables and printing
enable/disable) from the code patching.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Always print something, even if the flush is disabled]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200609070610.846703-5-npiggin@gmail.com
2020-07-16 13:12:32 +10:00
Nicholas Piggin 1afe00c74f powerpc/security: make display of branch cache flush more consistent
Make the count-cache and link-stack messages look the same

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200609070610.846703-4-npiggin@gmail.com
2020-07-16 13:12:31 +10:00
Nicholas Piggin c06ac27710 powerpc/security: change link stack flush state to the flush type enum
Prepare to allow for hardware link stack flushing by using the
none/sw/hw type, same as the count cache state.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200609070610.846703-3-npiggin@gmail.com
2020-07-16 13:12:31 +10:00
Nicholas Piggin 1026798c64 powerpc/security: re-name count cache flush to branch cache flush
The count cache flush mostly refers to both count cache and link stack
flushing. As a first step to untangling these a bit, re-name the bits
that apply to both.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200609070610.846703-2-npiggin@gmail.com
2020-07-16 13:12:31 +10:00
Nicholas Piggin b2b46304e9 powerpc: re-initialise lazy FPU/VEC counters on every fault
When a FP/VEC/VSX unavailable fault loads registers and enables the
facility in the MSR, re-set the lazy restore counters to 1 rather
than incrementing them so every fault gets the same number of
restores before the next fault.

This probably shouldn't be a practical change because if a lazy counter
was non-zero then it should have been restored and would not cause a
fault when userspace tries to access it. However the code and comment
implies otherwise so that's misleading and unnecessary.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200623234139.2262227-3-npiggin@gmail.com
2020-07-16 13:00:24 +10:00
Nicholas Piggin 01eb01877f powerpc/64s: Fix restore_math unnecessarily changing MSR
Before returning to user, if there are missing FP/VEC/VSX bits from the
user MSR then those registers had been saved and must be restored again
before use. restore_math will decide whether to restore immediately, or
skip the restore and let fp/vec/vsx unavailable faults demand load the
registers.

Each time restore_math restores one of the FP/VSX or VEC register sets
is loaded, an 8-bit counter is incremented (load_fp and load_vec). When
these wrap to zero, restore_math no longer restores that register set
until after they are next demand faulted.

It's quite usual for those counters to have different values, so if one
wraps to zero and restore_math no longer restores its registers or user
MSR bit but the other is not zero yet does not need to be restored
(because the kernel is not frequently using the FPU), then restore_math
will be called and it will also not return in the early exit check.
This causes msr_check_and_set to test and set the MSR at every kernel
exit despite having no work to do.

This can cause workloads (e.g., a NULL syscall microbenchmark) to run
fast for a time while both counters are non-zero, then slow down when
one of the counters reaches zero, then speed up again after the second
counter reaches zero. The cost is significant, about 10% slowdown on a
NULL syscall benchmark, and the jittery behaviour is very undesirable.

Fix this by having restore_math test all conditions first, and only
update MSR if we will be loading registers.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200623234139.2262227-2-npiggin@gmail.com
2020-07-16 13:00:24 +10:00
Nicholas Piggin 891b4fe8fe powerpc/64s: restore_math remove TM test
The TM test in restore_math added by commit dc16b553c9 ("powerpc:
Always restore FPU/VEC/VSX if hardware transactional memory in use") is
no longer necessary after commit a8318c13e7 ("powerpc/tm: Fix
restoring FP/VMX facility incorrectly on interrupts"), which removed
the cases where restore_math has to restore if TM is active.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200623234139.2262227-1-npiggin@gmail.com
2020-07-16 13:00:24 +10:00
Aneesh Kumar K.V 8c26ab7266 powerpc/pmem: Initialize pmem device on newer hardware
With kernel now supporting new pmem flush/sync instructions, we can now
enable the kernel to initialize the device. On P10 these devices would
appear with a new compatible string. For PAPR device we have

compatible       "ibm,pmemory-v2"

and for OF pmem device we have

compatible       "pmem-region-v2"

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200701072235.223558-8-aneesh.kumar@linux.ibm.com
2020-07-16 13:00:23 +10:00
Aneesh Kumar K.V 436499ab86 powerpc/pmem: Avoid the barrier in flush routines
nvdimm expect the flush routines to just mark the cache clean. The barrier
that mark the store globally visible is done in nvdimm_flush().

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200701072235.223558-7-aneesh.kumar@linux.ibm.com
2020-07-16 13:00:23 +10:00
Aneesh Kumar K.V 76e6c73f33 powerpc/pmem: Update ppc64 to use the new barrier instruction.
pmem on POWER10 can now use phwsync instead of hwsync to ensure
all previous writes are architecturally visible for the platform
buffer flush.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200701072235.223558-6-aneesh.kumar@linux.ibm.com
2020-07-16 13:00:23 +10:00
Aneesh Kumar K.V 3e79f082eb libnvdimm/nvdimm/flush: Allow architecture to override the flush barrier
Architectures like ppc64 provide persistent memory specific barriers
that will ensure that all stores for which the modifications are
written to persistent storage by preceding dcbfps and dcbstps
instructions have updated persistent storage before any data
access or data transfer caused by subsequent instructions is initiated.
This is in addition to the ordering done by wmb()

Update nvdimm core such that architecture can use barriers other than
wmb to ensure all previous writes are architecturally visible for
the platform buffer flush.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200701072235.223558-5-aneesh.kumar@linux.ibm.com
2020-07-16 13:00:22 +10:00
Aneesh Kumar K.V d358042793 powerpc/pmem: Add flush routines using new pmem store and sync instruction
Start using dcbstps; phwsync; sequence for flushing persistent memory range.
The new instructions are implemented as a variant of dcbf and hwsync and on
P8 and P9 they will be executed as those instructions. We avoid using them on
older hardware. This helps to avoid difficult to debug bugs.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200701072235.223558-4-aneesh.kumar@linux.ibm.com
2020-07-16 13:00:22 +10:00
Aneesh Kumar K.V 32db09d992 powerpc/pmem: Add new instructions for persistent storage and sync
POWER10 introduces two new variants of dcbf instructions (dcbstps and dcbfps)
that can be used to write modified locations back to persistent storage.

Additionally, POWER10 also introduce phwsync and plwsync which can be used
to establish order of these writes to persistent storage.

This patch exposes these instructions to the rest of the kernel. The existing
dcbf and hwsync instructions in P8 and P9 are adequate to enable appropriate
synchronization with OpenCAPI-hosted persistent storage. Hence the new
instructions are added as a variant of the old ones that old hardware
won't differentiate.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200701072235.223558-3-aneesh.kumar@linux.ibm.com
2020-07-16 13:00:22 +10:00
Aneesh Kumar K.V c83040192f powerpc/pmem: Restrict papr_scm to P8 and above.
The PAPR based virtualized persistent memory devices are only supported on
POWER9 and above. In the followup patch, the kernel will switch the persistent
memory cache flush functions to use a new `dcbf` variant instruction. The new
instructions even though added in ISA 3.1 works even on P8 and P9 because these
are implemented as a variant of existing `dcbf` and `hwsync` and on P8 and
P9 behaves as such.

Considering these devices are only supported on P8 and above,  update the driver
to prevent a P7-compat guest from using persistent memory devices.

We don't update of_pmem driver with the same condition, because, on bare-metal,
the firmware enables pmem support only on P9 and above. There the kernel depends
on OPAL firmware to restrict exposing persistent memory related device tree
entries on older hardware. of_pmem.ko is written without any arch dependency and
we don't want to add ppc64 specific cpu feature check in of_pmem driver.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200701072235.223558-2-aneesh.kumar@linux.ibm.com
2020-07-16 13:00:21 +10:00
Nicholas Piggin dd3d9aa558 powerpc/mm/book3s64/radix: Off-load TLB invalidations to host when !GTSE
When platform doesn't support GTSE, let TLB invalidation requests
for radix guests be off-loaded to the host using H_RPT_INVALIDATE
hcall.

	[hcall wrapper, error path handling and renames]

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Bharata B Rao <bharata@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200703053608.12884-4-bharata@linux.ibm.com
2020-07-16 13:00:21 +10:00
Bharata B Rao b6c8417507 powerpc/pseries: H_REGISTER_PROC_TBL should ask for GTSE only if enabled
H_REGISTER_PROC_TBL asks for GTSE by default. GTSE flag bit should
be set only when GTSE is supported.

Signed-off-by: Bharata B Rao <bharata@linux.ibm.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200703053608.12884-3-bharata@linux.ibm.com
2020-07-16 13:00:21 +10:00
Bharata B Rao 029ab30b4c powerpc/mm: Enable radix GTSE only if supported.
Make GTSE an MMU feature and enable it by default for radix.
However for guest, conditionally enable it if hypervisor supports
it via OV5 vector. Let prom_init ask for radix GTSE only if the
support exists.

Having GTSE as an MMU feature will make it easy to enable radix
without GTSE. Currently radix assumes GTSE is enabled by default.

Signed-off-by: Bharata B Rao <bharata@linux.ibm.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200703053608.12884-2-bharata@linux.ibm.com
2020-07-16 13:00:21 +10:00