KVM guests have certain restrictions and performance quirks when using
doorbells. This patch moves the EPAPR KVM guest test so it can be shared
with PSERIES, and uses that in doorbell setup code to apply the KVM
guest quirks and improves IPI performance for two cases:
- PowerVM guests may now use doorbells even if they are secure.
- KVM guests no longer use doorbells if XIVE is available.
There is a valid complaint that "KVM guest" is not a very reasonable
thing to test for, it's preferable for the hypervisor to advertise
particular behaviours to the guest so they could change if the
hypervisor implementation or configuration changes. However in this case
we were already assuming a KVM guest worst case, so this patch is about
containing those quirks. If KVM later advertises fast doorbells, we
should test for that and override the quirks.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Tested-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200726035155.1424103-4-npiggin@gmail.com
KVM supports msgsndp in guests by trapping and emulating the
instruction, so it was decided to always use XIVE for IPIs if it is
available. However on PowerVM systems, msgsndp can be used and gives
better performance. On large systems, high XIVE interrupt rates can
have sub-linear scaling, and using msgsndp can reduce the load on
the interrupt controller.
So switch to using core local doorbells even if XIVE is available.
This reduces performance for KVM guests with an SMT topology by
about 50% for ping-pong context switching between SMT vCPUs. An
option vector (or dt-cpu-ftrs) could be defined to disable msgsndp
to get KVM performance back.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Tested-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200726035155.1424103-3-npiggin@gmail.com
This effectively reverts commit 3aa565f53c ("powerpc/pseries: Add
hooks to put the CPU into an appropriate offline state"), which added
an offline mode for CPUs which uses the H_CEDE hcall instead of the
architected stop-self RTAS function in order to facilitate "folding"
of dedicated mode processors on PowerVM platforms to achieve energy
savings. This has been the default offline mode since its
introduction.
There's nothing about stop-self that would prevent the hypervisor from
achieving the energy savings available via H_CEDE, so the original
premise of this change appears to be flawed.
I also have encountered the claim that the transition to and from
ceded state is much faster than stop-self/start-cpu. Certainly we
would not want to use stop-self as an *idle* mode. That is what H_CEDE
is for. However, this difference is insignificant in the context of
Linux CPU hotplug, where the latency of an offline or online operation
on current systems is on the order of 100ms, mainly attributable to
all the various subsystems' cpuhp callbacks.
The cede offline mode also prevents accurate accounting, as discussed
before:
https://lore.kernel.org/linuxppc-dev/1571740391-3251-1-git-send-email-ego@linux.vnet.ibm.com/
Unconditionally use stop-self to offline processor threads. This is
the architected method for offlining CPUs on PAPR systems.
The "cede_offline" boot parameter is rendered obsolete.
Removing this code enables the removal of the partition suspend code
which temporarily onlines all present CPUs.
Fixes: 3aa565f53c ("powerpc/pseries: Add hooks to put the CPU into an appropriate offline state")
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200612051238.1007764-2-nathanl@linux.ibm.com
The replacement of <asm/pgrable.h> with <linux/pgtable.h> made the include
of the latter in the middle of asm includes. Fix this up with the aid of
the below script and manual adjustments here and there.
import sys
import re
if len(sys.argv) is not 3:
print "USAGE: %s <file> <header>" % (sys.argv[0])
sys.exit(1)
hdr_to_move="#include <linux/%s>" % sys.argv[2]
moved = False
in_hdrs = False
with open(sys.argv[1], "r") as f:
lines = f.readlines()
for _line in lines:
line = _line.rstrip('
')
if line == hdr_to_move:
continue
if line.startswith("#include <linux/"):
in_hdrs = True
elif not moved and in_hdrs:
moved = True
print hdr_to_move
print line
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Cain <bcain@codeaurora.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greentime Hu <green.hu@gmail.com>
Cc: Greg Ungerer <gerg@linux-m68k.org>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: Guo Ren <guoren@kernel.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Ley Foon Tan <ley.foon.tan@intel.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Nick Hu <nickhu@andestech.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vincent Chen <deanbo422@gmail.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Link: http://lkml.kernel.org/r/20200514170327.31389-4-rppt@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The include/linux/pgtable.h is going to be the home of generic page table
manipulation functions.
Start with moving asm-generic/pgtable.h to include/linux/pgtable.h and
make the latter include asm/pgtable.h.
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Cain <bcain@codeaurora.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greentime Hu <green.hu@gmail.com>
Cc: Greg Ungerer <gerg@linux-m68k.org>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: Guo Ren <guoren@kernel.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Ley Foon Tan <ley.foon.tan@intel.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Nick Hu <nickhu@andestech.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vincent Chen <deanbo422@gmail.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Link: http://lkml.kernel.org/r/20200514170327.31389-3-rppt@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Normally, the HV emulates some instructions like MSGSNDP, MSGCLRP
from a KVM guest. To emulate the instructions, it must first read
the instruction from the guest's memory and decode its parameters.
However for a secure guest (aka SVM), the page containing the
instruction is in secure memory and the HV cannot access directly.
It would need the Ultravisor (UV) to facilitate accessing the
instruction and parameters but the UV currently does not have
the support for such accesses.
Until the UV has such support, disable doorbells in SVMs. This might
incur a performance hit but that is yet to be quantified.
With this patch applied (needed only in SVMs not needed for HV) we
are able to launch SVM guests with multi-core support. Eg:
qemu -smp sockets=2,cores=2,threads=2.
Fix suggested by Benjamin Herrenschmidt. Thanks to input from
Paul Mackerras, Ram Pai and Michael Anderson.
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
Signed-off-by: Thiago Jung Bauermann <bauerman@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190820021326.6884-13-bauerman@linux.ibm.com
Based on 1 normalized pattern(s):
this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license as published by
the free software foundation either version 2 of the license or at
your option any later version
extracted by the scancode license scanner the SPDX license identifier
GPL-2.0-or-later
has been chosen to replace the boilerplate/reference in 3029 file(s).
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Allison Randal <allison@lohutok.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190527070032.746973796@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Bring in yet another series that touches KVM code, and might need to
be merged into the kvm-ppc branch to resolve conflicts.
This required some changes in pnv_power9_force_smt4_catch/release()
due to the paca array becomming an array of pointers.
Change the paca array into an array of pointers to pacas. Allocate
pacas individually.
This allows flexibility in where the PACAs are allocated. Future work
will allocate them node-local. Platforms that don't have address limits
on PACAs would be able to defer PACA allocations until later in boot
rather than allocate all possible ones up-front then freeing unused.
This is slightly more overhead (one additional indirection) for cross
CPU paca references, but those aren't too common.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Back in 2013 we added some hypercall wrappers which misspelled
"plpar" (P-series Logical PARtition) as "plapr".
Visually they're hard to distinguish and it almost doesn't matter, but
it is confusing when grepping to miss some calls because of the typo.
They've also started spreading, so before they take over let's fix
them all to be "plpar".
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
This is the framework for using XIVE in a PowerVM guest. The support
is very similar to the native one in a much simpler form.
Each source is associated with an Event State Buffer (ESB). This is a
two bit state machine which is used to trigger events. The bits are
named "P" (pending) and "Q" (queued) and can be controlled by MMIO.
The Guest OS registers event (or notifications) queues on which the HW
will post event data for a target to notify.
Instead of OPAL calls, a set of Hypervisors call are used to configure
the interrupt sources and the event/notification queues of the guest:
- H_INT_GET_SOURCE_INFO
used to obtain the address of the MMIO page of the Event State
Buffer (PQ bits) entry associated with the source.
- H_INT_SET_SOURCE_CONFIG
assigns a source to a "target".
- H_INT_GET_SOURCE_CONFIG
determines to which "target" and "priority" is assigned to a source
- H_INT_GET_QUEUE_INFO
returns the address of the notification management page associated
with the specified "target" and "priority".
- H_INT_SET_QUEUE_CONFIG
sets or resets the event queue for a given "target" and "priority".
It is also used to set the notification config associated with the
queue, only unconditional notification for the moment. Reset is
performed with a queue size of 0 and queueing is disabled in that
case.
- H_INT_GET_QUEUE_CONFIG
returns the queue settings for a given "target" and "priority".
- H_INT_RESET
resets all of the partition's interrupt exploitation structures to
their initial state, losing all configuration set via the hcalls
H_INT_SET_SOURCE_CONFIG and H_INT_SET_QUEUE_CONFIG.
- H_INT_SYNC
issue a synchronisation on a source to make sure sure all
notifications have reached their queue.
As for XICS, the XIVE interface for the guest is described in the
device tree under the "interrupt-controller" node. A couple of new
properties are specific to XIVE :
- "reg"
contains the base address and size of the thread interrupt
managnement areas (TIMA), also called rings, for the User level and
for the Guest OS level. Only the Guest OS level is taken into
account today.
- "ibm,xive-eq-sizes"
the size of the event queues. One cell per size supported, contains
log2 of size, in ascending order.
- "ibm,xive-lisn-ranges"
the interrupt numbers ranges assigned to the guest. These are
allocated using a simple bitmap.
and also :
- "/ibm,plat-res-int-priorities"
contains a list of priorities that the hypervisor has reserved for
its own use.
Tested with a QEMU XIVE model for pseries and with the Power hypervisor.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
nr_cpu_ids can be limited by nr_cpus boot parameter, whereas NR_CPUS is a
compile time constant, which shouldn't be compared against during cpu kick.
Signed-off-by: Santosh Sivaraj <santosh@fossix.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
During secondary start, we do not need to BUG_ON if an invalid CPU number
is passed. We already print an error if secondary cannot be started, so
just return an error instead.
Signed-off-by: Santosh Sivaraj <santosh@fossix.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Have the NMI IPI code use this op when the platform defines it.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Change the doorbell callers to know about their msgsnd addressing,
rather than have them set a per-cpu target data tag at boot that gets
sent to the cause_ipi functions. The data is only used for doorbell IPI
functions, no other IPI types, so it makes sense to keep that detail
local to doorbell.
Have the platform code understand doorbell IPIs, rather than the
interrupt controller code understand them. Platform code can look at
capabilities it has available and decide which to use.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
MPIC was only used by Power3 which is now unsupported, so remove MPIC
code. XICS is now the only supported interrupt controller for
pSeries so do some cleanups too.
Signed-off-by: Rashmica Gupta <rashmicy@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
smp_ops->probe() is currently supposed to return the number of cpus in
the system.
The last actual usage of the value was removed in May 2007 in e147ec8f18
"[POWERPC] Simplify smp_space_timers". We still passed the value around
until June 2010 when even that was finally removed in c1aa687d49
"powerpc: Clean up obsolete code relating to decrementer and timebase".
So drop that requirement, probe() now returns void, and update all
implementations.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
There is no need to put a function descriptor in
__secondary_hold_spinloop. Use ppc_function_entry to get the
instruction address and put it in __secondary_hold_spinloop instead.
Also fix an issue where we assumed cur_cpu_spec held a function
descriptor.
Signed-off-by: Anton Blanchard <anton@samba.org>
Starting secondary CPUs early on from Open Firmware and placing them
in a holding spin loop slows down the boot process significantly under
some hypervisors such as KVM.
This is also unnecessary when RTAS supports querying the CPU state
So let's not do it.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
As a part of pseries_idle backend driver cleanup to make
the code common to both pseries and powernv platforms, it
is necessary to move the backend-driver code to drivers/cpuidle.
As a pre-requisite for that, it is essential to move plpar_wrapper.h
to include/asm.
Signed-off-by: Deepthi Dharwar <deepthi@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
T4, Cell, powernv, and pseries had the same implementation, so switch
them to use a generic version. A2 apparently had a version, but
removed it at some point, so we remove the declaration, too.
Signed-off-by: Andy Fleming <afleming@freescale.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
'system_state < SYSTEM_RUNNING' will have same effect
with 'system_state == SYSTEM_BOOTING', but the later
one is more clearer.
Signed-off-by: liguang <lig.fnst@cn.fujitsu.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This patch actually hooks up doorbell interrupts on POWER8:
- Select the PPC_DOORBELL Kconfig option from PPC_PSERIES
- Add the doorbell CPU feature bit to POWER8
- We define a new pSeries_cause_ipi_mux() function that issues a
doorbell interrupt if the recipient is another thread within the same
core as the sender. If the recipient is in a different core it falls
back to using XICS to deliver the IPI as before.
- During pSeries_smp_probe() at boot, we check if doorbell interrupts
are supported. If they are we set the cause_ipi function pointer to
the above mentioned function, otherwise we leave it as whichever XICS
cause_ipi function was determined by xics_smp_probe().
Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Tested-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CONFIG_HOTPLUG is going away as an option. As a result, the __dev*
markings need to be removed.
This change removes the use of __devinit, __devexit_p, __devinitdata,
__devinitconst, and __devexit from these drivers.
Based on patches originally written by Bill Pemberton, but redone by me
in order to handle some of the coding style issues better, by hand.
Cc: Bill Pemberton <wfp5p@virginia.edu>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Remove the pSeries_reconfig.h header file. At this point there is only one
definition in the file, pSeries_coalesce_init(), which can be
moved to rtas.h.
Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Acked-by: Rob Herring <rob.herring@calxeda.com>
Acked-by: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The following patch is to remove the pseries_notify_add_cpu() call
and replace it by a hot plug notifier.
This would prevent cpuidle resources being released and allocated each
time cpu comes online on pseries.
The earlier design was causing a lockdep problem
in start_secondary as reported on this thread
-https://lkml.org/lkml/2012/5/17/2
This applies on 3.4-rc7
Signed-off-by: Deepthi Dharwar <deepthi@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Disintegrate asm/system.h for PowerPC.
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
cc: linuxppc-dev@lists.ozlabs.org
* 'driver-core-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (73 commits)
arm: fix up some samsung merge sysdev conversion problems
firmware: Fix an oops on reading fw_priv->fw in sysfs loading file
Drivers:hv: Fix a bug in vmbus_driver_unregister()
driver core: remove __must_check from device_create_file
debugfs: add missing #ifdef HAS_IOMEM
arm: time.h: remove device.h #include
driver-core: remove sysdev.h usage.
clockevents: remove sysdev.h
arm: convert sysdev_class to a regular subsystem
arm: leds: convert sysdev_class to a regular subsystem
kobject: remove kset_find_obj_hinted()
m86k: gpio - convert sysdev_class to a regular subsystem
mips: txx9_sram - convert sysdev_class to a regular subsystem
mips: 7segled - convert sysdev_class to a regular subsystem
sh: dma - convert sysdev_class to a regular subsystem
sh: intc - convert sysdev_class to a regular subsystem
power: suspend - convert sysdev_class to a regular subsystem
power: qe_ic - convert sysdev_class to a regular subsystem
power: cmm - convert sysdev_class to a regular subsystem
s390: time - convert sysdev_class to a regular subsystem
...
Fix up conflicts with 'struct sysdev' removal from various platform
drivers that got changed:
- arch/arm/mach-exynos/cpu.c
- arch/arm/mach-exynos/irq-eint.c
- arch/arm/mach-s3c64xx/common.c
- arch/arm/mach-s3c64xx/cpu.c
- arch/arm/mach-s5p64x0/cpu.c
- arch/arm/mach-s5pv210/common.c
- arch/arm/plat-samsung/include/plat/cpu.h
- arch/powerpc/kernel/sysfs.c
and fix up cpu_is_hotpluggable() as per Greg in include/linux/cpu.h
The sysdev.h file should not be needed by any in-kernel code, so remove
the .h file from these random files that seem to still want to include
it.
The sysdev code will be going away soon, so this include needs to be
removed no matter what.
Cc: Jiandong Zheng <jdzheng@broadcom.com>
Cc: Scott Branden <sbranden@broadcom.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Kukjin Kim <kgene.kim@samsung.com>
Cc: David Brown <davidb@codeaurora.org>
Cc: Daniel Walker <dwalker@fifo99.com>
Cc: Bryan Huntsman <bryanh@codeaurora.org>
Cc: Ben Dooks <ben-linux@fluff.org>
Cc: Wan ZongShun <mcuos.com@gmail.com>
Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: "Venkatesh Pallipadi
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Grant Likely <grant.likely@secretlab.ca>
Cc: Richard Purdie <rpurdie@rpsys.net>
Cc: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
This patch implements a back-end cpuidle driver for pSeries
based on pseries_dedicated_idle_loop and pseries_shared_idle_loop
routines. The driver is built only if CONFIG_CPU_IDLE is set. This
cpuidle driver uses global registration of idle states and
not per-cpu.
Signed-off-by: Deepthi Dharwar <deepthi@linux.vnet.ibm.com>
Signed-off-by: Trinabh Gupta <g.trinabh@gmail.com>
Signed-off-by: Arun R Bharadwaj <arun.r.bharadwaj@gmail.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
None of the files touched here are modules, and they are not
exporting any symbols either -- so there is no need to be including
the module.h. Builds of all the files remains successful.
Even kernel/module.c does not need to include it, since it includes
linux/moduleloader.h instead.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
This allows us to move duplicated code in <asm/atomic.h>
(atomic_inc_not_zero() for now) to <linux/atomic.h>
Signed-off-by: Arun Sharma <asharma@fb.com>
Reviewed-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: David Miller <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (99 commits)
drivers/virt: add missing linux/interrupt.h to fsl_hypervisor.c
powerpc/85xx: fix mpic configuration in CAMP mode
powerpc: Copy back TIF flags on return from softirq stack
powerpc/64: Make server perfmon only built on ppc64 server devices
powerpc/pseries: Fix hvc_vio.c build due to recent changes
powerpc: Exporting boot_cpuid_phys
powerpc: Add CFAR to oops output
hvc_console: Add kdb support
powerpc/pseries: Fix hvterm_raw_get_chars to accept < 16 chars, fixing xmon
powerpc/irq: Quieten irq mapping printks
powerpc: Enable lockup and hung task detectors in pseries and ppc64 defeconfigs
powerpc: Add mpt2sas driver to pseries and ppc64 defconfig
powerpc: Disable IRQs off tracer in ppc64 defconfig
powerpc: Sync pseries and ppc64 defconfigs
powerpc/pseries/hvconsole: Fix dropped console output
hvc_console: Improve tty/console put_chars handling
powerpc/kdump: Fix timeout in crash_kexec_wait_realmode
powerpc/mm: Fix output of total_ram.
powerpc/cpufreq: Add cpufreq driver for Momentum Maple boards
powerpc: Correct annotations of pmu registration functions
...
Fix up trivial Kconfig/Makefile conflicts in arch/powerpc, drivers, and
drivers/cpufreq
Many stupid corrections of duplicated includes based on the output of
scripts/checkincludes.pl.
Signed-off-by: Vitaliy Ivanov <vitalivanov@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
On many platforms (including pSeries), smp_ops->message_pass is always
smp_muxed_ipi_message_pass. This changes arch/powerpc/kernel/smp.c so
that if smp_ops->message_pass is NULL, it calls smp_muxed_ipi_message_pass
directly.
This means that a platform doesn't need to set both .message_pass and
.cause_ipi, only one of them. It is a slight performance improvement
in that it gets rid of an indirect function call at the expense of a
predictable conditional branch.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Consolidate the mux and demux of ipi messages into smp.c and call
a new smp_ops callback to actually trigger the ipi.
The powerpc architecture code is optimised for having 4 distinct
ipi triggers, which are mapped to 4 distinct messages (ipi many, ipi
single, scheduler ipi, and enter debugger). However, several interrupt
controllers only provide a single software triggered interrupt that
can be delivered to each cpu. To resolve this limitation, each smp_ops
implementation created a per-cpu variable that is manipulated with atomic
bitops. Since these lines will be contended they are optimialy marked as
shared_aligned and take a full cache line for each cpu. Distro kernels
may have 2 or 3 of these in their config, each taking per-cpu space
even though at most one will be in use.
This consolidation removes smp_message_recv and replaces the single call
actions cases with direct calls from the common message recognition loop.
The complicated debugger ipi case with its muxed crash handling code is
moved to debug_ipi_action which is now called from the demux code (instead
of the multi-message action calling smp_message_recv).
I put a call to reschedule_action to increase the likelyhood of correctly
merging the anticipated scheduler_ipi() hook coming from the scheduler
tree; that single required call can be inlined later.
The actual message decode is a copy of the old pseries xics code with its
memory barriers and cache line spacing, augmented with a per-cpu unsigned
long based on the book-e doorbell code. The optional data is set via a
callback from the implementation and is passed to the new cause-ipi hook
along with the logical cpu number. While currently only the doorbell
implemntation uses this data it should be almost zero cost to retrieve and
pass it -- it adds a single register load for the argument from the same
cache line to which we just completed a store and the register is dead
on return from the call. I extended the data element from unsigned int
to unsigned long in case some other code wanted to associate a pointer.
The doorbell check_self is replaced by a call to smp_muxed_ipi_resend,
conditioned on the CPU_DBELL feature. The ifdef guard could be relaxed
to CONFIG_SMP but I left it with BOOKE for now.
Also, the doorbell interrupt vector for book-e was not calling irq_enter
and irq_exit, which throws off cpu accounting and causes code to not
realize it is running in interrupt context. Add the missing calls.
Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
When we start a cpu we use smp_ops->kick_cpu(), which currently
returns void, it should be able to fail. Convert it to return
int, and update all uses.
Convert all the current error cases to return -ENOENT, which is
what would eventually be returned by __cpu_up() currently when
it doesn't detect the cpu as coming up in time.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This is a significant rework of the XICS driver, too significant to
conveniently break it up into a series of smaller patches to be honest.
The driver is moved to a more generic location to allow new platforms
to use it, and is broken up into separate ICP and ICS "backends". For
now we have the native and "hypervisor" ICP backends and one common
RTAS ICS backend.
The driver supports one ICP backend instanciation, and many ICS ones,
in order to accomodate future platforms with multiple possibly different
interrupt "sources" mechanisms.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
If a given firmware doesn't have a token to support query-cpu-stopped-state,
its not likely to change during the lifetime of the kernel.
Only print this information once, not once per secondary thread.
While here, make the line wrap grep friendly.
Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The 'smt_enabled=X' boot option does not handle values of X > 2.
For Power 7 processors with smt modes of 0,1,2,3, and 4 this does
not work. This patch allows the smt_enabled option to be set to
any value limited to a max equal to the number of threads per
core.
Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Use new cpumask functions in pseries SMP startup code.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Currently we always call start-cpu irrespective of if the CPU is
stopped or not. Unfortunatley on POWER7, firmware seems to not like
start-cpu being called when a cpu already been started. This was not
the case on POWER6 and earlier.
This patch checks to see if the CPU is stopped or not via an
query-cpu-stopped-state call, and only calls start-cpu on CPUs which
are stopped.
This fixes a bug with kexec on POWER7 on PHYP where only the primary
thread would make it to the second kernel.
Reported-by: Ankita Garg <ankita@linux.vnet.ibm.com>
Cc: <stable@kernel.org>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This moves query_cpu_stopped() out of the hotplug cpu code and into
smp.c so it can called in other places and renames it to
smp_query_cpu_stopped().
It also cleans up the return values by adding some #defines
Cc: <stable@kernel.org>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
String constants that are continued on subsequent lines with \
are not good.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
If an online-attempt on a CPU which has been offlined using H_CEDE
with an appropriate cede latency hint fails, don't panic.
Instead print the error message and let the __cpu_up() code notify the
CPU Hotplug framework of the failure, which in turn can notify the
other subsystem through CPU_UP_CANCELED.
Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
When a CPU is offlined on POWER currently, we call rtas_stop_self() and hand
the CPU back to the resource pool. This path is used for DLPAR which will
cause a change in the LPAR configuration which will be visible outside.
This patch changes the default state a CPU is put into when it is offlined.
On platforms which support ceding the processor to the hypervisor with
latency hint specifier value, during a cpu offline operation,
instead of calling rtas_stop_self(), we cede the vCPU to the hypervisor
while passing a latency hint specifier value. The Hypervisor can use this hint
to provide better energy savings. Also, during the offline
operation, the control of the vCPU remains with the LPAR as oppposed to
returning it to the resource pool.
The patch achieves this by creating an infrastructure to set the
preferred_offline_state() which can be either
- CPU_STATE_OFFLINE: which is the current behaviour of calling
rtas_stop_self()
- CPU_STATE_INACTIVE: which cedes the vCPU to the hypervisor with the latency
hint specifier.
The codepath which wants to perform a DLPAR operation can set the
preferred_offline_state() of a CPU to CPU_STATE_OFFLINE before invoking
cpu_down().
The patch also provides a boot-time command line argument to disable/enable
CPU_STATE_INACTIVE.
Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Those definitions are currently declared extern in the .c file where
they are used, move them to a header file instead.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Several platforms use their own copy of what is essentially the same code,
using RTAS to synchronize the timebases when bringing up new CPUs. This
moves it all into a single common implementation and additionally
turns the spinlock into a raw spinlock since the former can rely on
the timebase not being frozen when spinlock debugging is enabled, and finally
masks interrupts while the timebase is disabled.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>