Commit Graph

505 Commits

Author SHA1 Message Date
Alexey Kardashevskiy ec132efaa8 spapr: Support NVIDIA V100 GPU with NVLink2
NVIDIA V100 GPUs have on-board RAM which is mapped into the host memory
space and accessible as normal RAM via an NVLink bus. The VFIO-PCI driver
implements special regions for such GPUs and emulates an NVLink bridge.
NVLink2-enabled POWER9 CPUs also provide address translation services
which includes an ATS shootdown (ATSD) register exported via the NVLink
bridge device.

This adds a quirk to VFIO to map the GPU memory and create an MR;
the new MR is stored in a PCI device as a QOM link. The sPAPR PCI uses
this to get the MR and map it to the system address space.
Another quirk does the same for ATSD.

This adds additional steps to sPAPR PHB setup:

1. Search for specific GPUs and NPUs, collect findings in
sPAPRPHBState::nvgpus, manage system address space mappings;

2. Add device-specific properties such as "ibm,npu", "ibm,gpu",
"memory-block", "link-speed" to advertise the NVLink2 function to
the guest;

3. Add "mmio-atsd" to vPHB to advertise the ATSD capability;

4. Add new memory blocks (with extra "linux,memory-usable" to prevent
the guest OS from accessing the new memory until it is onlined) and
npuphb# nodes representing an NPU unit for every vPHB as the GPU driver
uses it for link discovery.

This allocates space for GPU RAM and ATSD like we do for MMIOs by
adding 2 new parameters to the phb_placement() hook. Older machine types
set these to zero.

This puts new memory nodes in a separate NUMA node to as the GPU RAM
needs to be configured equally distant from any other node in the system.
Unlike the host setup which assigns numa ids from 255 downwards, this
adds new NUMA nodes after the user configures nodes or from 1 if none
were configured.

This adds requirement similar to EEH - one IOMMU group per vPHB.
The reason for this is that ATSD registers belong to a physical NPU
so they cannot invalidate translations on GPUs attached to another NPU.
It is guaranteed by the host platform as it does not mix NVLink bridges
or GPUs from different NPU in the same IOMMU group. If more than one
IOMMU group is detected on a vPHB, this disables ATSD support for that
vPHB and prints a warning.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
[aw: for vfio portions]
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Message-Id: <20190312082103.130561-1-aik@ozlabs.ru>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-04-26 10:41:23 +10:00
David Gibson 0a794529bd spapr: Simplify handling of host-serial and host-model values
27461d69a0 "ppc: add host-serial and host-model machine attributes
(CVE-2019-8934)" introduced 'host-serial' and 'host-model' machine
properties for spapr to explicitly control the values advertised to the
guest in device tree properties with the same names.

The previous behaviour on KVM was to unconditionally populate the device
tree with the real host serial number and model, which leaks possibly
sensitive information about the host to the guest.

To maintain compatibility for old machine types, we allowed those props
to be set to "passthrough" to take the value from the host as before.  Or
they could be set to "none" to explicitly omit the device tree items.

Special casing specific values on what's otherwise a user supplied string
is very ugly.  So, this patch simplifies things by implementing the
backwards compatibility in a different way: we have a machine class flag
set for the older machines, and we only load the host values into the
device tree if A) they're not set by the user and B) we have that flag set.

This does mean that the "passthrough" functionality is no longer available
with the current machine type.  That's ok though: if a user or management
layer really wants the information passed through they can read it
themselves (OpenStack Nova already does something similar for x86).

It also means the user can't explicitly ask for the values to be omitted
on the old machine types.  I think that's an acceptable trade-off: if you
care enough about not leaking the host information you can either move to
the new machine type, or use a dummy value for the properties.

For the new machine type, this also removes an odd inconsistency
between running on a POWER and non-POWER (or non-Linux) hosts: if the
host information couldn't be read from where we expect (in the host's
device tree as exposed by Linux), we'd fallback to omitting the guest
device tree items.

While we're there, improve some poorly worded comments, and the help text
for the properties.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Tested-by: Greg Kurz <groug@kaod.org>
2019-03-29 10:25:50 +11:00
David Gibson ce2918cbc3 spapr: Use CamelCase properly
The qemu coding standard is to use CamelCase for type and structure names,
and the pseries code follows that... sort of.  There are quite a lot of
places where we bend the rules in order to preserve the capitalization of
internal acronyms like "PHB", "TCE", "DIMM" and most commonly "sPAPR".

That was a bad idea - it frequently leads to names ending up with hard to
read clusters of capital letters, and means they don't catch the eye as
type identifiers, which is kind of the point of the CamelCase convention in
the first place.

In short, keeping type identifiers look like CamelCase is more important
than preserving standard capitalization of internal "words".  So, this
patch renames a heap of spapr internal type names to a more standard
CamelCase.

In addition to case changes, we also make some other identifier renames:
  VIOsPAPR* -> SpaprVio*
    The reverse word ordering was only ever used to mitigate the capital
    cluster, so revert to the natural ordering.
  VIOsPAPRVTYDevice -> SpaprVioVty
  VIOsPAPRVLANDevice -> SpaprVioVlan
    Brevity, since the "Device" didn't add useful information
  sPAPRDRConnector -> SpaprDrc
  sPAPRDRConnectorClass -> SpaprDrcClass
    Brevity, and makes it clearer this is the same thing as a "DRC"
    mentioned in many other places in the code

This is 100% a mechanical search-and-replace patch.  It will, however,
conflict with essentially any and all outstanding patches touching the
spapr code.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-03-12 14:33:05 +11:00
Cédric Le Goater 5dad902ce0 ppc/pnv: POWER9 XSCOM quad support
The POWER9 processor does not support per-core frequency control. The
cores are arranged in groups of four, along with their respective L2
and L3 caches, into a structure known as a Quad. The frequency must be
managed at the Quad level.

Provide a basic Quad model to fake the settings done by the firmware
on the Non-Cacheable Unit (NCU). Each core pair (EX) needs a special
BAR setting for the TIMA area of XIVE because it resides on the same
address on all chips.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20190307223548.20516-12-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-03-12 14:33:04 +11:00
Cédric Le Goater 90ef386c74 ppc/pnv: extend XSCOM core support for POWER9
Provide a new class attribute to define XSCOM operations per CPU
family and add a couple of XSCOM addresses controlling the power
management states of the core on POWER9.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20190307223548.20516-11-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-03-12 14:33:04 +11:00
Cédric Le Goater 6598a70d00 ppc/pnv: add a OCC model for POWER9
The OCC on POWER9 is very similar to the one found on POWER8. Provide
the same routines with P9 values for the registers and IRQ number.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20190307223548.20516-10-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-03-12 14:33:04 +11:00
Cédric Le Goater 3233838cd1 ppc/pnv: add a OCC model class
To ease the introduction of the OCC model for POWER9, provide a new
class attributes to define XSCOM operations per CPU family and a PSI
IRQ number.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Message-Id: <20190307223548.20516-9-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-03-12 14:33:04 +11:00
Cédric Le Goater 8207b90604 ppc/pnv: add SerIRQ routing registers
This is just a simple reminder that SerIRQ routing should be
addressed.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20190307223548.20516-8-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-03-12 14:33:04 +11:00
Cédric Le Goater 15376c66fa ppc/pnv: add a LPC Controller model for POWER9
The LPC Controller on POWER9 is very similar to the one found on
POWER8 but accesses are now done via on MMIOs, without the XSCOM and
ECCB logic. The device tree is populated differently so we add a
specific POWER9 routine for the purpose.

SerIRQ routing is yet to be done.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20190307223548.20516-7-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-03-12 14:33:04 +11:00
Cédric Le Goater 64d011d56e ppc/pnv: add a 'dt_isa_nodename' to the chip
The ISA bus has a different DT nodename on POWER9. Compute the name
when the PnvChip is realized, that is before it is used by the machine
to populate the device tree with the ISA devices.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20190307223548.20516-6-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-03-12 14:33:04 +11:00
Cédric Le Goater 82514be28b ppc/pnv: add a LPC Controller class model
It will ease the introduction of the LPC Controller model for POWER9.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Message-Id: <20190307223548.20516-5-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-03-12 14:33:04 +11:00
Cédric Le Goater c38536bc80 ppc/pnv: add a PSI bridge model for POWER9
The PSI bridge on POWER9 is very similar to POWER8. The BAR is still
set through XSCOM but the controls are now entirely done with MMIOs.
More interrupts are defined and the interrupt controller interface has
changed to XIVE. The POWER9 model is a first example of the usage of
the notify() handler of the XiveNotifier interface, linking the PSI
XiveSource to its owning device model.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20190307223548.20516-3-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-03-12 14:33:04 +11:00
Cédric Le Goater ae85605531 ppc/pnv: add a PSI bridge class model
To ease the introduction of the PSI bridge model for POWER9, abstract
the POWER chip differences in a PnvPsi class model and introduce a
specific Pnv8Psi type for POWER8. POWER8 interface to the interrupt
controller is still XICS whereas POWER9 uses the new XIVE model.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20190307223548.20516-2-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-03-12 14:33:04 +11:00
Alexey Kardashevskiy 5f36666722 spapr_iommu: Do not replay mappings from just created DMA window
On sPAPR vfio_listener_region_add() is called in 2 situations:
1. a new listener is registered from vfio_connect_container();
2. a new IOMMU Memory Region is added from rtas_ibm_create_pe_dma_window().

In both cases vfio_listener_region_add() calls
memory_region_iommu_replay() to notify newly registered IOMMU notifiers
about existing mappings which is totally desirable for case 1.

However for case 2 it is nothing but noop as the window has just been
created and has no valid mappings so replaying those does not do anything.
It is barely noticeable with usual guests but if the window happens to be
really big, such no-op replay might take minutes and trigger RCU stall
warnings in the guest.

For example, a upcoming GPU RAM memory region mapped at 64TiB (right
after SPAPR_PCI_LIMIT) causes a 64bit DMA window to be at least 128TiB
which is (128<<40)/0x10000=2.147.483.648 TCEs to replay.

This mitigates the problem by adding an "skipping_replay" flag to
sPAPRTCETable and defining sPAPR own IOMMU MR replay() hook which does
exactly the same thing as the generic one except it returns early if
@skipping_replay==true.

Another way of fixing this would be delaying replay till the very first
H_PUT_TCE but this does not work if in-kernel H_PUT_TCE handler is
enabled (a likely case).

When "ibm,create-pe-dma-window" is complete, the guest will map only
required regions of the huge DMA window.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Message-Id: <20190307050518.64968-2-aik@ozlabs.ru>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-03-12 14:33:04 +11:00
Cédric Le Goater d8e4aad533 ppc/pnv: introduce a new pic_print_info() operation to the chip model
The POWER9 and POWER8 processors have different interrupt controllers,
and reporting their state requires calling different helper routines.

However, the interrupt presenters are still handled in the higher
level pic_print_info() routine because they are not related to the
chip.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20190306085032.15744-9-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-03-12 14:33:04 +11:00
Cédric Le Goater eb859a27e1 ppc/pnv: introduce a new dt_populate() operation to the chip model
The POWER9 and POWER8 processors have a different set of devices and a
different device tree layout.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20190306085032.15744-8-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-03-12 14:33:04 +11:00
Cédric Le Goater 2dfa91a2aa ppc/pnv: add a XIVE interrupt controller model for POWER9
This is a simple model of the POWER9 XIVE interrupt controller for the
PowerNV machine which only addresses the needs of the skiboot
firmware. The PowerNV model reuses the common XIVE framework developed
for sPAPR as the fundamentals aspects are quite the same. The
difference are outlined below.

The controller initial BAR configuration is performed using the XSCOM
bus from there, MMIO are used for further configuration.

The MMIO regions exposed are :

 - Interrupt controller registers
 - ESB pages for IPIs and ENDs
 - Presenter MMIO (Not used)
 - Thread Interrupt Management Area MMIO, direct and indirect

The virtualization controller MMIO region containing the IPI ESB pages
and END ESB pages is sub-divided into "sets" which map portions of the
VC region to the different ESB pages. These are modeled with custom
address spaces and the XiveSource and XiveENDSource objects are sized
to the maximum allowed by HW. The memory regions are resized at
run-time using the configuration of EDT set translation table provided
by the firmware.

The XIVE virtualization structure tables (EAT, ENDT, NVTT) are now in
the machine RAM and not in the hypervisor anymore. The firmware
(skiboot) configures these tables using Virtual Structure Descriptor
defining the characteristics of each table : SBE, EAS, END and
NVT. These are later used to access the virtual interrupt entries. The
internal cache of these tables in the interrupt controller is updated
and invalidated using a set of registers.

Still to address to complete the model but not fully required is the
support for block grouping. Escalation support will be necessary for
KVM guests.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20190306085032.15744-7-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-03-12 14:33:04 +11:00
Cédric Le Goater 956b8f468d ppc/pnv: change the CPU machine_data presenter type to Object *
The POWER9 PowerNV machine will use a XIVE interrupt presenter type.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20190306085032.15744-6-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-03-12 14:33:04 +11:00
Cédric Le Goater a58a18adee ppc/pnv: export the xive_router_notify() routine
The PowerNV machine with need to encode the block id in the source
interrupt number before forwarding the source event notification to
the Router.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20190306085032.15744-5-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-03-12 14:33:04 +11:00
Cédric Le Goater f9b9db3860 ppc/xive: export the TIMA memory accessors
The PowerNV machine can perform indirect loads and stores on the TIMA
on behalf of another CPU. Give the controller the possibility to call
the TIMA memory accessors with a XiveTCTX of its choice.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20190306085032.15744-4-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-03-12 14:33:04 +11:00
Cédric Le Goater 051e2973bf ppc: externalize ppc_get_vcpu_by_pir()
We will use it to get the CPU interrupt presenter in XIVE when the
TIMA is accessed from the indirect page.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20190306085032.15744-3-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-03-12 14:33:04 +11:00
David Gibson e075623aa5 spapr: Force SPAPR_MEMORY_BLOCK_SIZE to be a hwaddr (64-bit)
SPAPR_MEMORY_BLOCK_SIZE is logically a difference in memory addresses, and
hence of type hwaddr which is 64-bit.  Previously it wasn't marked as such
which means that it could be treated as 32-bit.  That will work in some
circumstances but if multiplied by another 32-bit value it could lead to
a 32-bit overflow and an incorrect result.

One specific instance of this in spapr_lmb_dt_populate() was spotted by
Coverity (CID 1399145).

Reported-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-03-12 14:33:04 +11:00
Suraj Jitindar Singh 8ff43ee404 target/ppc/spapr: Add SPAPR_CAP_CCF_ASSIST
Introduce a new spapr_cap SPAPR_CAP_CCF_ASSIST to be used to indicate
the requirement for a hw-assisted version of the count cache flush
workaround.

The count cache flush workaround is a software workaround which can be
used to flush the count cache on context switch. Some revisions of
hardware may have a hardware accelerated flush, in which case the
software flush can be shortened. This cap is used to set the
availability of such hardware acceleration for the count cache flush
routine.

The availability of such hardware acceleration is indicated by the
H_CPU_CHAR_BCCTR_FLUSH_ASSIST flag being set in the characteristics
returned from the KVM_PPC_GET_CPU_CHAR ioctl.

Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Message-Id: <20190301031912.28809-2-sjitindarsingh@gmail.com>
[dwg: Small style fixes]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-03-12 12:07:49 +11:00
Suraj Jitindar Singh 399b2896d4 target/ppc/spapr: Add workaround option to SPAPR_CAP_IBS
The spapr_cap SPAPR_CAP_IBS is used to indicate the level of capability
for mitigations for indirect branch speculation. Currently the available
values are broken (default), fixed-ibs (fixed by serialising indirect
branches) and fixed-ccd (fixed by diabling the count cache).

Introduce a new value for this capability denoted workaround, meaning that
software can work around the issue by flushing the count cache on
context switch. This option is available if the hypervisor sets the
H_CPU_BEHAV_FLUSH_COUNT_CACHE flag in the cpu behaviours returned from
the KVM_PPC_GET_CPU_CHAR ioctl.

Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Message-Id: <20190301031912.28809-1-sjitindarsingh@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-03-12 12:07:49 +11:00
Suraj Jitindar Singh c982f5cf9a target/ppc/spapr: Add SPAPR_CAP_LARGE_DECREMENTER
Add spapr_cap SPAPR_CAP_LARGE_DECREMENTER to be used to control the
availability of the large decrementer for a guest.

Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Message-Id: <20190301024317.22137-1-sjitindarsingh@gmail.com>
[dwg: Trivial style fix]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-03-12 12:07:49 +11:00
Greg Kurz bb2bdd812e spapr: add hotplug hooks for PHB hotplug
Hotplugging PHBs is a machine-level operation, but PHBs reside on the
main system bus, so we register spapr machine as the handler for the
main system bus.

Provide the usual pre-plug, plug and unplug-request handlers.

Move the checking of the PHB index to the pre-plug handler. It is okay
to do that and assert in the realize function because the pre-plug
handler is always called, even for the oldest machine types we support.

Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
(Fixed interrupt controller phandle in "interrupt-map" and
 TCE table size in "ibm,dma-window" FDT fragment, Greg Kurz)
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <155059672926.1466090.13612804072190051439.stgit@bahia.lab.toulouse-stg.fr.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-26 09:21:25 +11:00
Michael Roth 962b6c3650 spapr: create DR connectors for PHBs
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <155059670389.1466090.10015601248906623076.stgit@bahia.lab.toulouse-stg.fr.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-26 09:21:25 +11:00
Greg Kurz ad62bff638 spapr_irq: Expose the phandle of the interrupt controller
This will be used by PHB hotplug in order to create the "interrupt-map"
property of the PHB node.

Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <155059669374.1466090.12943228478046223856.stgit@bahia.lab.toulouse-stg.fr.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-26 09:21:25 +11:00
Greg Kurz 743ed566c1 spapr: Expose the name of the interrupt controller node
This will be needed by PHB hotplug in order to access the "phandle"
property of the interrupt controller node.

Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Message-Id: <155059668867.1466090.6339199751719123386.stgit@bahia.lab.toulouse-stg.fr.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-26 09:21:25 +11:00
Greg Kurz 6cead90c5c xics: Write source state to KVM at claim time
The pseries machine only uses LSIs to support legacy PCI devices. Every
PHB claims 4 LSIs at realize time. When using in-kernel XICS (or upcoming
in-kernel XIVE), QEMU synchronizes the state of all irqs, including these
LSIs, later on at machine reset.

In order to support PHB hotplug, we need a way to tell KVM about the LSIs
that doesn't require a machine reset. An easy way to do that is to always
inform KVM when an interrupt is claimed, which really isn't a performance
path.

Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <155059668360.1466090.5969630516627776426.stgit@bahia.lab.toulouse-stg.fr.ibm.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-26 09:21:25 +11:00
Greg Kurz 09d876ce2c spapr/drc: Drop spapr_drc_attach() fdt argument
All DRC subtypes have been converted to generate the FDT fragment at
configure connector time instead of attach time. The fdt and fdt_offset
arguments of spapr_drc_attach() aren't needed anymore. Drop them and
make the implementation of the dt_populate() method mandatory.

Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <155059667853.1466090.16527852453054217565.stgit@bahia.lab.toulouse-stg.fr.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-26 09:21:25 +11:00
Greg Kurz 345b12b99e spapr: Generate FDT fragment for CPUs at configure connector time
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <155059666839.1466090.3833376527523126752.stgit@bahia.lab.toulouse-stg.fr.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-26 09:21:25 +11:00
Greg Kurz 62d38c9bd3 spapr: Generate FDT fragment for LMBs at configure connector time
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <155059666331.1466090.6766540766297333313.stgit@bahia.lab.toulouse-stg.fr.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-26 09:21:25 +11:00
Greg Kurz d9c95c71ac spapr_drc: Allow FDT fragment to be added later
The current logic is to provide the FDT fragment when attaching a device
to a DRC. This works perfectly fine for our current hotplug support, but
soon we will add support for PHB hotplug which has some constraints, that
CPU, PCI and LMB devices don't seem to have.

The first constraint is that the "ibm,dma-window" property of the PHB
node requires the IOMMU to be configured, ie, spapr_tce_table_enable()
has been called, which happens during PHB reset. It is okay in the case
of hotplug since the device is reset before the hotplug handler is
called. On the contrary with coldplug, the hotplug handler is called
first and device is only reset during the initial system reset. Trying
to create the FDT fragment on the hotplug path in this case, would
result in somthing like this:

ibm,dma-window = < 0x80000000 0x00 0x00 0x00 0x00 >;

This will cause linux in the guest to panic, by simply removing and
re-adding the PHB using the drmgr command:

	page = alloc_pages_node(nid, GFP_KERNEL, get_order(sz));
	if (!page)
		panic("iommu_init_table: Can't allocate %ld bytes\n", sz);

The second and maybe more problematic constraint is that the
"interrupt-map" property needs to reference the interrupt controller
node using the very same phandle that SLOF has already exposed to the
guest. QEMU requires SLOF to call the private KVMPPC_H_UPDATE_DT hcall
at some point to know about this phandle. With the latest QEMU and SLOF,
this happens when SLOF gets quiesced. This means that if the PHB gets
hotplugged after CAS but before SLOF quiesce, then we're sure that the
phandle is not known when the hotplug handler is called.

The FDT is only needed when the guest first invokes RTAS to configure
the connector actually, long after SLOF quiesce. Let's postpone the
creation of FDT fragments for PHBs to rtas_ibm_configure_connector().

Since we only need this for PHBs, introduce a new method in the base
DRC class for that. DRC subtypes will be converted to use it in
subsequent patches.

Allow spapr_drc_attach() to be passed a NULL fdt argument if the method
is available. When all DRC subtypes have been converted, the fdt argument
will eventually disappear.

Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <155059665823.1466090.18358845122627355537.stgit@bahia.lab.toulouse-stg.fr.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-26 09:21:25 +11:00
Benjamin Herrenschmidt 00fd075e18 target/ppc/spapr: Set LPCR:HR when using Radix mode
The HW relies on LPCR:HR along with the PATE to determine whether
to use Radix or Hash mode. In fact it uses LPCR:HR more commonly
than the PATE.

For us, it's also more efficient to do so, especially since unlike
the HW we do not maintain a cache of the current PATE and HV PATE
in a generic place.

Prepare the grounds for that by ensuring that LPCR:HR is set
properly on SPAPR machines.

Another option would have been to use a callback to get the PATE
but this gets messy when implementing bare metal support, it's
much simpler (and faster) to use LPCR.

Since existing migration streams may not have it, fix it up in
spapr_post_load() as well based on the pseudo-PATE entry that
we keep.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20190215170029.15641-2-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-26 09:21:25 +11:00
Prasad J Pandit 27461d69a0 ppc: add host-serial and host-model machine attributes (CVE-2019-8934)
On ppc hosts, hypervisor shares following system attributes

  - /proc/device-tree/system-id
  - /proc/device-tree/model

with a guest. This could lead to information leakage and misuse.[*]
Add machine attributes to control such system information exposure
to a guest.

[*] https://wiki.openstack.org/wiki/OSSN/OSSN-0028

Reported-by: Daniel P. Berrangé <berrange@redhat.com>
Fix-suggested-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Message-Id: <20190218181349.23885-1-ppandit@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-26 09:21:25 +11:00
Benjamin Herrenschmidt 67afe7759d target/ppc: Add POWER9 external interrupt model
Adds support for the Hypervisor directed interrupts in addition to the
OS ones.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
[clg: - modified the icp_realize() and xive_tctx_realize() to take
        into account explicitely the POWER9 interrupt model
      - introduced a specific power9_set_irq for POWER9 ]
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20190215161648.9600-10-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-26 09:21:24 +11:00
Greg Kurz 3272752a8b xics: Drop the KVM ICS class
The KVM ICS class isn't used anymore. Drop it.

Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <155023084177.1011724.14693955932559990358.stgit@bahia.lan>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-18 10:52:08 +11:00
Greg Kurz 557b456729 xics: Handle KVM interrupt presentation from "simple" ICS code
We want to use the "simple" ICS type in both KVM and non-KVM setups.
Teach the "simple" ICS how to present interrupts to KVM and adapt
sPAPR accordingly.

Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <155023082996.1011724.16237920586343905010.stgit@bahia.lan>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-18 10:43:19 +11:00
Greg Kurz d80b2ccfa7 xics: Explicitely call KVM ICS methods from the common code
The pre_save(), post_load() and synchronize_state() methods of the
ICSStateClass type are really KVM only things. Make that obvious
by dropping the indirections and directly calling the KVM functions
instead.

Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <155023081817.1011724.14078777320394028836.stgit@bahia.lan>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-18 10:39:24 +11:00
Greg Kurz 8c1ced677d xics: Drop the KVM ICP class
The KVM ICP class isn't used anymore. Drop it.

Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <155023081228.1011724.12474992370439652538.stgit@bahia.lan>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-18 10:37:33 +11:00
Greg Kurz 56af66566d spapr/irq: Use the base ICP class for KVM
The base ICP class knows how to interact with KVM. Adapt sPAPR to use it
instead of the ICP KVM class.

Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <155023080638.1011724.792095453419098948.stgit@bahia.lan>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-18 10:34:56 +11:00
Greg Kurz 8e6e6efef7 xics: Handle KVM ICP realize from the common code
The realization of KVM ICP currently follows the parent_realize logic,
which is a bit overkill here. Also we want to get rid of the KVM ICP
class. Explicitely call icp_kvm_realize() from the base ICP realize
function.

Note that ICPStateClass::parent_realize is retained because powernv
needs it.

Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <155023080049.1011724.15423463482790260696.stgit@bahia.lan>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-18 10:34:05 +11:00
Greg Kurz d82f397183 xics: Handle KVM ICP reset from the common code
The KVM ICP reset handler simply writes the ICP state to KVM. This
doesn't need the overkill parent_reset logic we have today. Call
icp_set_kvm_state() from the base ICP reset function instead.

Since there are no other users for ICPStateClass::parent_reset, and
it isn't currently expected to change, drop it as well.

Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <155023079461.1011724.12644984391500635645.stgit@bahia.lan>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-18 10:29:55 +11:00
Greg Kurz 0e5c7fad9c xics: Explicitely call KVM ICP methods from the common code
The pre_save(), post_load() and synchronize_state() methods of the
ICPStateClass type are really KVM only things. Make that obvious
by dropping the indirections and directly calling the KVM functions
instead.

Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <155023078871.1011724.3083923389814185598.stgit@bahia.lan>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-18 10:14:37 +11:00
Cédric Le Goater 2e66cdb715 spapr/irq: add an 'nr_irq' parameter to initialize the backend.
When using the 'dual' interrupt mode, the source numbers of both sPAPR
IRQ backends are aligned to share a common IRQ number space and to use
a similar mapping of the machine qemu_irq array which is indexed by
the source number.

The XICS IRQ number range initially being [ 0x1000 - 0x2000 ], this
requires to change the XICS ICSState offset to 0 and to provision for
an extra 4K of source numbers and qemu_irqs which will never be used
by the machine when running under the XICS interrupt mode. This is not
an optimal solution.

Change the init() method to allocate an IRQ number space of the
expected size for the XICS sPAPR IRQ backend. It breaks the interrupt
signaling when under the 'dual' mode because source numbers have
unexpected values but next patch will fix that.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20190213210756.27032-2-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-17 21:54:02 +11:00
Greg Kurz 0afed8c819 xive: Only set source type for LSIs
MSI is the default and LSI specific code is guarded by the
xive_source_irq_is_lsi() helper. The xive_source_irq_set()
helper is a nop for MSIs.

Simplify the code by turning xive_source_irq_set() into
xive_source_irq_set_lsi() and only call it for LSIs. The
call to xive_source_irq_set(false) in spapr_xive_irq_free()
is also a nop. Just drop it.

Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <154999584656.690774.18352404495120358613.stgit@bahia.lan>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-17 21:54:02 +11:00
Greg Kurz 5c7adcf422 spapr: Rename xics to intc in interrupt controller agnostic code
All this code is used with both the XICS and XIVE interrupt controllers.

Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-17 21:54:02 +11:00
Cédric Le Goater a28b9a5a8d spapr: move the interrupt presenters under machine_data
Next step is to remove them from under the PowerPCCPU

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-04 18:44:18 +11:00
Cédric Le Goater 8907fc25cf ppc/pnv: introduce a CPU machine_data
Include the interrupt presenter under the machine_data as we plan to
remove it from under PowerPCCPU

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-04 18:44:18 +11:00
Cédric Le Goater 40a5056c41 xive: add a get_tctx() method to the XiveRouter
It provides a mean to retrieve the XiveTCTX of a CPU. This will become
necessary with future changes which move the interrupt presenter
object pointers under the PowerPCCPU machine_data.

The PowerNV machine has an extra requirement on TIMA accesses that
this new method addresses. The machine can perform indirect loads and
stores on the TIMA on behalf of another CPU. The PIR being defined in
the controller registers, we need a way to peek in the controller
model to find the PIR value.

The XiveTCTX is moved above the XiveRouter definition to avoid forward
typedef declarations.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-04 18:44:18 +11:00
Cédric Le Goater 6bf6f3a1d1 ppc/xive: fix remaining XiveFabric names
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-04 18:44:17 +11:00
BALATON Zoltan 7d8ccf58d5 ppc4xx: Use ram_addr_t in ppc4xx_sdram_adjust()
To avoid overflow if larger values are added later use ram_addr_t for
the sdram_bank_sizes parameter to match ram_size to which it is compared.

Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-04 18:44:17 +11:00
Thomas Huth 0d8d6a24fc ppc: Fix duplicated typedefs to be able to compile with Clang in gnu99 mode
When compiling the ppc code with clang and -std=gnu99, there are a
couple of warnings/errors like this one:

  CC      ppc64-softmmu/hw/intc/xics.o
In file included from hw/intc/xics.c:35:
include/hw/ppc/xics.h:43:25: error: redefinition of typedef 'ICPState' is a C11 feature
      [-Werror,-Wtypedef-redefinition]
typedef struct ICPState ICPState;
                        ^
target/ppc/cpu.h:1181:25: note: previous definition is here
typedef struct ICPState ICPState;
                        ^
Work around the problems by including the proper headers in spapr.h
and by using struct forward declarations in cpu.h.

Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Thomas Huth <thuth@redhat.com>
2019-01-22 05:14:33 +01:00
Thomas Huth a51d5afc69 ppc: Move spapr-related prototypes from xics.h into a seperate header file
When compiling with Clang in -std=gnu99 mode, there is a warning/error:

  CC      ppc64-softmmu/hw/intc/xics_spapr.o
In file included from /home/thuth/devel/qemu/hw/intc/xics_spapr.c:34:
/home/thuth/devel/qemu/include/hw/ppc/xics.h:203:34: error: redefinition of typedef 'sPAPRMachineState' is a C11 feature
      [-Werror,-Wtypedef-redefinition]
typedef struct sPAPRMachineState sPAPRMachineState;
                                 ^
/home/thuth/devel/qemu/include/hw/ppc/spapr_irq.h:25:34: note: previous definition is here
typedef struct sPAPRMachineState sPAPRMachineState;
                                 ^

We have to remove the duplicated typedef here and include "spapr.h" instead.
But "spapr.h" should not be included for the pnv machine files. So move
the spapr-related prototypes into a new file called "xics_spapr.h" instead.

Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
2019-01-22 05:14:33 +01:00
Cédric Le Goater 3a8eb78e6c spapr: enable XIVE MMIOs at reset
Depending on the interrupt mode of the machine, enable or disable the
XIVE MMIOs.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-01-09 09:28:14 +11:00
Cédric Le Goater 13db0cd9b8 spapr: introduce a new sPAPR IRQ backend supporting XIVE and XICS
The 'dual' sPAPR IRQ backend supports both interrupt mode, XIVE
exploitation mode and the legacy compatibility mode (XICS). both modes
are not supported at the same time.

The machine starts with the legacy mode and a new interrupt mode can
then be negotiated by the CAS process. In this case, the new mode is
activated after a reset to take into account the required changes in
the machine. These impact the device tree layout, the interrupt
presenter object and the exposed MMIO regions in the case of XIVE.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-01-09 09:28:14 +11:00
Cédric Le Goater 72c1e5a66a ppc/xics: allow ICSState to have an offset 0
commit 15ed653fa4 ("ppc/xics: An ICS with offset 0 is assumed to be
uninitialized") introduced an extra check on the ICS offset which is
not strictly necessary.

Revert the change to be able to map the XICS IRQ number space on the
XIVE IRQ number space.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-01-09 09:28:14 +11:00
Cédric Le Goater 872ff3dea3 spapr: move the qemu_irq array under the machine
The qemu_irq array is now allocated at the machine level using a sPAPR
IRQ set_irq handler depending on the chosen interrupt mode. The use of
this handler is slightly inefficient today but it will become necessary
when the 'dual' interrupt mode is introduced.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-01-09 09:28:14 +11:00
Cédric Le Goater f8df900316 pnv/psi: move the ICSState qemu_irq array under the PSI device model
Future changes of the ICSState object will remove the qemu_irq array
from under the interrupt controller model. Prepare ground for the PSI
interrupt sources and introduce a new one directly under the PSI
device model.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-01-09 09:28:14 +11:00
Cédric Le Goater 734d9c8905 ppc: export the XICS and XIVE set_irq handlers
To support the 'dual' interrupt mode, XICS and XIVE, we plan to move
the qemu_irq array of each interrupt controller under the machine and
do the allocation under the sPAPR IRQ init method.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-01-09 09:28:14 +11:00
Cédric Le Goater 8fa1f4ef38 spapr: modify the prototype of the cpu_intc_create() method
Today, the interrupt presenter is linked to a CPU using the
cpu_intc_create() method of the sPAPR IRQ backend. The resulting
object is assigned to the PowerPCCPU 'intc' pointer whatever the
interrupt mode, XICS or XIVE.

To support the 'dual' interrupt mode, we will need to distinguish
between the two presenter objects and for that, we plan to introduce a
second interrupt presenter object pointer under the PowerPCCPU. The
modifications below move the assignment of the presenter object under
the cpu_intc_create() method to prepare ground for the future changes.

Both sPAPR and PowerNV machines are impacted.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-01-09 09:28:14 +11:00
Cédric Le Goater a0c493ae67 spapr/xive: simplify the sPAPR IRQ qirq method for XIVE
The qirq routines of the XiveSource and the sPAPRXive model are only
used under the sPAPR IRQ backend. Simplify the overall call stack and
gather all the code under spapr_qirq_xive(). It will ease future
changes.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-01-09 09:28:14 +11:00
Alexey Kardashevskiy fea35ca4b8 ppc/spapr: Receive and store device tree blob from SLOF
SLOF receives a device tree and updates it with various properties
before switching to the guest kernel and QEMU is not aware of any changes
made by SLOF. Since there is no real RTAS (QEMU implements it), it makes
sense to pass the SLOF final device tree to QEMU to let it implement
RTAS related tasks better, such as PCI host bus adapter hotplug.

Specifially, now QEMU can find out the actual XICS phandle (for PHB
hotplug) and the RTAS linux,rtas-entry/base properties (for firmware
assisted NMI - FWNMI).

This stores the initial DT blob in the sPAPR machine and replaces it
in the KVMPPC_H_UPDATE_DT (new private hypercall) handler.

This adds an @update_dt_enabled machine property to allow backward
migration.

SLOF already has a hypercall since
https://github.com/aik/SLOF/commit/e6fc84652c9c0073f9183

This makes use of the new fdt_check_full() helper. In order to allow
the configure script to pick the correct DTC version, this adjusts
the DTC presense test.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-01-09 09:28:13 +11:00
Laurent Vivier c24ba3d0a3 spapr: Add H-Call H_HOME_NODE_ASSOCIATIVITY
H_HOME_NODE_ASSOCIATIVITY H-Call returns the associativity domain
designation associated with the identifier input parameter

This fixes a crash when we try to hotplug a CPU in memory-less and
CPU-less numa node. In this case, the kernel tries to online the
node, but without the information provided by this h-call, the node id,
it cannot and the CPU is started while the node is not onlined.

It also removes the warning message from the kernel:
  VPHN is not supported. Disabling polling..

Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-01-09 09:28:13 +11:00
Cédric Le Goater 3ba3d0bc33 spapr: introduce an 'ic-mode' machine option
This option is used to select the interrupt controller mode (XICS or
XIVE) with which the machine will operate. XICS being the default
mode for now.

When running a machine with the XIVE interrupt mode backend, the guest
OS is required to have support for the XIVE exploitation mode. In the
case of legacy OS, the mode selected by CAS should be XICS and the OS
should fail to boot. However, QEMU could possibly detect it, terminate
the boot process and reset to stop in the SLOF firmware. This is not
yet handled.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-12-21 09:40:43 +11:00
Cédric Le Goater db592b5b16 spapr: add an extra OV5 field to the sPAPR IRQ backend
The interrupt modes supported by the hypervisor are advertised to the
guest with new bits definitions of the option vector 5 of property
"ibm,arch-vec-5-platform-support. The byte 23 bits 0-1 of the OV5 are
defined as follow :

  0b00   PAPR 2.7 and earlier (Legacy systems)
  0b01   XIVE Exploitation mode only
  0b10   Either available

If the client/guest selects the XIVE interrupt mode, it informs the
hypervisor by returning the value 0b01 in byte 23 bits 0-1. A 0b00
value indicates the use of the XICS interrupt mode (Legacy systems).

The sPAPR IRQ backend is extended with these definitions and the
values are directly used to populate the "ibm,arch-vec-5-platform-support"
property. The interrupt mode is advertised under TCG and under KVM.
Although a KVM XIVE device is not yet available, the machine can still
operate with kernel_irqchip=off. However, we apply a restriction on
the CPU which is required to be a POWER9 when a XIVE interrupt
controller is in use.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-12-21 09:40:43 +11:00
Cédric Le Goater b2e2247716 spapr: add a 'reset' method to the sPAPR IRQ backend
For the time being, the XIVE reset handler updates the OS CAM line of
the vCPU as it is done under a real hypervisor when a vCPU is
scheduled to run on a HW thread. This will let the XIVE presenter
engine find a match among the NVTs dispatched on the HW threads.

This handler will become even more useful when we introduce the
machine supporting both interrupt modes, XIVE and XICS. In this
machine, the interrupt mode is chosen by the CAS negotiation process
and activated after a reset.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
[dwg: Fix style nits]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-12-21 09:40:35 +11:00
Cédric Le Goater 1c53b06c03 spapr: extend the sPAPR IRQ backend for XICS migration
Introduce a new sPAPR IRQ handler to handle resend after migration
when the machine is using a KVM XICS interrupt controller model.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-12-21 09:39:13 +11:00
Cédric Le Goater 1a937ad7e7 spapr: allocate the interrupt thread context under the CPU core
Each interrupt mode has its own specific interrupt presenter object,
that we store under the CPU object, one for XICS and one for XIVE.

Extend the sPAPR IRQ backend with a new handler to support them both.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-12-21 09:39:13 +11:00
Cédric Le Goater 6e21de4a50 spapr: add device tree support for the XIVE exploitation mode
The XIVE interface for the guest is described in the device tree under
the "interrupt-controller" node. A couple of new properties are
specific to XIVE :

 - "reg"

   contains the base address and size of the thread interrupt
   managnement areas (TIMA), for the User level and for the Guest OS
   level. Only the Guest OS level is taken into account today.

 - "ibm,xive-eq-sizes"

   the size of the event queues. One cell per size supported, contains
   log2 of size, in ascending order.

 - "ibm,xive-lisn-ranges"

   the IRQ interrupt number ranges assigned to the guest for the IPIs.

and also under the root node :

 - "ibm,plat-res-int-priorities"

   contains a list of priorities that the hypervisor has reserved for
   its own use. OPAL uses the priority 7 queue to automatically
   escalate interrupts for all other queues (DD2.X POWER9). So only
   priorities [0..6] are allowed for the guest.

Extend the sPAPR IRQ backend with a new handler to populate the DT
with the appropriate "interrupt-controller" node.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
[dwg: Fix style nits]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-12-21 09:39:07 +11:00
Cédric Le Goater 23bcd5eb9a spapr: add hcalls support for the XIVE exploitation interrupt mode
The different XIVE virtualization structures (sources and event queues)
are configured with a set of Hypervisor calls :

 - H_INT_GET_SOURCE_INFO

   used to obtain the address of the MMIO page of the Event State
   Buffer (ESB) entry associated with the source.

 - H_INT_SET_SOURCE_CONFIG

   assigns a source to a "target".

 - H_INT_GET_SOURCE_CONFIG

   determines which "target" and "priority" is assigned to a source

 - H_INT_GET_QUEUE_INFO

   returns the address of the notification management page associated
   with the specified "target" and "priority".

 - H_INT_SET_QUEUE_CONFIG

   sets or resets the event queue for a given "target" and "priority".
   It is also used to set the notification configuration associated
   with the queue, only unconditional notification is supported for
   the moment. Reset is performed with a queue size of 0 and queueing
   is disabled in that case.

 - H_INT_GET_QUEUE_CONFIG

   returns the queue settings for a given "target" and "priority".

 - H_INT_RESET

   resets all of the guest's internal interrupt structures to their
   initial state, losing all configuration set via the hcalls
   H_INT_SET_SOURCE_CONFIG and H_INT_SET_QUEUE_CONFIG.

 - H_INT_SYNC

   issue a synchronisation on a source to make sure all notifications
   have reached their queue.

Calls that still need to be addressed :

   H_INT_SET_OS_REPORTING_LINE
   H_INT_GET_OS_REPORTING_LINE

See the code for more documentation on each hcall.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
[dwg: Folded in fix for field accessors]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-12-21 09:37:38 +11:00
Cédric Le Goater dcc345b61e spapr: introduce a new machine IRQ backend for XIVE
The XIVE IRQ backend uses the same layout as the new XICS backend but
covers the full range of the IRQ number space. The IRQ numbers for the
CPU IPIs are allocated at the bottom of this space, below 4K, to
preserve compatibility with XICS which does not use that range.

This should be enough given that the maximum number of CPUs is 1024
for the sPAPR machine under QEMU. For the record, the biggest POWER8
or POWER9 system has a maximum of 1536 HW threads (16 sockets, 192
cores, SMT8).

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-12-21 09:37:38 +11:00
Cédric Le Goater 3aa597f650 spapr/xive: introduce a XIVE interrupt controller
sPAPRXive models the XIVE interrupt controller of the sPAPR machine.
It inherits from the XiveRouter and provisions storage for the routing
tables :

  - Event Assignment Structure (EAS)
  - Event Notification Descriptor (END)

The sPAPRXive model incorporates an internal XiveSource for the IPIs
and for the interrupts of the virtual devices of the guest. This model
is consistent with XIVE architecture which also incorporates an
internal IVSE for IPIs and accelerator interrupts in the IVRE
sub-engine.

The sPAPRXive model exports two memory regions, one for the ESB
trigger and management pages used to control the sources and one for
the TIMA pages. They are mapped by default at the addresses found on
chip 0 of a baremetal system. This is also consistent with the XIVE
architecture which defines a Virtualization Controller BAR for the
internal IVSE ESB pages and a Thread Managment BAR for the TIMA.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
[dwg: Fold in field accessor fixes]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-12-21 09:37:38 +11:00
Cédric Le Goater af53dbf622 ppc/xive: introduce a simplified XIVE presenter
The last sub-engine of the XIVE architecture is the Interrupt
Virtualization Presentation Engine (IVPE). On HW, the IVRE and the
IVPE share elements, the Power Bus interface (CQ), the routing table
descriptors, and they can be combined in the same HW logic. We do the
same in QEMU and combine both engines in the XiveRouter for
simplicity.

When the IVRE has completed its job of matching an event source with a
Notification Virtual Target (NVT) to notify, it forwards the event
notification to the IVPE sub-engine. The IVPE scans the thread
interrupt contexts of the Notification Virtual Targets (NVT)
dispatched on the HW processor threads and if a match is found, it
signals the thread. If not, the IVPE escalates the notification to
some other targets and records the notification in a backlog queue.

The IVPE maintains the thread interrupt context state for each of its
NVTs not dispatched on HW processor threads in the Notification
Virtual Target table (NVTT).

The model currently only supports single NVT notifications.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
[dwg: Folded in fix for field accessors]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-12-21 09:37:04 +11:00
Cédric Le Goater 207d9fe985 ppc/xive: introduce the XIVE interrupt thread context
Each POWER9 processor chip has a XIVE presenter that can generate four
different exceptions to its threads:

  - hypervisor exception,
  - O/S exception
  - Event-Based Branch (EBB)
  - msgsnd (doorbell).

Each exception has a state independent from the others called a Thread
Interrupt Management context. This context is a set of registers which
lets the thread handle priority management and interrupt acknowledgment
among other things. The most important ones being :

  - Interrupt Priority Register  (PIPR)
  - Interrupt Pending Buffer     (IPB)
  - Current Processor Priority   (CPPR)
  - Notification Source Register (NSR)

These registers are accessible through a specific MMIO region, called
the Thread Interrupt Management Area (TIMA), four aligned pages, each
exposing a different view of the registers. First page (page address
ending in 0b00) gives access to the entire context and is reserved for
the ring 0 view for the physical thread context. The second (page
address ending in 0b01) is for the hypervisor, ring 1 view. The third
(page address ending in 0b10) is for the operating system, ring 2
view. The fourth (page address ending in 0b11) is for user level, ring
3 view.

The thread interrupt context is modeled with a XiveTCTX object
containing the values of the different exception registers. The TIMA
region is mapped at the same address for each CPU.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-12-21 09:29:12 +11:00
Cédric Le Goater 002686be42 ppc/xive: add support for the END Event State Buffers
The Event Notification Descriptor (END) XIVE structure also contains
two Event State Buffers providing further coalescing of interrupts,
one for the notification event (ESn) and one for the escalation events
(ESe). A MMIO page is assigned for each to control the EOI through
loads only. Stores are not allowed.

The END ESBs are modeled through an object resembling the 'XiveSource'
It is stateless as the END state bits are backed into the XiveEND
structure under the XiveRouter and the MMIO accesses follow the same
rules as for the XiveSource ESBs.

END ESBs are not supported by the Linux drivers neither on OPAL nor on
sPAPR. Nevetherless, it provides a mean to study the question in the
future and validates a bit more the XIVE model.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
[dwg: Fold in a later fix for field access]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-12-21 09:29:12 +11:00
Cédric Le Goater 1a518e7693 spapr: export and rename the xics_max_server_number() routine
The XIVE sPAPR IRQ backend will use it to define the number of ENDs of
the IC controller.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-12-21 09:29:10 +11:00
Cédric Le Goater fab397d84a spapr: introduce a spapr_irq_init() routine
Initialize the MSI bitmap from it as this will be necessary for the
sPAPR IRQ backend for XIVE.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-12-21 09:28:47 +11:00
Cédric Le Goater e4ddaac67f ppc/xive: introduce the XIVE Event Notification Descriptors
To complete the event routing, the IVRE sub-engine uses a second table
containing Event Notification Descriptor (END) structures.

An END specifies on which Event Queue (EQ) the event notification
data, defined in the associated EAS, should be posted when an
exception occurs. It also defines which Notification Virtual Target
(NVT) should be notified.

The Event Queue is a memory page provided by the O/S defining a
circular buffer, one per server and priority couple, containing Event
Queue entries. These are 4 bytes long, the first bit being a
'generation' bit and the 31 following bits the END Data field. They
are pulled by the O/S when the exception occurs.

The END Data field is a way to set an invariant logical event source
number for an IRQ. On sPAPR machines, it is set with the
H_INT_SET_SOURCE_CONFIG hcall when the EISN flag is used.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
[dwg: Fold in a later fix from Cédric fixing field accessors]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-12-21 09:26:42 +11:00
Cédric Le Goater 7ff7ea9280 ppc/xive: introduce the XiveRouter model
The XiveRouter models the second sub-engine of the XIVE architecture :
the Interrupt Virtualization Routing Engine (IVRE).

The IVRE handles event notifications of the IVSE and performs the
interrupt routing process. For this purpose, it uses a set of tables
stored in system memory, the first of which being the Event Assignment
Structure (EAS) table.

The EAT associates an interrupt source number with an Event Notification
Descriptor (END) which will be used in a second phase of the routing
process to identify a Notification Virtual Target.

The XiveRouter is an abstract class which needs to be inherited from
to define a storage for the EAT, and other upcoming tables.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
[dwg: Folded in parts of a later fix by Cédric fixing field access]
[dwg: Fix style nits]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-12-21 09:26:31 +11:00
Cédric Le Goater 5e79b155a8 ppc/xive: introduce the XiveNotifier interface
The XiveNotifier offers a simple interface, between the XiveSource
object and the main interrupt controller of the machine. It will
forward event notifications to the XIVE Interrupt Virtualization
Routing Engine (IVRE).

Signed-off-by: Cédric Le Goater <clg@kaod.org>
[dwg: Adjust type name string for XiveNotifier]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-12-21 09:24:23 +11:00
Cédric Le Goater 5fd9ef18a9 ppc/xive: add support for the LSI interrupt sources
The 'sent' status of the LSI interrupt source is modeled with the 'P'
bit of the ESB and the assertion status of the source is maintained
with an extra bit under the main XiveSource object. The type of the
source is stored in the same array for practical reasons.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
[dwg: Fix style nit]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-12-21 09:24:23 +11:00
Cédric Le Goater 02e3ff548d ppc/xive: introduce a XIVE interrupt source model
The first sub-engine of the overall XIVE architecture is the Interrupt
Virtualization Source Engine (IVSE). An IVSE can be integrated into
another logic, like in a PCI PHB or in the main interrupt controller
to manage IPIs.

Each IVSE instance is associated with an Event State Buffer (ESB) that
contains a two bit state entry for each possible event source. When an
event is signaled to the IVSE, by MMIO or some other means, the
associated interrupt state bits are fetched from the ESB and
modified. Depending on the resulting ESB state, the event is forwarded
to the IVRE sub-engine of the controller doing the routing.

Each supported ESB entry is associated with either a single or a
even/odd pair of pages which provides commands to manage the source:
to EOI, to turn off the source for instance.

On a sPAPR machine, the O/S will obtain the page address of the ESB
entry associated with a source and its characteristic using the
H_INT_GET_SOURCE_INFO hcall. On PowerNV, a similar OPAL call is used.

The xive_source_notify() routine is in charge forwarding the source
event notification to the routing engine. It will be filled later on.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-12-21 09:24:23 +11:00
Greg Kurz 9929301ee1 mac_newworld: simplify IRQ wiring
The OpenPIC have 5 outputs per connected CPU. The machine init code hence
needs a bi-dimensional array (smp_cpu lines, 5 columns) to wire up the irqs
between the PIC and the CPUs.

The current code first allocates an array of smp_cpus pointers to qemu_irq
type, then it allocates another array of smp_cpus * 5 qemu_irq and fills the
first array with pointers to each line of the second array. This is rather
convoluted.

Simplify the logic by introducing a structured type that describes all the
OpenPIC outputs for a single CPU, ie, fixed size of 5 qemu_irq, and only
allocate a smp_cpu sized array of those.

This also allows to use g_new(T, n) instead of g_malloc(sizeof(T) * n)
as recommended in HACKING.

Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-12-21 09:24:23 +11:00
Suraj Jitindar Singh b9a477b725 ppc/spapr_caps: Add SPAPR_CAP_NESTED_KVM_HV
Add the spapr cap SPAPR_CAP_NESTED_KVM_HV to be used to control the
availability of nested kvm-hv to the level 1 (L1) guest.

Assuming a hypervisor with support enabled an L1 guest can be allowed to
use the kvm-hv module (and thus run it's own kvm-hv guests) by setting:
-machine pseries,cap-nested-hv=true
or disabled with:
-machine pseries,cap-nested-hv=false

Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-11-08 13:08:35 +11:00
Thomas Huth 0e947a89ce hw/ppc/spapr_rng: Introduce CONFIG_SPAPR_RNG switch for spapr_rng.c
The spapr-rng device is suboptimal when compared to virtio-rng, so
users might want to disable it in their builds. Thus let's introduce
a proper CONFIG switch to allow us to compile QEMU without this device.
The function spapr_rng_populate_dt is required for linking, so move it
to a different location.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-11-08 12:04:40 +11:00
Cédric Le Goater ae83740237 spapr: increase the size of the IRQ number space
The new layout using static IRQ number does not leave much space to
the dynamic MSI range, only 0x100 IRQ numbers. Increase the total
number of IRQS for newer machines and introduce a legacy XICS backend
for pre-3.1 machines to maintain compatibility.

For the old backend, provide a 'nr_msis' value covering the full IRQ
number space as it does not use the bitmap allocator to allocate MSI
interrupt numbers.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-09-25 11:12:25 +10:00
Cédric Le Goater e39de895f6 spapr: introduce a spapr_irq class 'nr_msis' attribute
The number of MSI interrupts a sPAPR machine can allocate is in direct
relation with the number of interrupts of the sPAPRIrq backend. Define
statically this value at the sPAPRIrq class level and use it for the
"ibm,pe-total-#msi" property of the sPAPR PHB.

According to the PAPR specs, "ibm,pe-total-#msi" defines the maximum
number of MSIs that are available to the PE. We choose to advertise
the maximum number of MSIs that are available to the machine for
simplicity of the model and to avoid segmenting the MSI interrupt pool
which can be easily shared. If the pool limit is reached, it can be
extended dynamically.

Finally, remove XICS_IRQS_SPAPR which is now unused.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-09-25 11:12:25 +10:00
Cédric Le Goater ef01ed9d19 spapr: introduce a IRQ controller backend to the machine
This proposal moves all the related IRQ routines of the sPAPR machine
behind a sPAPR IRQ backend interface 'spapr_irq' to prepare for future
changes. First of which will be to increase the size of the IRQ number
space, then, will follow a new backend for the POWER9 XIVE IRQ controller.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-08-21 14:28:45 +10:00
Cédric Le Goater 82cffa2eb2 spapr: introduce a fixed IRQ number space
This proposal introduces a new IRQ number space layout using static
numbers for all devices, depending on a device index, and a bitmap
allocator for the MSI IRQ numbers which are negotiated by the guest at
runtime.

As the VIO device model does not have a device index but a "reg"
property, we introduce a formula to compute an IRQ number from a "reg"
value. It should minimize most of the collisions.

The previous layout is kept in pre-3.1 machines raising the
'legacy_irq_allocation' machine class flag.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-08-21 14:28:45 +10:00
Greg Kurz 71c55a1eef xics: don't include "target/ppc/cpu-qom.h" in "hw/ppc/xics.h"
The last user of the PowerPCCPU typedef in "hw/ppc/xics.h" vanished with
commit b1fd36c363. It isn't necessary to
include "target/ppc/cpu-qom.h" there anymore.

Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-08-21 14:28:45 +10:00
Peter Maydell b07cd3e748 ppc patch queue 2018-07-03
Here's a last minue pull request before today's soft freeze.  Ideally
 I would have sent this earlier, but I was waiting for a couple of
 extra fixes I knew were close.  And the freeze crept up on me, like
 always.
 
 Most of the changes here are bugfixes in any case.  There are some
 cleanups as well, which have been in my staging tree for a little
 while.  There are a couple of truly new features (some extensions to
 the sam460ex platform), but these are low risk, since they only affect
 a new and not really stabilized machine type anyway.
 
 Higlights are:
   * Mac platform improvements from Mark Cave-Ayland
   * Sam460ex improvements from BALATON Zoltan et al.
   * XICS interrupt handler cleanups from Cédric Le Goater
   * TCG improvements for atomic loads and stores from Richard
     Henderson
   * Assorted other bugfixes
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEdfRlhq5hpmzETofcbDjKyiDZs5IFAls7D8oACgkQbDjKyiDZ
 s5Lxmg//YzPfC/nKqTTKkyJPzh/NnSC+kRTMAT3mbxdRIc7yfgMqJtWGGbS1iKgK
 EeJ9hl5Qm0HfscfDuzf0xasU62ZEv3kNdLnWJEIgkqiXrxoO5KCnC0y4D8NN1W03
 mvINNCa8+QDg2OsirGmNUTkriiG3wLIrHTpLZ4+JuC2Bd9H3nTHZgJ0MXON/1VWY
 oRgr6kMZ5+IAzPhvYLFR6l3nPI883fgJOFyRo7YqYrkVBKFrFkfK0Xjw6vpsNxcx
 2dE/YCHhNIriLuBG5noewL7GuqZRtLnl6rjjee5VAKIe1EmFeR+jsXwNjzGOVOJg
 dhjOtsJsQQ3WdEw5uImJzE64kV228WCgmkeXzZd1010JBLr7sUkrd2EuoZ23vvat
 uvZAHVSBrJg5WvzMo1VMEoPU3VeeZQ5HL+MI80iKiU6oUgRK11gVJcebtA0sEKt+
 zhJC4JiUlHtZLTGIpMBmU8DJZ3Tyk1cBEm+Ky+SaPE+dsz16UHI0fazFQXJnXphE
 MLHEGAyQgzWYp7kIcAjUFev0Geq/Uovy4JKIGI6ISop1wRPEQDxkthfkfRyQxQkE
 zuse4EBcEH/Undw9KrmEQa0hCe+8BRkxklVbPesFPPdqH3PKNxtHYuWpSShQF0PW
 XMjw43O2Rbsl8kBUHCpy4pYSugD1hpfgaw/mVUOU1u/M1O6toTw=
 =AHrx
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-3.0-20180703' into staging

ppc patch queue 2018-07-03

Here's a last minue pull request before today's soft freeze.  Ideally
I would have sent this earlier, but I was waiting for a couple of
extra fixes I knew were close.  And the freeze crept up on me, like
always.

Most of the changes here are bugfixes in any case.  There are some
cleanups as well, which have been in my staging tree for a little
while.  There are a couple of truly new features (some extensions to
the sam460ex platform), but these are low risk, since they only affect
a new and not really stabilized machine type anyway.

Higlights are:
  * Mac platform improvements from Mark Cave-Ayland
  * Sam460ex improvements from BALATON Zoltan et al.
  * XICS interrupt handler cleanups from Cédric Le Goater
  * TCG improvements for atomic loads and stores from Richard
    Henderson
  * Assorted other bugfixes

# gpg: Signature made Tue 03 Jul 2018 06:55:22 BST
# gpg:                using RSA key 6C38CACA20D9B392
# gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>"
# gpg:                 aka "David Gibson (Red Hat) <dgibson@redhat.com>"
# gpg:                 aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>"
# gpg:                 aka "David Gibson (kernel.org) <dwg@kernel.org>"
# Primary key fingerprint: 75F4 6586 AE61 A66C C44E  87DC 6C38 CACA 20D9 B392

* remotes/dgibson/tags/ppc-for-3.0-20180703: (35 commits)
  ppc: Include vga cirrus card into the compiling process
  target/ppc: Relax reserved bitmask of indexed store instructions
  target/ppc: set is_jmp on ppc_tr_breakpoint_check
  spapr: compute default value of "hpt-max-page-size" later
  target/ppc/kvm: don't pass cpu to kvm_get_smmu_info()
  target/ppc/kvm: get rid of kvm_get_fallback_smmu_info()
  ppc440_uc: Basic emulation of PPC440 DMA controller
  sam460ex: Add RTC device
  hw/timer: Add basic M41T80 emulation
  ppc4xx_i2c: Rewrite to model hardware more closely
  hw/ppc: Give sam46ex its own config option
  fpu_helper.c: fix setting FPSCR[FI] bit
  target/ppc: Implement the rest of gen_st_atomic
  target/ppc: Implement the rest of gen_ld_atomic
  target/ppc: Use atomic min/max helpers
  target/ppc: Use MO_ALIGN for EXIWX and ECOWX
  target/ppc: Split out gen_st_atomic
  target/ppc: Split out gen_ld_atomic
  target/ppc: Split out gen_load_locked
  target/ppc: Tidy gen_conditional_store
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

# Conflicts:
#	hw/ppc/spapr.c
2018-07-03 14:59:27 +01:00
Cédric Le Goater eeefd43b3c ppx/xics: introduce a parent_reset in ICSStateClass
Just like for the realize handlers, this makes possible to move the
common ICSState code of the reset handlers in the ics-base class.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-07-03 09:56:51 +10:00
Cédric Le Goater 0a647b76db ppc/xics: introduce a parent_realize in ICSStateClass
This makes possible to move the common ICSState code of the realize
handlers in the ics-base class.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-07-03 09:56:51 +10:00
Cédric Le Goater a028dd423e ppc/xics: introduce ICP DeviceRealize and DeviceReset handlers
This changes the ICP realize and reset handlers in DeviceRealize and
DeviceReset handlers. parent handlers are now called from the
inheriting classes which is a cleaner object pattern.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-07-03 09:56:51 +10:00
Philippe Mathieu-Daudé ab3dd74924 hw/ppc: Use the IEC binary prefix definitions
It eases code review, unit is explicit.

Patch generated using:

  $ git grep -E '(1024|2048|4096|8192|(<<|>>).?(10|20|30))' hw/ include/hw/

and modified manually.

Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Message-Id: <20180625124238.25339-33-f4bug@amsat.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-07-02 15:41:16 +02:00
David Gibson 123eec6552 spapr: Use maximum page size capability to simplify memory backend checking
The way we used to handle KVM allowable guest pagesizes for PAPR guests
required some convoluted checking of memory attached to the guest.

The allowable pagesizes advertised to the guest cpus depended on the memory
which was attached at boot, but then we needed to ensure that any memory
later hotplugged didn't change which pagesizes were allowed.

Now that we have an explicit machine option to control the allowable
maximum pagesize we can simplify this.  We just check all memory backends
against that declared pagesize.  We check base and cold-plugged memory at
reset time, and hotplugged memory at pre_plug() time.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
2018-06-22 14:19:07 +10:00
David Gibson 2309832afd spapr: Maximum (HPT) pagesize property
The way the POWER Hash Page Table (HPT) MMU is virtualized by KVM HV means
that every page that the guest puts in the pagetables must be truly
physically contiguous, not just GPA-contiguous.  In effect this means that
an HPT guest can't use any pagesizes greater than the host page size used
to back its memory.

At present we handle this by changing what we advertise to the guest based
on the backing pagesizes.  This is pretty bad, because it means the guest
sees a different environment depending on what should be host configuration
details.

As a start on fixing this, we add a new capability parameter to the
pseries machine type which gives the maximum allowed pagesizes for an
HPT guest.  For now we just create and validate the parameter without
making it do anything.

For backwards compatibility, on older machine types we set it to the max
available page size for the host.  For the 3.0 machine type, we fix it to
16, the intention being to only allow HPT pagesizes up to 64kiB by default
in future.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
2018-06-22 14:19:07 +10:00
Cédric Le Goater 71b5c8d26e spapr: remove unused spapr_irq routines
spapr_irq_alloc_block and spapr_irq_alloc() are now deprecated.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-06-21 21:22:53 +10:00