Next set of patches for ppc and spapr. There's a lot in this one:
* Support "STOP light" states on POWER9
* Add support for HVI interrupts on POWER9 (powernv machine)
* CVE-2019-8934: Don't leak host model and serial information to the guest
* Tests and cleanups for various hot unplug options
* Hash and radix MMU implementation on POWER9 for powernv machine
* PCI Host Bridge hotplug support for pseries machine
* Allow larger kernels and initrds for powernv machine
Plus a handful of miscellaneous fixes and cleanups.
The cpu hotplug tests and cleanups from David Hildenbrand aren't
solely power related. However the consensus amongst Michael Tsirkin,
David Hildenbrand, Cornelia Huck and myself was that it made most
sense to come in via my tree.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEdfRlhq5hpmzETofcbDjKyiDZs5IFAlx0tIoACgkQbDjKyiDZ
s5I+ZhAA1LHyTGh2xXYcE1BFUFeYWqpDmFEghKSGvnCjhqkPoACBUyR3GKcNF+vf
sgX/OgoPPgXE8QpB/4hzcMxiNxFTPNOX+1p5oGv3zzokjyF7qtlVvVub3BI0cvjg
37eGLW9iLmc/PhiS7kDVSDwQpyhombsU4jb73cp6RqYTKT0wHHl/As3WmzIWW4bk
BguUE6zPROEuVyQSxiL2pTWv4UBSsMrqqwCBkbAohXkDCjntaSdHCxmaHyf+VXqe
ac50BSIAkAEIiJiPOGEJkuIOm1goE823RGwuPQWvkfM3flozmTYWh/Y+Y2t9NMBR
sC8Ly9Wo3Lz/sDr3cfL5HZ3NXCayDZwJEllbHqzDyjSJzU3gY3XMyWnIM0NTckTr
n5wX1OLghTYkgYkDLRyi9Nj1Gd0B11OfMsw17/Bj9hyz3k1KdgyJ98UZkwUBqvbC
kwrwkSutMrs8qqAZM6xtn++ABYgxhLOlY83U8rfAXEebUixAj/6WOmxgyYiV+m/n
9qQfPD8301lxpmmowBVuGyBKcdFUJ+QYNXD3a1S/vphvA2+G1y1SccMrlz2WEYol
gXVVe1tpA0ohmwflFX87zDOeyvO1gezhtXdaDlVjyeXOaGYUV3Srjei9w1p3PTs0
FsKwC/bL+cbTmi43qj5et0HG5Fx48fjIOjEqCcVBaz0ZQqjkdus=
=Z4Z6
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-4.0-20190226' into staging
ppc patch queue 2019-02-26
Next set of patches for ppc and spapr. There's a lot in this one:
* Support "STOP light" states on POWER9
* Add support for HVI interrupts on POWER9 (powernv machine)
* CVE-2019-8934: Don't leak host model and serial information to the guest
* Tests and cleanups for various hot unplug options
* Hash and radix MMU implementation on POWER9 for powernv machine
* PCI Host Bridge hotplug support for pseries machine
* Allow larger kernels and initrds for powernv machine
Plus a handful of miscellaneous fixes and cleanups.
The cpu hotplug tests and cleanups from David Hildenbrand aren't
solely power related. However the consensus amongst Michael Tsirkin,
David Hildenbrand, Cornelia Huck and myself was that it made most
sense to come in via my tree.
# gpg: Signature made Tue 26 Feb 2019 03:37:46 GMT
# gpg: using RSA key 75F46586AE61A66CC44E87DC6C38CACA20D9B392
# gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>" [full]
# gpg: aka "David Gibson (Red Hat) <dgibson@redhat.com>" [full]
# gpg: aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>" [full]
# gpg: aka "David Gibson (kernel.org) <dwg@kernel.org>" [unknown]
# Primary key fingerprint: 75F4 6586 AE61 A66C C44E 87DC 6C38 CACA 20D9 B392
* remotes/dgibson/tags/ppc-for-4.0-20190226: (50 commits)
ppc/pnv: use IEC binary prefixes to represent sizes
ppc/pnv: add INITRD_MAX_SIZE constant
ppc/pnv: increase kernel size limit to 256MiB
hw/ppc: Use object_initialize_child for correct reference counting
ppc/xive: xive does not have a POWER7 interrupt model
tests/device-plug: Add PHB unplug request test for spapr
spapr: enable PHB hotplug for default pseries machine type
spapr: add hotplug hooks for PHB hotplug
spapr_pci: add ibm, my-drc-index property for PHB hotplug
spapr_pci: provide node start offset via spapr_populate_pci_dt()
spapr_events: add support for phb hotplug events
spapr: populate PHB DRC entries for root DT node
spapr: create DR connectors for PHBs
spapr_pci: add PHB unrealize
spapr_irq: Expose the phandle of the interrupt controller
spapr: Expose the name of the interrupt controller node
xics: Write source state to KVM at claim time
spapr/drc: Drop spapr_drc_attach() fdt argument
spapr/pci: Generate FDT fragment at configure connector time
spapr: Generate FDT fragment for CPUs at configure connector time
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
- Block graph change fixes (avoid loops, cope with non-tree graphs)
- bdrv_set_aio_context() related fixes
- HMP snapshot commands: Use only tag, not the ID to identify snapshots
- qmeu-img, commit: Error path fixes
- block/nvme: Build fix for gcc 9
- MAINTAINERS updates
- Fix various issues with bdrv_refresh_filename()
- Fix various iotests
- Include LUKS overhead in qemu-img measure for qcow2
- A fix for vmdk's image creation interface
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJcc/knAAoJEH8JsnLIjy/WptQP/3F8Lh52H4egXaP7NUUuDjQM
AhqhuDAp/EZBS+xim9kLTogNJADe/rMWdSX/YB5aLpSPYbjasC66NgaLhd6QewgQ
VIcsLUdlYAyZ5ZjJytimfMTLwm1X02RmVIe55y52DTY8LlfViZzOlf3qwqPm00ao
EJB2cl8UJLM+PVEu59cCw3R0/06LY+WIJRB32d3tnCBRTkaJwfR9h4lrp/juVcFZ
U+2eWU68KMbUHSYiWANowN+KRV3uPY4HVA98v3F0vDmcBxlVHOeBg6S+PcT7tK8p
huzCMwcdwUyPMJgVs/+WBtUnbG0jN6SHUYmFLz859UMVgBnCw5tzBMf8qw1wOA4A
Iw+zor27Pxj4IlxcLPp5f97YZ8k9acdMR2VKPH6xLJZ1JF+sKa54RfzESd5EJeIj
Mfcp773H0lIaWcFJ6RY1F0L1E1ta7QigwNBiWMdYfh0a0EWHnDvGyYeaSPYEQ+rl
e8bZOcfrYwVI7DTDiZOIkGA9D8DXEPDNp+sl6s1DxeY69D0NNaXTtCPqFNNAbFbd
20uD7yDRZlWq32cQB/K9D5cSkZRSOzdUpLfLU3nQU2+dz11x6OpM6m7DVboSrztD
1HtPPDzDEvH5dOP7ibd60s+ntjkSiNfNkUgnuVrBE/d/PocC1eHHpZt5V7f43Ofb
RxVwH5+smzQ9nsNBfQR0
=gaah
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging
Block layer patches:
- Block graph change fixes (avoid loops, cope with non-tree graphs)
- bdrv_set_aio_context() related fixes
- HMP snapshot commands: Use only tag, not the ID to identify snapshots
- qmeu-img, commit: Error path fixes
- block/nvme: Build fix for gcc 9
- MAINTAINERS updates
- Fix various issues with bdrv_refresh_filename()
- Fix various iotests
- Include LUKS overhead in qemu-img measure for qcow2
- A fix for vmdk's image creation interface
# gpg: Signature made Mon 25 Feb 2019 14:18:15 GMT
# gpg: using RSA key 7F09B272C88F2FD6
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" [full]
# Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6
* remotes/kevin/tags/for-upstream: (71 commits)
iotests: Skip 211 on insufficient memory
vmdk: false positive of compat6 with hwversion not set
iotests: add LUKS payload overhead to 178 qemu-img measure test
qcow2: include LUKS payload overhead in qemu-img measure
iotests.py: s/_/-/g on keys in qmp_log()
iotests: Let 045 be run concurrently
iotests: Filter SSH paths
iotests.py: Filter filename in any string value
iotests.py: Add is_str()
iotests: Fix 207 to use QMP filters for qmp_log
iotests: Fix 232 for LUKS
iotests: Remove superfluous rm from 232
iotests: Fix 237 for Python 2.x
iotests: Re-add filename filters
iotests: Test json:{} filenames of internal BDSs
block: BDS options may lack the "driver" option
block/null: Generate filename even with latency-ns
block/curl: Implement bdrv_refresh_filename()
block/curl: Harmonize option defaults
block/nvme: Fix bdrv_refresh_filename()
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
The 'qemu_acl' type was a previous non-QOM based attempt to provide an
authorization facility in QEMU. Because it is non-QOM based it cannot be
created via the command line and requires special monitor commands to
manipulate it.
The new QAuthZ subclasses provide a superset of the functionality in
qemu_acl, so the latter can now be deleted. The HMP 'acl_*' monitor
commands are converted to use the new QAuthZSimple data type instead
in order to provide temporary backwards compatibility.
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Add an authorization backend that talks to PAM to check whether the user
identity is allowed. This only uses the PAM account validation facility,
which is essentially just a check to see if the provided username is permitted
access. It doesn't use the authentication or session parts of PAM, since
that's dealt with by the relevant part of QEMU (eg VNC server).
Consider starting QEMU with a VNC server and telling it to use TLS with
x509 client certificates and configuring it to use an PAM to validate
the x509 distinguished name. In this example we're telling it to use PAM
for the QAuthZ impl with a service name of "qemu-vnc"
$ qemu-system-x86_64 \
-object tls-creds-x509,id=tls0,dir=/home/berrange/security/qemutls,\
endpoint=server,verify-peer=yes \
-object authz-pam,id=authz0,service=qemu-vnc \
-vnc :1,tls-creds=tls0,tls-authz=authz0
This requires an /etc/pam/qemu-vnc file to be created with the auth
rules. A very simple file based whitelist can be setup using
$ cat > /etc/pam/qemu-vnc <<EOF
account requisite pam_listfile.so item=user sense=allow file=/etc/qemu/vnc.allow
EOF
The /etc/qemu/vnc.allow file simply contains one username per line. Any
username not in the file is denied. The usernames in this example are
the x509 distinguished name from the client's x509 cert.
$ cat > /etc/qemu/vnc.allow <<EOF
CN=laptop.berrange.com,O=Berrange Home,L=London,ST=London,C=GB
EOF
More interesting would be to configure PAM to use an LDAP backend, so
that the QEMU authorization check data can be centralized instead of
requiring each compute host to have file maintained.
The main limitation with this PAM module is that the rules apply to all
QEMU instances on the host. Setting up different rules per VM, would
require creating a separate PAM service name & config file for every
guest. An alternative approach for the future might be to not pass in
the plain username to PAM, but instead combine the VM name or UUID with
the username. This requires further consideration though.
Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Add a QAuthZListFile object type that implements the QAuthZ interface. This
built-in implementation is a proxy around the QAuthZList object type,
initializing it from an external file, and optionally, automatically
reloading it whenever it changes.
To create an instance of this object via the QMP monitor, the syntax
used would be:
{
"execute": "object-add",
"arguments": {
"qom-type": "authz-list-file",
"id": "authz0",
"props": {
"filename": "/etc/qemu/vnc.acl",
"refresh": true
}
}
}
If "refresh" is "yes", inotify is used to monitor the file,
automatically reloading changes. If an error occurs during reloading,
all authorizations will fail until the file is next successfully
loaded.
The /etc/qemu/vnc.acl file would contain a JSON representation of a
QAuthZList object
{
"rules": [
{ "match": "fred", "policy": "allow", "format": "exact" },
{ "match": "bob", "policy": "allow", "format": "exact" },
{ "match": "danb", "policy": "deny", "format": "glob" },
{ "match": "dan*", "policy": "allow", "format": "exact" },
],
"policy": "deny"
}
This sets up an authorization rule that allows 'fred', 'bob' and anyone
whose name starts with 'dan', except for 'danb'. Everyone unmatched is
denied.
The object can be loaded on the comand line using
-object authz-list-file,id=authz0,filename=/etc/qemu/vnc.acl,refresh=yes
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Add a QAuthZList object type that implements the QAuthZ interface. This
built-in implementation maintains a trivial access control list with a
sequence of match rules and a final default policy. This replicates the
functionality currently provided by the qemu_acl module.
To create an instance of this object via the QMP monitor, the syntax
used would be:
{
"execute": "object-add",
"arguments": {
"qom-type": "authz-list",
"id": "authz0",
"props": {
"rules": [
{ "match": "fred", "policy": "allow", "format": "exact" },
{ "match": "bob", "policy": "allow", "format": "exact" },
{ "match": "danb", "policy": "deny", "format": "glob" },
{ "match": "dan*", "policy": "allow", "format": "exact" },
],
"policy": "deny"
}
}
}
This sets up an authorization rule that allows 'fred', 'bob' and anyone
whose name starts with 'dan', except for 'danb'. Everyone unmatched is
denied.
It is not currently possible to create this via -object, since there is
no syntax supported to specify non-scalar properties for objects. This
is likely to be addressed by later support for using JSON with -object,
or an equivalent approach.
In any case the future "authz-listfile" object can be used from the
CLI and is likely a better choice, as it allows the ACL to be refreshed
automatically on change.
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
In many cases a single VM will just need to whitelist a single identity
as the allowed user of network services. This is especially the case for
TLS live migration (optionally with NBD storage) where we just need to
whitelist the x509 certificate distinguished name of the source QEMU
host.
Via QMP this can be configured with:
{
"execute": "object-add",
"arguments": {
"qom-type": "authz-simple",
"id": "authz0",
"props": {
"identity": "fred"
}
}
}
Or via the command line
-object authz-simple,id=authz0,identity=fred
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The current qemu_acl module provides a simple access control list
facility inside QEMU, which is used via a set of monitor commands
acl_show, acl_policy, acl_add, acl_remove & acl_reset.
Note there is no ability to create ACLs - the network services (eg VNC
server) were expected to create ACLs that they want to check.
There is also no way to define ACLs on the command line, nor potentially
integrate with external authorization systems like polkit, pam, ldap
lookup, etc.
The QAuthZ object defines a minimal abstract QOM class that can be
subclassed for creating different authorization providers.
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The inotify userspace API for reading events is quite horrible, so it is
useful to wrap it in a more friendly API to avoid duplicating code
across many users in QEMU. Wrapping it also allows introduction of a
platform portability layer, so that we can add impls for non-Linux based
equivalents in future.
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Handling it just like float128_to_uint32_round_to_zero, that hopefully
is free of bugs :)
Documentation basically copied from float128_to_uint64
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Needed on s390x, to test for the data class of a number. So it will
gain soon a user.
A number is considered normal if the exponent is neither 0 nor all 1's.
That can be checked by adding 1 to the exponent, and comparing against
>= 2 after dropping an eventual overflow into the sign bit.
While at it, convert the other floatXX_is_normal functions to use a
similar, less error prone calculation, as suggested by Richard H.
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Hotplugging PHBs is a machine-level operation, but PHBs reside on the
main system bus, so we register spapr machine as the handler for the
main system bus.
Provide the usual pre-plug, plug and unplug-request handlers.
Move the checking of the PHB index to the pre-plug handler. It is okay
to do that and assert in the realize function because the pre-plug
handler is always called, even for the oldest machine types we support.
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
(Fixed interrupt controller phandle in "interrupt-map" and
TCE table size in "ibm,dma-window" FDT fragment, Greg Kurz)
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <155059672926.1466090.13612804072190051439.stgit@bahia.lab.toulouse-stg.fr.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
PHB hotplug re-uses PHB device tree generation code and passes
it to a guest via RTAS. Doing this requires knowledge of where
exactly in the device tree the node describing the PHB begins.
Provide this via a new optional pointer that can be used to
store the PHB node's start offset.
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <155059671912.1466090.10891589403973703473.stgit@bahia.lab.toulouse-stg.fr.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <155059670389.1466090.10015601248906623076.stgit@bahia.lab.toulouse-stg.fr.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
To support PHB hotplug we need to clean up lingering references,
memory, child properties, etc. prior to the PHB object being
finalized. Generally this will be called as a result of calling
object_unparent() on the PHB object, which in turn would normally
be called as the result of an unplug() operation.
When the PHB is finalized, child objects will be unparented in
turn, and finalized if the PHB was the only reference holder. so
we don't bother to explicitly unparent child objects of the PHB,
with the notable exception of DRCs. This is needed to avoid a QEMU
crash when unplugging a PHB and resetting the machine before the
guest could handle the event. The DRCs are removed from the QOM tree
by pci_unregister_root_bus() and we must make sure we're not leaving
stale aliases under the global /dr-connector path.
The formula that gives the number of DMA windows is moved to an
inline function in the hw/pci-host/spapr.h header because it
will have other users.
The unrealize function is able to cope with partially realized PHBs.
It is hence used to implement proper rollback on the realize error
path.
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Message-Id: <155059669881.1466090.13515030705986041517.stgit@bahia.lab.toulouse-stg.fr.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This will be used by PHB hotplug in order to create the "interrupt-map"
property of the PHB node.
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <155059669374.1466090.12943228478046223856.stgit@bahia.lab.toulouse-stg.fr.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This will be needed by PHB hotplug in order to access the "phandle"
property of the interrupt controller node.
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Message-Id: <155059668867.1466090.6339199751719123386.stgit@bahia.lab.toulouse-stg.fr.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The pseries machine only uses LSIs to support legacy PCI devices. Every
PHB claims 4 LSIs at realize time. When using in-kernel XICS (or upcoming
in-kernel XIVE), QEMU synchronizes the state of all irqs, including these
LSIs, later on at machine reset.
In order to support PHB hotplug, we need a way to tell KVM about the LSIs
that doesn't require a machine reset. An easy way to do that is to always
inform KVM when an interrupt is claimed, which really isn't a performance
path.
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <155059668360.1466090.5969630516627776426.stgit@bahia.lab.toulouse-stg.fr.ibm.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
All DRC subtypes have been converted to generate the FDT fragment at
configure connector time instead of attach time. The fdt and fdt_offset
arguments of spapr_drc_attach() aren't needed anymore. Drop them and
make the implementation of the dt_populate() method mandatory.
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <155059667853.1466090.16527852453054217565.stgit@bahia.lab.toulouse-stg.fr.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <155059667346.1466090.326696113231137772.stgit@bahia.lab.toulouse-stg.fr.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <155059666839.1466090.3833376527523126752.stgit@bahia.lab.toulouse-stg.fr.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <155059666331.1466090.6766540766297333313.stgit@bahia.lab.toulouse-stg.fr.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The current logic is to provide the FDT fragment when attaching a device
to a DRC. This works perfectly fine for our current hotplug support, but
soon we will add support for PHB hotplug which has some constraints, that
CPU, PCI and LMB devices don't seem to have.
The first constraint is that the "ibm,dma-window" property of the PHB
node requires the IOMMU to be configured, ie, spapr_tce_table_enable()
has been called, which happens during PHB reset. It is okay in the case
of hotplug since the device is reset before the hotplug handler is
called. On the contrary with coldplug, the hotplug handler is called
first and device is only reset during the initial system reset. Trying
to create the FDT fragment on the hotplug path in this case, would
result in somthing like this:
ibm,dma-window = < 0x80000000 0x00 0x00 0x00 0x00 >;
This will cause linux in the guest to panic, by simply removing and
re-adding the PHB using the drmgr command:
page = alloc_pages_node(nid, GFP_KERNEL, get_order(sz));
if (!page)
panic("iommu_init_table: Can't allocate %ld bytes\n", sz);
The second and maybe more problematic constraint is that the
"interrupt-map" property needs to reference the interrupt controller
node using the very same phandle that SLOF has already exposed to the
guest. QEMU requires SLOF to call the private KVMPPC_H_UPDATE_DT hcall
at some point to know about this phandle. With the latest QEMU and SLOF,
this happens when SLOF gets quiesced. This means that if the PHB gets
hotplugged after CAS but before SLOF quiesce, then we're sure that the
phandle is not known when the hotplug handler is called.
The FDT is only needed when the guest first invokes RTAS to configure
the connector actually, long after SLOF quiesce. Let's postpone the
creation of FDT fragments for PHBs to rtas_ibm_configure_connector().
Since we only need this for PHBs, introduce a new method in the base
DRC class for that. DRC subtypes will be converted to use it in
subsequent patches.
Allow spapr_drc_attach() to be passed a NULL fdt argument if the method
is available. When all DRC subtypes have been converted, the fdt argument
will eventually disappear.
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <155059665823.1466090.18358845122627355537.stgit@bahia.lab.toulouse-stg.fr.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The HW relies on LPCR:HR along with the PATE to determine whether
to use Radix or Hash mode. In fact it uses LPCR:HR more commonly
than the PATE.
For us, it's also more efficient to do so, especially since unlike
the HW we do not maintain a cache of the current PATE and HV PATE
in a generic place.
Prepare the grounds for that by ensuring that LPCR:HR is set
properly on SPAPR machines.
Another option would have been to use a callback to get the PATE
but this gets messy when implementing bare metal support, it's
much simpler (and faster) to use LPCR.
Since existing migration streams may not have it, fix it up in
spapr_post_load() as well based on the pseudo-PATE entry that
we keep.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20190215170029.15641-2-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
On ppc hosts, hypervisor shares following system attributes
- /proc/device-tree/system-id
- /proc/device-tree/model
with a guest. This could lead to information leakage and misuse.[*]
Add machine attributes to control such system information exposure
to a guest.
[*] https://wiki.openstack.org/wiki/OSSN/OSSN-0028
Reported-by: Daniel P. Berrangé <berrange@redhat.com>
Fix-suggested-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Message-Id: <20190218181349.23885-1-ppandit@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Adds support for the Hypervisor directed interrupts in addition to the
OS ones.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
[clg: - modified the icp_realize() and xive_tctx_realize() to take
into account explicitely the POWER9 interrupt model
- introduced a specific power9_set_irq for POWER9 ]
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20190215161648.9600-10-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
-----BEGIN PGP SIGNATURE-----
iQEcBAABAgAGBQJccAIFAAoJEJykq7OBq3PImM0IAMcC92GYwSL6MgQ4NrfbVfDQ
P+qgIoxlXcYNwa12kGY6rE4tgSDab9Mc5ACpmuFdr8Xj7/JOp97AccmKZ+RzYQSj
IFlkvd8GngQR1YnFGV6PIWSt7hRhKuUZMqSIDrWro/MTdiJFEMI8/e7QLGxmaEX1
gWCNSopxJUeACSJiRyfZvBGNCs23R9ptFKBFhIXS98KPtEtF8LQV0JnQXoRUDiBL
G9C/xggdGDvct3Id4yOBCh43ErssyOrlYwjzIRWB2AFfIwHGytJcL6JBjAqy4Z7M
ClMNXaQSbfSCXrc3osF9nO6KaFduhfpUH44lL5JBHxUgH6pN2xGradRANZwysQM=
=DSBn
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/stefanha/tags/block-pull-request' into staging
Pull request
# gpg: Signature made Fri 22 Feb 2019 14:07:01 GMT
# gpg: using RSA key 9CA4ABB381AB73C8
# gpg: Good signature from "Stefan Hajnoczi <stefanha@redhat.com>" [full]
# gpg: aka "Stefan Hajnoczi <stefanha@gmail.com>" [full]
# Primary key fingerprint: 8695 A8BF D3F9 7CDA AC35 775A 9CA4 ABB3 81AB 73C8
* remotes/stefanha/tags/block-pull-request: (27 commits)
tests/virtio-blk: add test for DISCARD command
tests/virtio-blk: add test for WRITE_ZEROES command
tests/virtio-blk: add virtio_blk_fix_dwz_hdr() function
tests/virtio-blk: change assert on data_size in virtio_blk_request()
virtio-blk: add DISCARD and WRITE_ZEROES features
virtio-blk: set config size depending on the features enabled
virtio-net: make VirtIOFeature usable for other virtio devices
virtio-blk: add "discard" and "write-zeroes" properties
virtio-blk: add host_features field in VirtIOBlock
virtio-blk: add acct_failed param to virtio_blk_handle_rw_error()
hw/ide: drop iov field from IDEDMA
hw/ide: drop iov field from IDEBufferedRequest
hw/ide: drop iov field from IDEState
tests/test-bdrv-drain: use QEMU_IOVEC_INIT_BUF
migration/block: use qemu_iovec_init_buf
qemu-img: use qemu_iovec_init_buf
block/vmdk: use qemu_iovec_init_buf
block/qed: use qemu_iovec_init_buf
block/qcow2: use qemu_iovec_init_buf
block/qcow: use qemu_iovec_init_buf
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Currently, BlockDriver.bdrv_refresh_filename() is supposed to both
refresh the filename (BDS.exact_filename) and set BDS.full_open_options.
Now that we have generic code in the central bdrv_refresh_filename() for
creating BDS.full_open_options, we can drop the latter part from all
BlockDriver.bdrv_refresh_filename() implementations.
This also means that we can drop all of the existing default code for
this from the global bdrv_refresh_filename() itself.
Furthermore, we now have to call BlockDriver.bdrv_refresh_filename()
after having set BDS.full_open_options, because the block driver's
implementation should now be allowed to depend on BDS.full_open_options
being set correctly.
Finally, with this patch we can drop the @options parameter from
BlockDriver.bdrv_refresh_filename(); also, add a comment on this
function's purpose in block/block_int.h while touching its interface.
This completely obsoletes blklogwrite's implementation of
.bdrv_refresh_filename().
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190201192935.18394-25-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
Some follow-up patches will rework the way bs->full_open_options is
refreshed in bdrv_refresh_filename(). The new implementation will remove
the need for the block drivers' bdrv_refresh_filename() implementations
to set bs->full_open_options; instead, it will be generic and use static
information from each block driver.
However, by implementing bdrv_gather_child_options(), block drivers will
still be able to override the way the full_open_options of their
children are incorporated into their own.
We need to implement this function for VMDK because we have to prevent
the generic implementation from gathering the options of all children:
It is not possible to specify options for the extents through the
runtime options.
For quorum, the child names that would be used by the generic
implementation and the ones that we actually (currently) want to use
differ. See quorum_gather_child_options() for more information.
Note that both of these are cases which are not ideal: In case of VMDK
it would probably be nice to be able to specify options for all extents.
In case of quorum, the current runtime option structure is simply broken
and needs to be fixed (but that is left for another patch).
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-id: 20190201192935.18394-23-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
This new field can be set by block drivers to list the runtime options
they accept that may influence the contents of the respective BDS. As of
a follow-up patch, this list will be used by the common
bdrv_refresh_filename() implementation to decide which options to put
into BDS.full_open_options (and consequently whether a JSON filename has
to be created), thus freeing the drivers of having to implement that
logic themselves.
Additionally, this patch adds the field to all of the block drivers that
need it and sets it accordingly.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-id: 20190201192935.18394-22-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
This function may be implemented by block drivers to derive a directory
name from a BDS. Concatenating this g_free()-able string with a relative
filename must result in a valid (not necessarily existing) filename, so
this is a function that should generally be not implemented by format
drivers, because this is protocol-specific.
If a BDS's driver does not implement this function, bdrv_dirname() will
fall through to the BDS's file if it exists. If it does not, the
exact_filename field will be used to generate a directory name.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-id: 20190201192935.18394-15-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
Make bdrv_get_full_backing_filename() return an allocated string instead
of placing the result in a caller-provided buffer.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-id: 20190201192935.18394-12-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
Make bdrv_get_full_backing_filename_from_filename() return an allocated
string instead of placing the result in a caller-provided buffer.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190201192935.18394-11-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
Besides being safe for arbitrary path lengths, after some follow-up
patches all callers will want a freshly allocated buffer anyway.
In the meantime, path_combine_deprecated() is added which has the same
interface as path_combine() had before this patch. All callers to that
function will be converted in follow-up patches.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-id: 20190201192935.18394-10-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
If the backing file is overridden, this most probably does change the
guest-visible data of a BDS. Therefore, we will need to consider this
in bdrv_refresh_filename().
To see whether it has been overridden, we might want to compare
bs->backing_file and bs->backing->bs->filename. However,
bs->backing_file is changed by bdrv_set_backing_hd() (which is just used
to change the backing child at runtime, without modifying the image
header), so bs->backing_file most of the time simply contains a copy of
bs->backing->bs->filename anyway, so it is useless for such a
comparison.
This patch adds an auto_backing_file BDS field which contains the
backing file path as indicated by the image header, which is not changed
by bdrv_set_backing_hd().
Because of bdrv_refresh_filename() magic, however, a BDS's filename may
differ from what has been specified during bdrv_open(). Then, the
comparison between bs->auto_backing_file and bs->backing->bs->filename
may fail even though bs->backing was opened from bs->auto_backing_file.
To mitigate this, we can copy the real BDS's filename (after the whole
bdrv_open() and bdrv_refresh_filename() process) into
bs->auto_backing_file, if we know the former has been opened based on
the latter. This is only possible if no options modifying the backing
file's behavior have been specified, though. To simplify things, this
patch only copies the filename from the backing file if no options have
been specified for it at all.
Furthermore, there are cases where an overlay is created by qemu which
already contains a BDS's filename (e.g. in blockdev-snapshot-sync). We
do not need to worry about updating the overlay's bs->auto_backing_file
there, because we actually wrote a post-bdrv_refresh_filename() filename
into the image header.
So all in all, there will be false negatives where (as of a future
patch) bdrv_refresh_filename() will assume that the backing file differs
from what was specified in the image header, even though it really does
not. However, these cases should be limited to where (1) the user
actually did override something in the backing chain (e.g. by specifying
options for the backing file), or (2) the user executed a QMP command to
change some node's backing file (e.g. change-backing-file or
block-commit with @backing-file given) where the given filename does not
happen to coincide with qemu's idea of the backing BDS's filename.
Then again, (1) really is limited to -drive. With -blockdev or
blockdev-add, you have to adhere to the schema, so a user cannot give
partial "unimportant" options (e.g. by just setting backing.node-name
and leaving the rest to the image header). Therefore, trying to fix
this would mean trying to fix something for -drive only.
To improve on (2), we would need a full infrastructure to "canonicalize"
an arbitrary filename (+ options), so it can be compared against
another. That seems a bit over the top, considering that filenames
nowadays are there mostly for the user's entertainment.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-id: 20190201192935.18394-5-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
Before this patch, bdrv_refresh_filename() is used in a pushing manner:
Whenever the BDS graph is modified, the parents of the modified edges
are supposed to be updated (recursively upwards). However, that is
nonviable, considering that we want child changes not to concern
parents.
Also, in the long run we want a pull model anyway: Here, we would have a
bdrv_filename() function which returns a BDS's filename, freshly
constructed.
This patch is an intermediate step. It adds bdrv_refresh_filename()
calls before every place a BDS.filename value is used. The only
exceptions are protocol drivers that use their own filename, which
clearly would not profit from refreshing that filename before.
Also, bdrv_get_encrypted_filename() is removed along the way (as a user
of BDS.filename), since it is completely unused.
In turn, all of the calls to bdrv_refresh_filename() before this patch
are removed, because we no longer have to call this function on graph
changes.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190201192935.18394-2-mreitz@redhat.com
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
bdrv_check_perm in it's recursion checks each node in context of new
permissions for one parent, because of nature of DFS. It works well,
while children subgraph of top-most updated node is a tree, i.e. it
doesn't have any kind of loops. But if we have a loop (not oriented,
of course), i.e. we have two different ways from top-node to some
child-node, then bdrv_check_perm will do wrong thing:
top
| \
| |
v v
A B
| |
v v
node
It will once check new permissions of node in context of new A
permissions and old B permissions and once visa-versa. It's a wrong way
and may lead to corruption of permission system. We may start with
no-permissions and all-shared for both A->node and B->node relations
and finish up with non shared write permission for both ways.
The following commit will add a test, which shows this bug.
To fix this situation, let's really set BdrvChild permissions during
bdrv_check_perm procedure. And we are happy here, as check-perm is
already written in transaction manner, so we just need to restore
backed-up permissions in _abort.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Instead of using the convenience wrapper qio_channel_read_all_eof(), use
the lower level QIOChannel API. This means duplicating some code, but
we'll need this because this coroutine yield is special: We want it to
be interruptible so that nbd_client_attach_aio_context() can correctly
reenter the coroutine.
This moves the bdrv_dec/inc_in_flight() pair into nbd_read_eof(), so
that connection_co will always sit in this exact qio_channel_yield()
call when bdrv_drain() returns.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
The only caller of nbd_read_eof() is nbd_receive_reply(), so it doesn't
have to live in the header file, but can move next to its caller.
Also add the missing coroutine_fn to the function and its caller.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Similar to how qemu_co_sleep_ns() allows preemption from an external
coroutine entry, allow reentering qio_channel_yield() early.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
For some users of BlockBackends, just increasing the in_flight counter
is easier than implementing separate handlers in BlockDevOps. Make the
helper functions for this public.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
After the previous patch, the only instance of this function left
is inside qemu-img.c.
qemu-img is using it inside the 'img_snapshot' function to delete
snapshots in the SNAPSHOT_DELETE case, based on a "snapshot_name"
string that refers to the tag, not ID, of the QEMUSnapshotInfo struct.
This can be verified by checking the SNAPSHOT_CREATE case that
comes shortly before SNAPSHOT_DELETE. In that case, the same
"snapshot_name" variable is being strcpy to the 'name' field
of the QEMUSnapshotInfo struct sn:
pstrcpy(sn.name, sizeof(sn.name), snapshot_name);
Based on that, it is unlikely that "snapshot_name" might contain
an "id" in SNAPSHOT_DELETE.
This patch changes SNAPSHOT_DELETE to use snapshot_find() and
snapshot_delete() instead of bdrv_snapshot_delete_by_id_or_name.
After that, there is no instances left of bdrv_snapshot_delete_by_id_or_name
in the code, so it is safe to remove it entirely.
Suggested-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This patch adds the support of DISCARD and WRITE_ZEROES commands,
that have been introduced in the virtio-blk protocol to have
better performance when using SSD backend.
We support only one segment per request since multiple segments
are not widely used and there are no userspace APIs that allow
applications to submit multiple segments in a single call.
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Message-id: 20190221103314.58500-7-sgarzare@redhat.com
Message-Id: <20190221103314.58500-7-sgarzare@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Starting from DISABLE and WRITE_ZEROES features, we use an array of
VirtIOFeature (as virtio-net) to properly set the config size
depending on the features enabled.
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Message-id: 20190221103314.58500-6-sgarzare@redhat.com
Message-Id: <20190221103314.58500-6-sgarzare@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
In order to use VirtIOFeature also in other virtio devices, we move
its declaration and the endof() macro (renamed in virtio_endof())
in virtio.h.
We add virtio_feature_get_config_size() function to iterate the array
of VirtIOFeature and to return the config size depending on the
features enabled. (as virtio_net_set_config_size() did)
Suggested-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Message-id: 20190221103314.58500-5-sgarzare@redhat.com
Message-Id: <20190221103314.58500-5-sgarzare@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Since configurable features for virtio-blk are growing, this patch
adds host_features field in the struct VirtIOBlock. (as in virtio-net)
In this way, we can avoid to add new fields for new properties and
we can directly set VIRTIO_BLK_F* flags in the host_features.
We update "config-wce" and "scsi" property definition to use the new
host_features field without change the behaviour.
Suggested-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Message-id: 20190221103314.58500-3-sgarzare@redhat.com
Message-Id: <20190221103314.58500-3-sgarzare@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>