openkylin/qemu - qemu - 红山开源项目托管

Commit Graph

Author	SHA1	Message	Date
Markus Armbruster	c3b8e3e0ed	vfio: Clean up error reporting after previous commit The previous commit changed vfio's warning messages from vfio warning: DEV-NAME: Could not frobnicate to warning: vfio DEV-NAME: Could not frobnicate To match this change, change error messages from vfio error: DEV-NAME: On fire to vfio DEV-NAME: On fire Note the loss of "error". If we think marking error messages that way is a good idea, we should mark all error messages, i.e. make error_report() print it. Cc: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com> Acked-by: Alex Williamson <alex.williamson@redhat.com> Message-Id: <20181017082702.5581-7-armbru@redhat.com>	2018-10-19 14:51:34 +02:00
Markus Armbruster	e1eb292ace	vfio: Use warn_report() & friends to report warnings The vfio code reports warnings like error_report(WARN_PREFIX "Could not frobnicate", DEV-NAME); where WARN_PREFIX is defined so the message comes out as vfio warning: DEV-NAME: Could not frobnicate This usage predates the introduction of warn_report() & friends in commit `97f40301f1`. It's time to convert to that interface. Since these functions already prefix the message with "warning: ", replace WARN_PREFIX by VFIO_MSG_PREFIX, so the messages come out like warning: vfio DEV-NAME: Could not frobnicate The next commit will replace ERR_PREFIX. Cc: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com> Acked-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20181017082702.5581-6-armbru@redhat.com>	2018-10-19 14:51:34 +02:00
Eric Auger	a49531ebd0	vfio/platform: Make the vfio-platform device non-abstract Up to now the vfio-platform device has been abstract and could not be instantiated. The integration of a new vfio platform device required creating a dummy derived device which only set the compatible string. Following the few vfio-platform device integrations we have seen the actual requested adaptation happens on device tree node creation (sysbus-fdt). Hence remove the abstract setting, and read the list of compatible values from sysfs if not set by a derived device. Update the amd-xgbe and calxeda-xgmac drivers to fill in the number of compatible values, as there can now be more than one. Note that sysbus-fdt does not support the instantiation of the vfio-platform device yet. Signed-off-by: Eric Auger <eric.auger@redhat.com> [geert: Rebase, set user_creatable=true, use compatible values in sysfs instead of user-supplied manufacturer/model options, reword] Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Eric Auger <eric.auger@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2018-10-15 10:52:09 -06:00
Gerd Hoffmann	b290659fc3	hw/vfio/display: add ramfb support So we have a boot display when using a vgpu as primary display. ramfb depends on a fw_cfg file. fw_cfg files can not be added and removed at runtime, therefore a ramfb-enabled vfio device can't be hotplugged. Add a nohotplug variant of the vfio-pci device (as child class). Add the ramfb property to the nohotplug variant only. So to enable the vgpu display with boot support use this: -device vfio-pci-nohotplug,display=on,ramfb=on,sysfsdev=... Signed-off-by: Gerd Hoffmann <kraxel@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2018-10-15 10:52:09 -06:00
Tony Krowiak	2fe2942cd6	s390x/vfio: ap: Introduce VFIO AP device Introduces a VFIO based AP device. The device is defined via the QEMU command line by specifying: -device vfio-ap,sysfsdev=<path-to-mediated-matrix-device> There may be only one vfio-ap device configured for a guest. The mediated matrix device is created by the VFIO AP device driver by writing a UUID to a sysfs attribute file (see docs/vfio-ap.txt). The mediated matrix device will be named after the UUID. Symbolic links to the $uuid are created in many places, so the path to the mediated matrix device $uuid can be specified in any of the following ways: /sys/devices/vfio_ap/matrix/$uuid /sys/devices/vfio_ap/matrix/mdev_supported_types/vfio_ap-passthrough/devices/$uuid /sys/bus/mdev/devices/$uuid /sys/bus/mdev/drivers/vfio_mdev/$uuid When the vfio-ap device is realized, it acquires and opens the VFIO iommu group to which the mediated matrix device is bound. This causes a VFIO group notification event to be signaled. The vfio_ap device driver's group notification handler will get called at which time the device driver will configure the the AP devices to which the guest will be granted access. Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com> Tested-by: Pierre Morel <pmorel@linux.ibm.com> Acked-by: Halil Pasic <pasic@linux.ibm.com> Tested-by: Pierre Morel <pmorel@linux.ibm.com> Tested-by: Christian Borntraeger <borntraeger@de.ibm.com> Message-Id: <20181010170309.12045-6-akrowiak@linux.ibm.com> Reviewed-by: Thomas Huth <thuth@redhat.com> [CH: added missing g_free and device category] Signed-off-by: Cornelia Huck <cohuck@redhat.com>	2018-10-12 11:32:18 +02:00
Alexey Kardashevskiy	c26bc185b7	vfio/spapr: Allow backing bigger guest IOMMU pages with smaller physical pages At the moment the PPC64/pseries guest only supports 4K/64K/16M IOMMU pages and POWER8 CPU supports the exact same set of page size so so far things worked fine. However POWER9 supports different set of sizes - 4K/64K/2M/1G and the last two - 2M and 1G - are not even allowed in the paravirt interface (RTAS DDW) so we always end up using 64K IOMMU pages, although we could back guest's 16MB IOMMU pages with 2MB pages on the host. This stores the supported host IOMMU page sizes in VFIOContainer and uses this later when creating a new DMA window. This uses the system page size (64k normally, 2M/16M/1G if hugepages used) as the upper limit of the IOMMU pagesize. This changes the type of @pagesize to uint64_t as this is what memory_region_iommu_get_min_page_size() returns and clz64() takes. There should be no behavioral changes on platforms other than pseries. The guest will keep using the IOMMU page size selected by the PHB pagesize property as this only changes the underlying hardware TCE table granularity. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2018-08-21 14:28:45 +10:00
Alex Williamson	238e917285	vfio/ccw/pci: Allow devices to opt-in for ballooning If a vfio assigned device makes use of a physical IOMMU, then memory ballooning is necessarily inhibited due to the page pinning, lack of page level granularity at the IOMMU, and sufficient notifiers to both remove the page on balloon inflation and add it back on deflation. However, not all devices are backed by a physical IOMMU. In the case of mediated devices, if a vendor driver is well synchronized with the guest driver, such that only pages actively used by the guest driver are pinned by the host mdev vendor driver, then there should be no overlap between pages available for the balloon driver and pages actively in use by the device. Under these conditions, ballooning should be safe. vfio-ccw devices are always mediated devices and always operate under the constraints above. Therefore we can consider all vfio-ccw devices as balloon compatible. The situation is far from straightforward with vfio-pci. These devices can be physical devices with physical IOMMU backing or mediated devices where it is unknown whether a physical IOMMU is in use or whether the vendor driver is well synchronized to the working set of the guest driver. The safest approach is therefore to assume all vfio-pci devices are incompatible with ballooning, but allow user opt-in should they have further insight into mediated devices. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2018-08-17 09:27:16 -06:00
Tiwei Bie	f88b44f9eb	vfio: remove DPRINTF() definition from vfio-common.h This macro isn't used by any VFIO code. And its name is too generic. The vfio-common.h (in include/hw/vfio) can be included by other modules in QEMU. It can introduce conflicts. Signed-off-by: Tiwei Bie <tiwei.bie@intel.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2018-06-05 08:23:16 -06:00
Philippe Mathieu-Daudé	d791937fa0	vfio: Include "exec/address-spaces.h" directly in the source file No declaration of "hw/vfio/vfio-common.h" directly requires to include the "exec/address-spaces.h" header. To simplify dependencies and ease the upcoming cleanup of "exec/address-spaces.h", directly include it in the source file where the declaration are used. Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Message-Id: <20180528232719.4721-2-f4bug@amsat.org> Acked-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-05-31 19:12:13 +02:00
Alexey Kardashevskiy	ae0215b2bb	vfio-pci: Allow mmap of MSIX BAR At the moment we unconditionally avoid mapping MSIX data of a BAR and emulate MSIX table in QEMU. However it is 1) not always necessary as a platform may provide a paravirt interface for MSIX configuration; 2) can affect the speed of MMIO access by emulating them in QEMU when frequently accessed registers share same system page with MSIX data, this is particularly a problem for systems with the page size bigger than 4KB. A new capability - VFIO_REGION_INFO_CAP_MSIX_MAPPABLE - has been added to the kernel [1] which tells the userspace that mapping of the MSIX data is possible now. This makes use of it so from now on QEMU tries mapping the entire BAR as a whole and emulate MSIX on top of that. [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=a32295c612c57990d17fb0f41e7134394b2f35f6 Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2018-03-13 11:17:31 -06:00
Gerd Hoffmann	8b818e059b	vfio/display: adding dmabuf support Wire up dmabuf-based display. Signed-off-by: Gerd Hoffmann <kraxel@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2018-03-13 11:17:30 -06:00
Gerd Hoffmann	00195ba710	vfio/display: adding region support Wire up region-based display. Signed-off-by: Gerd Hoffmann <kraxel@redhat.com> Reviewed By: Kirti Wankhede <kwankhede@nvidia.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2018-03-13 11:17:30 -06:00
Alexey Kardashevskiy	3df9d74806	memory/iommu: QOM'fy IOMMU MemoryRegion This defines new QOM object - IOMMUMemoryRegion - with MemoryRegion as a parent. This moves IOMMU-related fields from MR to IOMMU MR. However to avoid dymanic QOM casting in fast path (address_space_translate, etc), this adds an @is_iommu boolean flag to MR and provides new helper to do simple cast to IOMMU MR - memory_region_get_iommu. The flag is set in the instance init callback. This defines memory_region_is_iommu as memory_region_get_iommu()!=NULL. This switches MemoryRegion to IOMMUMemoryRegion in most places except the ones where MemoryRegion may be an alias. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Message-Id: <20170711035620.4232-2-aik@ozlabs.ru> Acked-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2017-07-14 12:04:41 +02:00
Alex Williamson	7da624e26a	vfio: Test realized when using VFIOGroup.device_list iterator VFIOGroup.device_list is effectively our reference tracking mechanism such that we can teardown a group when all of the device references are removed. However, we also use this list from our machine reset handler for processing resets that affect multiple devices. Generally device removals are fully processed (exitfn + finalize) when this reset handler is invoked, however if the removal is triggered via another reset handler (piix4_reset->acpi_pcihp_reset) then the device exitfn may run, but not finalize. In this case we hit asserts when we start trying to access PCI helpers since much of the PCI state of the device is released. To resolve this, add a pointer to the Object DeviceState in our common base-device and skip non-realized devices as we iterate. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2017-07-10 10:39:43 -06:00
Xiao Feng Ren	1dcac3e152	vfio/ccw: vfio based subchannel passthrough driver We use the IOMMU_TYPE1 of VFIO to realize the subchannels passthrough, implement a vfio based subchannels passthrough driver called "vfio-ccw". Support qemu parameters in the style of: "-device vfio-ccw,sysfsdev=$mdev_file_path,devno=xx.x.xxxx' Reviewed-by: Eric Auger <eric.auger@redhat.com> Acked-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Xiao Feng Ren <renxiaof@linux.vnet.ibm.com> Signed-off-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com> Message-Id: <20170517004813.58227-8-bjsdjshi@linux.vnet.ibm.com> Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>	2017-05-19 12:29:01 +02:00
Eric Auger	59f7d6743c	vfio: Pass an error object to vfio_get_device Pass an error object to prepare for migration to VFIO-PCI realize. In vfio platform vfio_base_device_init we currently just report the error. Subsequent patches will propagate the error up to the realize function. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-10-17 10:58:00 -06:00
Eric Auger	1b808d5be0	vfio: Pass an error object to vfio_get_group Pass an error object to prepare for migration to VFIO-PCI realize. For the time being let's just simply report the error in vfio platform's vfio_base_device_init(). A subsequent patch will duly propagate the error up to vfio_platform_realize. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-10-17 10:57:59 -06:00
Eric Auger	426ec9049e	vfio/pci: Use local error object in vfio_initfn To prepare for migration to realize, let's use a local error object in vfio_initfn. Also let's use the same error prefix for all error messages. On top of the 1-1 conversion, we start using a common error prefix for all error messages. We also introduce a similar warning prefix which will be used later on. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-10-17 10:57:56 -06:00
Peter Xu	cdb3081269	memory: introduce IOMMUNotifier and its caps IOMMU Notifier list is used for notifying IO address mapping changes. Currently VFIO is the only user. However it is possible that future consumer like vhost would like to only listen to part of its notifications (e.g., cache invalidations). This patch introduced IOMMUNotifier and IOMMUNotfierFlag bits for a finer grained control of it. IOMMUNotifier contains a bitfield for the notify consumer describing what kind of notification it is interested in. Currently two kinds of notifications are defined: - IOMMU_NOTIFIER_MAP: for newly mapped entries (additions) - IOMMU_NOTIFIER_UNMAP: for entries to be removed (cache invalidates) When registering the IOMMU notifier, we need to specify one or multiple types of messages to listen to. When notifications are triggered, its type will be checked against the notifier's type bits, and only notifiers with registered bits will be notified. (For any IOMMU implementation, an in-place mapping change should be notified with an UNMAP followed by a MAP.) Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <1474606948-14391-2-git-send-email-peterx@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2016-09-27 08:59:16 +02:00
Markus Armbruster	175de52487	Clean up decorations and whitespace around header guards Cleaned up with scripts/clean-header-guards.pl. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Richard Henderson <rth@twiddle.net>	2016-07-12 16:20:46 +02:00
Markus Armbruster	121d07125b	Clean up header guards that don't match their file name Header guard symbols should match their file name to make guard collisions less likely. Offenders found with scripts/clean-header-guards.pl -vn. Cleaned up with scripts/clean-header-guards.pl, followed by some renaming of new guard symbols picked by the script to better ones. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Richard Henderson <rth@twiddle.net>	2016-07-12 16:19:16 +02:00
Alexey Kardashevskiy	2e4109de8e	vfio/spapr: Create DMA window dynamically (SPAPR IOMMU v2) New VFIO_SPAPR_TCE_v2_IOMMU type supports dynamic DMA window management. This adds ability to VFIO common code to dynamically allocate/remove DMA windows in the host kernel when new VFIO container is added/removed. This adds a helper to vfio_listener_region_add which makes VFIO_IOMMU_SPAPR_TCE_CREATE ioctl and adds just created IOMMU into the host IOMMU list; the opposite action is taken in vfio_listener_region_del. When creating a new window, this uses heuristic to decide on the TCE table levels number. This should cause no guest visible change in behavior. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> [dwg: Added some casts to prevent printf() warnings on certain targets where the kernel headers' __u64 doesn't match uint64_t or PRIx64] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2016-07-05 14:31:08 +10:00
Alexey Kardashevskiy	f4ec5e26ed	vfio: Add host side DMA window capabilities There are going to be multiple IOMMUs per a container. This moves the single host IOMMU parameter set to a list of VFIOHostDMAWindow. This should cause no behavioral change and will be used later by the SPAPR TCE IOMMU v2 which will also add a vfio_host_win_del() helper. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2016-07-05 14:31:08 +10:00
Alexey Kardashevskiy	318f67ce13	vfio: spapr: Add DMA memory preregistering (SPAPR IOMMU v2) This makes use of the new "memory registering" feature. The idea is to provide the userspace ability to notify the host kernel about pages which are going to be used for DMA. Having this information, the host kernel can pin them all once per user process, do locked pages accounting (once) and not spent time on doing that in real time with possible failures which cannot be handled nicely in some cases. This adds a prereg memory listener which listens on address_space_memory and notifies a VFIO container about memory which needs to be pinned/unpinned. VFIO MMIO regions (i.e. "skip dump" regions) are skipped. The feature is only enabled for SPAPR IOMMU v2. The host kernel changes are required. Since v2 does not need/support VFIO_IOMMU_ENABLE, this does not call it when v2 is detected and enabled. This enforces guest RAM blocks to be host page size aligned; however this is not new as KVM already requires memory slots to be host page size aligned. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> [dwg: Fix compile error on 32-bit host] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2016-07-05 14:30:54 +10:00
Alexey Kardashevskiy	d78c19b5cf	memory: Fix IOMMU replay base address Since `a788f227` "memory: Allow replay of IOMMU mapping notifications" when new VFIO listener is added, all existing IOMMU mappings are replayed. However there is a problem that the base address of an IOMMU memory region (IOMMU MR) is ignored which is not a problem for the existing user (which is pseries) with its default 32bit DMA window starting at 0 but it is if there is another DMA window. This stores the IOMMU's offset_within_address_space and adjusts the IOVA before calling vfio_dma_map/vfio_dma_unmap. As the IOMMU notifier expects IOVA offset rather than the absolute address, this also adjusts IOVA in sPAPR H_PUT_TCE handler before calling notifier(s). Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-05-26 11:12:08 -06:00
Alex Williamson	e61a424f05	vfio: Create device specific region info helper Given a device specific region type and sub-type, find it. Also cleanup return point on error in vfio_get_region_info() so that we always return 0 with a valid pointer or -errno and NULL. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Gerd Hoffmann <kraxel@redhat.com> Tested-by: Gerd Hoffmann <kraxel@redhat.com>	2016-05-26 11:04:50 -06:00
Markus Armbruster	14b6d44d47	Use scripts/clean-includes to drop redundant qemu/typedefs.h Re-run scripts/clean-includes to apply the previous commit's corrections and updates. Besides redundant qemu/typedefs.h, this only finds a redundant config-host.h include in ui/egl-helpers.c. No idea how that escaped the previous runs. Some manual whitespace trimming around dropped includes squashed in. Signed-off-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2016-03-22 22:20:16 +01:00
David Gibson	3356128cd1	vfio: Eliminate vfio_container_ioctl() vfio_container_ioctl() was a bad interface that bypassed abstraction boundaries, had semantics that sat uneasily with its name, and was unsafe in many realistic circumstances. Now that spapr-pci-vfio-host-bridge has been folded into spapr-pci-host-bridge, there are no more users, so remove it. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Acked-by: Alex Williamson <alex.williamson@redhat.com>	2016-03-16 09:55:11 +11:00
David Gibson	3153119e9b	vfio: Start improving VFIO/EEH interface At present the code handling IBM's Enhanced Error Handling (EEH) interface on VFIO devices operates by bypassing the usual VFIO logic with vfio_container_ioctl(). That's a poorly designed interface with unclear semantics about exactly what can be operated on. In particular it operates on a single vfio container internally (hence the name), but takes an address space and group id, from which it deduces the container in a rather roundabout way. groupids are something that code outside vfio shouldn't even be aware of. This patch creates new interfaces for EEH operations. Internally we have vfio_eeh_container_op() which takes a VFIOContainer object directly. For external use we have vfio_eeh_as_ok() which determines if an AddressSpace is usable for EEH (at present this means it has a single container with exactly one group attached), and vfio_eeh_as_op() which will perform an operation on an AddressSpace in the unambiguous case, and otherwise returns an error. This interface still isn't great, but it's enough of an improvement to allow a number of cleanups in other places. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Acked-by: Alex Williamson <alex.williamson@redhat.com>	2016-03-16 09:55:10 +11:00
Alex Williamson	db0da029a1	vfio: Generalize region support Both platform and PCI vfio drivers create a "slow", I/O memory region with one or more mmap memory regions overlayed when supported by the device. Generalize this to a set of common helpers in the core that pulls the region info from vfio, fills the region data, configures slow mapping, and adds helpers for comleting the mmap, enable/disable, and teardown. This can be immediately used by the PCI MSI-X code, which needs to mmap around the MSI-X vector table. This also changes VFIORegion.mem to be dynamically allocated because otherwise we don't know how the caller has allocated VFIORegion and therefore don't know whether to unreference it to destroy the MemoryRegion or not. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-03-10 20:03:16 -07:00
Alex Williamson	469002263a	vfio: Wrap VFIO_DEVICE_GET_REGION_INFO In preparation for supporting capability chains on regions, wrap ioctl(VFIO_DEVICE_GET_REGION_INFO) so we don't duplicate the code for each caller. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-03-10 09:39:07 -07:00
Alex Williamson	7df9381b7a	vfio: Add sysfsdev property for pci & platform vfio-pci currently requires a host= parameter, which comes in the form of a PCI address in [domain:]<bus:slot.function> notation. We expect to find a matching entry in sysfs for that under /sys/bus/pci/devices/. vfio-platform takes a similar approach, but defines the host= parameter to be a string, which can be matched directly under /sys/bus/platform/devices/. On the PCI side, we have some interest in using vfio to expose vGPU devices. These are not actual discrete PCI devices, so they don't have a compatible host PCI bus address or a device link where QEMU wants to look for it. There's also really no requirement that vfio can only be used to expose physical devices, a new vfio bus and iommu driver could expose a completely emulated device. To fit within the vfio framework, it would need a kernel struct device and associated IOMMU group, but those are easy constraints to manage. To support such devices, which would include vGPUs, that honor the VFIO PCI programming API, but are not necessarily backed by a unique PCI address, add support for specifying any device in sysfs. The vfio API already has support for probing the device type to ensure compatibility with either vfio-pci or vfio-platform. With this, a vfio-pci device could either be specified as: -device vfio-pci,host=02:00.0 or -device vfio-pci,sysfsdev=/sys/devices/pci0000:00/0000:00:1c.0/0000:02:00.0 or even -device vfio-pci,sysfsdev=/sys/bus/pci/devices/0000:02:00.0 When vGPU support comes along, this might look something more like: -device vfio-pci,sysfsdev=/sys/devices/virtual/intel-vgpu/vgpu0@0000:00:02.0 NB - This is only a made up example path The same change is made for vfio-platform, specifying sysfsdev has precedence over the old host option. Tested-by: Eric Auger <eric.auger@linaro.org> Reviewed-by: Eric Auger <eric.auger@linaro.org> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-03-10 09:39:07 -07:00
Eric Auger	62d9551247	hw/vfio/platform: amd-xgbe device This patch introduces the amd-xgbe VFIO platform device. It allows the guest to do passthrough on a device exposing an "amd,xgbe-seattle-v1a" compat string. Signed-off-by: Eric Auger <eric.auger@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-02-19 09:42:29 -07:00
David Gibson	7a140a57c6	vfio: Record host IOMMU's available IO page sizes Depending on the host IOMMU type we determine and record the available page sizes for IOMMU translation. We'll need this for other validation in future patches. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Laurent Vivier <lvivier@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-10-05 12:38:41 -06:00
David Gibson	3898aad323	vfio: Check guest IOVA ranges against host IOMMU capabilities The current vfio core code assumes that the host IOMMU is capable of mapping any IOVA the guest wants to use to where we need. However, real IOMMUs generally only support translating a certain range of IOVAs (the "DMA window") not a full 64-bit address space. The common x86 IOMMUs support a wide enough range that guests are very unlikely to go beyond it in practice, however the IOMMU used on IBM Power machines - in the default configuration - supports only a much more limited IOVA range, usually 0..2GiB. If the guest attempts to set up an IOVA range that the host IOMMU can't map, qemu won't report an error until it actually attempts to map a bad IOVA. If guest RAM is being mapped directly into the IOMMU (i.e. no guest visible IOMMU) then this will show up very quickly. If there is a guest visible IOMMU, however, the problem might not show up until much later when the guest actually attempt to DMA with an IOVA the host can't handle. This patch adds a test so that we will detect earlier if the guest is attempting to use IOVA ranges that the host IOMMU won't be able to deal with. For now, we assume that "Type1" (x86) IOMMUs can support any IOVA, this is incorrect, but no worse than what we have already. We can't do better for now because the Type1 kernel interface doesn't tell us what IOVA range the IOMMU actually supports. For the Power "sPAPR TCE" IOMMU, however, we can retrieve the supported IOVA range and validate guest IOVA ranges against it, and this patch does so. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Laurent Vivier <lvivier@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-10-05 12:38:13 -06:00
David Gibson	ee0bf0e59b	vfio: Remove unneeded union from VFIOContainer Currently the VFIOContainer iommu_data field contains a union with different information for different host iommu types. However: * It only actually contains information for the x86-like "Type1" iommu * Because we have a common listener the Type1 fields are actually used on all IOMMU types, including the SPAPR TCE type as well In fact we now have a general structure for the listener which is unlikely to ever need per-iommu-type information, so this patch removes the union. In a similar way we can unify the setup of the vfio memory listener in vfio_connect_container() that is currently split across a switch on iommu type, but is effectively the same in both cases. The iommu_data.release pointer was only needed as a cleanup function which would handle potentially different data in the union. With the union gone, it too can be removed. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Laurent Vivier <lvivier@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-10-05 12:36:08 -06:00
Eric Auger	a22313deca	hw/vfio/platform: change interrupt/unmask fields into pointer unmask EventNotifier might not be initialized in case of edge sensitive irq. Using EventNotifier pointers make life simpler to handle the edge-sensitive irqfd setup. Signed-off-by: Eric Auger <eric.auger@linaro.org> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-10-05 12:30:12 -06:00
Alex Williamson	5e15d79b86	vfio: Change polarity of our no-mmap option The default should be to allow mmap and new drivers shouldn't need to expose an option or set it to other than the allocation default in their initfn. Take advantage of the experimental flag to change this option to the correct polarity. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-09-23 13:04:44 -06:00
Alex Williamson	46746dbaa8	vfio/pci: Make interrupt bypass runtime configurable Tracing is more effective when we can completely disable all KVM bypass paths. Make these runtime rather than build-time configurable. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-09-23 13:04:44 -06:00
Eric Auger	fb5f816499	hw/vfio/platform: add irqfd support This patch aims at optimizing IRQ handling using irqfd framework. Instead of handling the eventfds on user-side they are handled on kernel side using - the KVM irqfd framework, - the VFIO driver virqfd framework. the virtual IRQ completion is trapped at interrupt controller This removes the need for fast/slow path swap. Overall this brings significant performance improvements. Signed-off-by: Alvise Rigo <a.rigo@virtualopensystems.com> Signed-off-by: Eric Auger <eric.auger@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Tested-by: Vikram Sethi <vikrams@codeaurora.org> Acked-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-07-06 12:15:14 -06:00
Eric Auger	7a8d15d770	hw/vfio/platform: calxeda xgmac device The platform device class has become abstract. This patch introduces a calxeda xgmac device that derives from it. Signed-off-by: Eric Auger <eric.auger@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-06-09 08:17:17 -06:00
Eric Auger	38559979bf	hw/vfio/platform: add irq assignment This patch adds the code requested to assign interrupts to a guest. The interrupts are mediated through user handled eventfds only. Signed-off-by: Eric Auger <eric.auger@linaro.org> Tested-by: Vikram Sethi <vikrams@codeaurora.org> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-06-08 09:25:26 -06:00
Eric Auger	0ea2730bef	hw/vfio/platform: vfio-platform skeleton Minimal VFIO platform implementation supporting register space user mapping but not IRQ assignment. Signed-off-by: Kim Phillips <kim.phillips@linaro.org> Signed-off-by: Eric Auger <eric.auger@linaro.org> Tested-by: Vikram Sethi <vikrams@codeaurora.org> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-06-08 09:25:25 -06:00
Samuel Pitoiset	6ee47c9008	vfio: allow to disable MMAP per device with -x-mmap=off option Disabling MMAP support uses the slower read/write accesses but allows to trace all MMIO accesses, which is not good for performance, but very useful for reverse engineering PCI drivers. This option allows to disable MMAP per device without a compile-time change. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-03-02 11:38:55 -07:00
Alexey Kardashevskiy	51b833f440	vfio: Make type1 listener symbols static They are not used from anywhere but common.c which is where these are defined so make them static. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-03-02 11:38:55 -07:00
Paolo Bonzini	217e9fdcad	vfio: cleanup vfio_get_device error path, remove vfio_populate_device callback Now that vfio_put_base_device is called unconditionally at instance_finalize time, it can be called twice if vfio_populate_device fails. This works but it is slightly harder to follow. Change vfio_get_device to not touch the vbasedev struct until it will definitely succeed, moving the vfio_populate_device call back to vfio-pci. This way, vfio_put_base_device will only be called once. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2015-02-10 10:25:44 -07:00
Eric Auger	e2c7d025ad	hw/vfio: create common module A new common module is created. It implements all functions that have no device specificity (PCI, Platform). This patch only consists in move (no functional changes) Signed-off-by: Kim Phillips <kim.phillips@linaro.org> Signed-off-by: Eric Auger <eric.auger@linaro.org> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2014-12-22 09:54:51 -07:00
Kim Phillips	cf7087db10	vfio: move hw/misc/vfio.c to hw/vfio/pci.c Move vfio.h into include/hw/vfio This is done in preparation for the addition of VFIO platform device support. Signed-off-by: Kim Phillips <kim.phillips@linaro.org> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2014-12-19 15:24:06 -07:00

48 Commits