An alias doesn't always point to a physical device. When this
happens we must first verify that the IOMMU group isn't rooted in
a device above the alias. In this case the alias is effectively
just another quirk for the devices aliased to it. Alternatively,
the virtual alias itself may be the root of the IOMMU group. To
support this, allow a group to be hosted on the alias dev_data
for use by anything that might have the same alias.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Add a WARN_ON to make it clear why we don't add dma_pdev->dev to the
group we're allocating.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
This needs to be broken apart, start with pulling all the IOMMU
group init code into a new function.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
We should return NULL on error instead of the freed pointer.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
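The NULL-on-error rule from the commit above can be sketched in userspace C.
This is an illustration only: struct irq_table, irq_table_alloc() and
irq_table_free() are hypothetical names, not the driver's code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a driver-side table; not the kernel's type. */
struct irq_table {
    unsigned int *entries;
    size_t count;
};

/* Returns a fully initialised table, or NULL on any failure. */
struct irq_table *irq_table_alloc(size_t count)
{
    struct irq_table *table = malloc(sizeof(*table));

    if (!table)
        return NULL;

    /* A zero-sized request is treated as an error in this sketch. */
    table->entries = count ? calloc(count, sizeof(*table->entries)) : NULL;
    if (!table->entries) {
        free(table);
        return NULL;    /* not "return table": table is dangling here */
    }

    table->count = count;
    return table;
}

void irq_table_free(struct irq_table *table)
{
    if (!table)
        return;
    free(table->entries);
    free(table);
}
```

Returning the freed pointer would let a caller's "if (!table)" check pass
and then dereference freed memory.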
Report the availability of irq remapping through the
IOMMU-API to allow KVM device passthrough again without
additional module parameter overrides.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Add the six routines required to set up interrupt remapping
with the AMD IOMMU. Also put it all together into the
AMD-specific irq_remap_ops.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Add the routine to set up interrupt remapping for ioapic
interrupts. Also add a routine to change the affinity of an
irq and to free an irq allocation for interrupt remapping.
The last two functions will also be used for MSI interrupts.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Add routines to:
* Alloc remapping tables and single entries from these
tables
* Change entries in the tables
* Free entries in the table
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Add routine to invalidate the IOMMU cache for interrupt
translations. Also include the IRTE caches when flushing all
IOMMU caches.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
The irq remapping tables for the AMD IOMMU need to be
aligned on a 128 byte boundary. Create a separate slab-cache
to guarantee this alignment.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
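The 128-byte alignment requirement from the commit above can be checked
with a small userspace analogue. In the kernel a dedicated slab cache
takes the alignment as a parameter of kmem_cache_create(); the sketch
below uses C11 aligned_alloc() instead, and the table size of 32 entries
is an assumption made only for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define IRQ_TABLE_ALIGNMENT 128u   /* from the commit text */
#define IRQ_TABLE_ENTRIES   32u    /* hypothetical size for this sketch */

/* Allocate a table whose start address is a multiple of 128 bytes. */
uint32_t *irq_table_alloc_aligned(void)
{
    /* 32 * 4 = 128 bytes: aligned_alloc wants a multiple of the alignment. */
    size_t size = IRQ_TABLE_ENTRIES * sizeof(uint32_t);

    return aligned_alloc(IRQ_TABLE_ALIGNMENT, size);
}

/* Non-zero if the pointer satisfies the alignment requirement. */
int irq_table_is_aligned(const void *table)
{
    return ((uintptr_t)table % IRQ_TABLE_ALIGNMENT) == 0;
}
```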
The IVRS ACPI table provides information about the IOAPICs
and the HPETs available in the system and which PCI device
ID they use in transactions. Save that information for later
usage in interrupt remapping.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
The new IOMMU groups code in the AMD IOMMU driver makes the
assumption that there is a pci_dev struct available for all
device-ids listed in the IVRS ACPI table. Unfortunately this
assumption is not true and so this code causes a NULL
pointer dereference at boot on some systems.
Fix it by making sure the given pointer is never NULL when
passed to the group specific code. The real fix is larger
and will be queued for v3.7.
Reported-by: Florian Dazinger <florian@dazinger.net>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Fix some typos in comments and user-visible messages. No
functional changes.
Signed-off-by: Frank Arnold <frank.arnold@amd.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
SR-IOV can create buses without a bridge. There may be other cases
where this happens as well. In these cases skip to the parent bus
and continue testing devices there.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
This did not work because devices are not put into the
pt_domain. Fix this.
Cc: stable@vger.kernel.org
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
A few sparse warnings fire in drivers/iommu/amd_iommu_init.c.
Fix most of them with this patch. Also fix the sparse
warnings in drivers/iommu/irq_remapping.c while at it.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
write_file_bool() modifies 32 bits of data, so "amd_iommu_unmap_flush"
needs to be 32 bits as well or we'll corrupt memory. Fortunately it
looks like the data is aligned with a gap after the declaration so this
is harmless in production.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
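The width mismatch described in the commit above can be illustrated with
a userspace sketch. store_flag32() is a hypothetical stand-in for a helper
that always stores 32 bits; it is not the debugfs code itself. The point is
that the destination must be 32 bits wide so the store cannot spill into
whatever follows it.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper that unconditionally writes 32 bits to dst. */
void store_flag32(void *dst, uint32_t value)
{
    memcpy(dst, &value, sizeof(value));
}

struct flags {
    uint32_t unmap_flush;   /* wide enough for the 32-bit store */
    uint8_t  neighbour;     /* must survive writes to unmap_flush */
};

/* Returns non-zero if the store left the adjacent member untouched. */
int flag_write_preserves_neighbour(void)
{
    struct flags f = { 0, 0xAA };

    store_flag32(&f.unmap_flush, 1);
    return f.unmap_flush == 1 && f.neighbour == 0xAA;
}
```

Had unmap_flush been a one-byte type, the same store would have written
three bytes past the end of the member.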
Work around broken devices and adhere to ACS support when determining
IOMMU grouping.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Add IOMMU group support to AMD-Vi device init and uninit code.
Existing notifiers make sure this gets called for each device.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
IOMMU device groups are currently a rather vague associative notion
with assembly required by the user or user level driver provider to
do anything useful. This patch intends to grow the IOMMU group concept
into something a bit more consumable.
To do this, we first create an object representing the group, struct
iommu_group. This structure is allocated (iommu_group_alloc) and
filled (iommu_group_add_device) by the iommu driver. The iommu driver
is free to add devices to the group using its own set of policies.
This allows inclusion of devices based on physical hardware or topology
limitations of the platform, as well as soft requirements, such as
multi-function trust levels or peer-to-peer protection of the
interconnects. Each device may only belong to a single iommu group,
which is linked from struct device.iommu_group. IOMMU groups are
maintained using kobject reference counting, allowing for automatic
removal of empty, unreferenced groups. It is the responsibility of
the iommu driver to remove devices from the group
(iommu_group_remove_device).
IOMMU groups also include a userspace representation in sysfs under
/sys/kernel/iommu_groups. When allocated, each group is given a
dynamically assigned ID (int). The ID is managed by the core IOMMU group
code to support multiple heterogeneous iommu drivers, which could
potentially collide in group naming/numbering. This also keeps group
IDs to small, easily managed values. A directory is created under
/sys/kernel/iommu_groups for each group. A further subdirectory named
"devices" contains links to each device within the group. The iommu_group
file in the device's sysfs directory, which formerly contained a group
number when read, is now a link to the iommu group. Example:
$ ls -l /sys/kernel/iommu_groups/26/devices/
total 0
lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:00:1e.0 ->
../../../../devices/pci0000:00/0000:00:1e.0
lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:06:0d.0 ->
../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.0
lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:06:0d.1 ->
../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.1
$ ls -l /sys/kernel/iommu_groups/26/devices/*/iommu_group
[truncating perms/owner/timestamp]
/sys/kernel/iommu_groups/26/devices/0000:00:1e.0/iommu_group ->
../../../kernel/iommu_groups/26
/sys/kernel/iommu_groups/26/devices/0000:06:0d.0/iommu_group ->
../../../../kernel/iommu_groups/26
/sys/kernel/iommu_groups/26/devices/0000:06:0d.1/iommu_group ->
../../../../kernel/iommu_groups/26
Groups also include several exported functions for use by user level
driver providers, for example VFIO. These include:
iommu_group_get(): Acquires a reference to a group from a device
iommu_group_put(): Releases reference
iommu_group_for_each_dev(): Iterates over group devices using callback
iommu_group_[un]register_notifier(): Allows notification of device add
and remove operations relevant to the group
iommu_group_id(): Return the group number
This patch also extends the IOMMU API to allow attaching groups to
domains. This is currently a simple wrapper for iterating through
devices within a group, but it's expected that the IOMMU API may
eventually make groups a more integral part of domains.
Groups intentionally do not try to manage group ownership. A user
level driver provider must independently acquire ownership for each
device within a group before making use of the group as a whole.
This may change in the future if group usage becomes more pervasive
across both DMA and IOMMU ops.
Groups intentionally do not provide a mechanism for driver locking
or otherwise manipulating driver matching/probing of devices within
the group. Such interfaces are generic to devices and beyond the
scope of IOMMU groups. If implemented, user level providers have
ready access via iommu_group_for_each_dev and group notifiers.
iommu_device_group() is removed here as it has no users. The
replacement is:
group = iommu_group_get(dev);
id = iommu_group_id(group);
iommu_group_put(group);
AMD-Vi & Intel VT-d support re-added in following patches.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
When a device is added to the system at runtime the AMD
IOMMU driver initializes the necessary data structures to
handle translation for it. But it forgets to change the
per-device dma_ops to point to the AMD IOMMU driver. So
mapping actually never happens and all DMA accesses end in
an IO_PAGE_FAULT. Fix this.
Reported-by: Stefan Assmann <sassmann@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
In the error path of the ppr_notifier it can happen that the
iommu->lock is taken recursively. This patch fixes the
problem by releasing the iommu->lock before any notifier is
invoked. This also requires moving the erratum workaround
for the ppr-log (interrupt may be faster than data in the log)
one function up.
Cc: stable@vger.kernel.org # v3.3, v3.4
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Due to a recent erratum it can happen that the head pointer
of the event-log is updated before the actual event-log
entry is written. This patch implements the recommended
workaround.
Cc: stable@vger.kernel.org # all stable kernels
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Unfortunately the PRI spec changed and moved the
TLP-prefix-required bit to a different location. This patch
makes the necessary change in the AMD IOMMU driver.
Regressions are not expected because all hardware
implementing the PRI capability sets this bit to zero
anyway.
Cc: stable@vger.kernel.org # v3.3
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Pull DMA mapping branch from Marek Szyprowski:
"Short summary for the whole series:
A few limitations have been identified in the current dma-mapping
design and its implementations for various architectures. There exists
more than one function for allocating and freeing the buffers:
currently these 3 are used: dma_{alloc,free}_coherent,
dma_{alloc,free}_writecombine, dma_{alloc,free}_noncoherent.
For most of the systems these calls are almost equivalent and can be
interchanged. For others, especially the truly non-coherent ones
(like ARM), the difference can be easily noticed in overall driver
performance. Sadly not all architectures provide implementations for
all of them, so the drivers might need to be adapted and cannot be
easily shared between different architectures. The provided patches
unify all these functions and hide the differences under the already
existing dma attributes concept. The thread with more references is
available here:
http://www.spinics.net/lists/linux-sh/msg09777.html
These patches are also a prerequisite for unifying DMA-mapping
implementation on ARM architecture with the common one provided by
dma_map_ops structure and extending it with IOMMU support. More
information is available in the following thread:
http://thread.gmane.org/gmane.linux.kernel.cross-arch/12819
More work on the dma-mapping framework is planned, especially in the
area of buffer sharing and managing the shared mappings (together with
the recently introduced dma_buf interface: commit d15bd7ee44
"dma-buf: Introduce dma buffer sharing mechanism").
The patches in the current set introduce new alloc/free methods
(with support for memory attributes) in dma_map_ops structure, which
will later replace dma_alloc_coherent and dma_alloc_writecombine
functions."
People finally started piping up with support for merging this, so I'm
merging it as the last of the pending stuff from the merge window.
Looks like pohmelfs is going to wait for 3.5 and more external support
for merging.
* 'for-linus' of git://git.linaro.org/people/mszyprowski/linux-dma-mapping:
common: DMA-mapping: add NON-CONSISTENT attribute
common: DMA-mapping: add WRITE_COMBINE attribute
common: dma-mapping: introduce mmap method
common: dma-mapping: remove old alloc_coherent and free_coherent methods
Hexagon: adapt for dma_map_ops changes
Unicore32: adapt for dma_map_ops changes
Microblaze: adapt for dma_map_ops changes
SH: adapt for dma_map_ops changes
Alpha: adapt for dma_map_ops changes
SPARC: adapt for dma_map_ops changes
PowerPC: adapt for dma_map_ops changes
MIPS: adapt for dma_map_ops changes
X86 & IA64: adapt for dma_map_ops changes
common: dma-mapping: introduce generic alloc() and free() methods
Adapt core x86 and IA64 architecture code for dma_map_ops changes: replace
alloc/free_coherent with generic alloc/free methods.
Signed-off-by: Andrzej Pietrasiewicz <andrzej.p@samsung.com>
Acked-by: Kyungmin Park <kyungmin.park@samsung.com>
[removed swiotlb related changes and replaced it with wrappers,
merged with IA64 patch to avoid inter-patch dependences in intel-iommu code]
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Tony Luck <tony.luck@intel.com>
Fix the following section warnings:
WARNING: vmlinux.o(.text+0x49dbc): Section mismatch in reference
from the function acpi_map_cpu2node() to the variable
.cpuinit.data:__apicid_to_node The function acpi_map_cpu2node()
references the variable __cpuinitdata __apicid_to_node. This is
often because acpi_map_cpu2node lacks a __cpuinitdata
annotation or the annotation of __apicid_to_node is wrong.
WARNING: vmlinux.o(.text+0x49dc1): Section mismatch in reference
from the function acpi_map_cpu2node() to the function
.cpuinit.text:numa_set_node() The function acpi_map_cpu2node()
references the function __cpuinit numa_set_node(). This is often
because acpi_map_cpu2node lacks a __cpuinit annotation or the
annotation of numa_set_node is wrong.
WARNING: vmlinux.o(.text+0x526e77): Section mismatch in
reference from the function prealloc_protection_domains() to the
function .init.text:alloc_passthrough_domain() The function
prealloc_protection_domains() references the function __init
alloc_passthrough_domain(). This is often because
prealloc_protection_domains lacks a __init annotation or the
annotation of alloc_passthrough_domain is wrong.
Signed-off-by: Steffen Persvold <sp@numascale.com>
Link: http://lkml.kernel.org/r/1331810188-24785-1-git-send-email-sp@numascale.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
On some systems the IVRS table does not contain all PCI
devices present in the system. In case a device not present
in the IVRS table is translated by the IOMMU no DMA is
possible from that device by default.
This patch fixes this by removing the DTE entry for every
PCI device present in the system and not covered by IVRS.
Cc: stable@vger.kernel.org # >= 3.0
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
The IOMMUv2 driver added a few statistics counters which are
interesting in the iommu=pt mode too. So initialize the
statistics counters for that mode too.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
This function can be used to find out which features
necessary for IOMMUv2 usage are available on a given device.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
The symbolic register names for PCI and PASID changed in
PCI code. This patch adapts the AMD IOMMU driver to these
changes.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
The AMD IOMMUv2 driver needs to get the IOMMUv2 domain
associated with a particular device. This patch adds a
function to get this information.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
To send completions for PPR requests this patch adds a
function which can be used by the IOMMUv2 driver.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
This patch adds functions necessary to set and clear the
GCR3 values associated with a particular PASID in an IOMMUv2
domain.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
The functions added with this patch allow managing the
IOMMU and the device TLBs for all devices in an IOMMUv2
domain.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
This function can be used to switch a domain into
paging-mode 0. In this mode all devices can access physical
system memory directly without any remapping.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
If the device starts to use IOMMUv2 features the dma handles
need to stay valid. The only sane way to do this is to use an
identity mapping for the device and not translate it by the
iommu. This is implemented with this patch. Since this lifts
the device-isolation there is also a new kernel parameter
which allows disabling that feature.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Convert the contents of 'struct dev_table_entry' to u64 to
allow updating the DTE with 64-bit writes as required by the
spec.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
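A minimal sketch of what the u64 conversion above buys, assuming the
256-bit device table entry is held as four 64-bit words. The flag names
and bit positions below are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: one 256-bit entry stored as four 64-bit words. */
struct dev_table_entry {
    uint64_t data[4];
};

/* Hypothetical bit positions, for illustration only. */
#define DTE_FLAG_VALID        (1ULL << 0)
#define DTE_FLAG_TRANSLATION  (1ULL << 1)

/* Update the first word with a single 64-bit read-modify-write
 * instead of touching two separate 32-bit halves. */
void dte_set_flags(struct dev_table_entry *dte, uint64_t flags)
{
    dte->data[0] |= flags;
}

/* Reset every word of the entry to zero. */
void dte_clear(struct dev_table_entry *dte)
{
    for (int i = 0; i < 4; i++)
        dte->data[i] = 0;
}
```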
The option iommu=group_mf indicates that the iommu driver should
expose all functions of a multi-function PCI device as the same
iommu_device_group. This is useful for disallowing individual functions
being exposed as independent devices to userspace as there are often
hidden dependencies. Virtual functions are not affected by this option.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Just use the amd_iommu_alias_table directly.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Let the IOMMU core know we support arbitrary page sizes (as long as
they're an order of 4KiB).
This way the IOMMU core will retain the existing behavior we're used to;
it will let us map regions that:
- their size is an order of 4KiB
- they are naturally aligned
Signed-off-by: Ohad Ben-Cohen <ohad@wizery.com>
Cc: Joerg Roedel <Joerg.Roedel@amd.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
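The two mapping conditions listed in the commit above can be sketched as a
predicate. This reads "an order of 4KiB" as 4KiB shifted left by some
order, i.e. a power of two of at least 4KiB; the bitmap value and the
helper names are assumptions for this sketch, not taken from the driver.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed bitmap: bit n set means a page size of 2^n bytes is supported.
 * Clearing the low 12 bits allows every power of two >= 4KiB. */
#define SKETCH_PGSIZE_BITMAP (~0xFFFULL)

int is_power_of_two(uint64_t x)
{
    return x != 0 && (x & (x - 1)) == 0;
}

/* Non-zero if a region of this size at this address can be mapped:
 * the size is a supported power of two and the address is naturally
 * aligned to that size. */
int can_map_region(uint64_t addr, uint64_t size)
{
    if (!is_power_of_two(size))
        return 0;
    if ((size & SKETCH_PGSIZE_BITMAP) == 0)
        return 0;
    return (addr & (size - 1)) == 0;
}
```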
Express sizes in bytes rather than in page order, to eliminate the
size->order->size conversions we have whenever the IOMMU API is calling
the low level drivers' map/unmap methods.
Adapt all existing drivers.
Signed-off-by: Ohad Ben-Cohen <ohad@wizery.com>
Cc: David Brown <davidb@codeaurora.org>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Joerg Roedel <Joerg.Roedel@amd.com>
Cc: Stepan Moskovchenko <stepanm@codeaurora.org>
Cc: KyongHo Cho <pullip.cho@samsung.com>
Cc: Hiroshi DOYU <hdoyu@nvidia.com>
Cc: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
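The size->order->size round trip removed by the commit above can be
sketched as follows, assuming 4KiB pages. The helper names are
hypothetical; the point is only that the conversion is redundant for
power-of-two sizes and rounds up for everything else.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SHIFT 12   /* assumes 4KiB pages */

/* Bytes covered by a region of the given page order. */
size_t order_to_size(unsigned int order)
{
    return (size_t)1 << (SKETCH_PAGE_SHIFT + order);
}

/* Smallest page order whose region covers at least size bytes. */
unsigned int size_to_order(size_t size)
{
    unsigned int order = 0;

    while (order_to_size(order) < size)
        order++;
    return order;
}
```

With sizes passed in bytes, a caller holding 12KiB no longer has it turned
into order 2 and back into 16KiB on the way to the low-level driver.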
The shift direction was wrong because the function takes a
page number and i is the address in the loop.
Cc: stable@kernel.org
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
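The address-versus-page-number confusion fixed above can be sketched with
a loop that walks addresses and hands page numbers to a consumer. The
names and the 4KiB page size are assumptions for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE  (1ULL << SKETCH_PAGE_SHIFT)

/* Walk [start, end) one page at a time and record each page number.
 * The loop variable i is an address, so it is shifted right to obtain
 * the page number; shifting left would produce a bogus value. */
size_t collect_page_numbers(uint64_t start, uint64_t end,
                            uint64_t *out, size_t max)
{
    size_t n = 0;

    for (uint64_t i = start; i < end && n < max; i += SKETCH_PAGE_SIZE)
        out[n++] = i >> SKETCH_PAGE_SHIFT;

    return n;
}
```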
The domain_flush_devices() function takes the domain->lock.
But this function is only called from update_domain() which
itself is already called under the domain->lock. This causes
a deadlock situation when the dma-address-space of a domain
grows larger than 1GB.
Cc: stable@kernel.org
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
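The locking rule behind the fix above can be sketched with a mock,
non-recursive lock. Everything here is hypothetical userspace code: the
struct and function names only mirror the roles described in the commit,
and the asserts stand in for the deadlock a real spinlock would produce.

```c
#include <assert.h>

/* Mock domain with a non-recursive lock modelled as a depth counter. */
struct mock_domain {
    int lock_depth;
    int flushes;
};

void mock_domain_lock(struct mock_domain *d)
{
    assert(d->lock_depth == 0);   /* taking it twice would deadlock */
    d->lock_depth++;
}

void mock_domain_unlock(struct mock_domain *d)
{
    assert(d->lock_depth == 1);
    d->lock_depth--;
}

/* Caller must already hold the lock; this function takes none itself. */
void mock_flush_devices_locked(struct mock_domain *d)
{
    assert(d->lock_depth == 1);
    d->flushes++;
}

/* Only ever called with the lock held, so it must not lock again. */
void mock_update_domain_locked(struct mock_domain *d)
{
    mock_flush_devices_locked(d);
}

/* Outermost entry point: the only place the lock is taken. */
void mock_map_and_update(struct mock_domain *d)
{
    mock_domain_lock(d);
    mock_update_domain_locked(d);
    mock_domain_unlock(d);
}
```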
The value is only set to true but never set back to false,
which causes too many completion-wait commands to be sent to
hardware. Fix it with this patch.
Cc: stable@kernel.org
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
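The effect of a flag that is set but never cleared can be sketched as
below, assuming the value in question marks "a completion-wait is still
needed". The struct and function names are hypothetical.

```c
#include <assert.h>

/* Mock IOMMU state: a pending-sync flag and a counter of the
 * completion-wait commands actually sent. */
struct mock_iommu {
    int need_sync;
    unsigned int completion_waits;
};

/* Queuing any command means a later completion-wait is required. */
void mock_queue_command(struct mock_iommu *iommu)
{
    iommu->need_sync = 1;
}

/* Send a completion-wait only if something was queued since the last one. */
void mock_completion_wait(struct mock_iommu *iommu)
{
    if (!iommu->need_sync)
        return;

    iommu->completion_waits++;
    iommu->need_sync = 0;   /* without this reset every call sends a wait */
}
```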
Reserve the MSI address range in the address allocator so
that MSI addresses are not handed out as dma handles.
Cc: stable@kernel.org
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
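Reserving the MSI range in an allocator can be sketched with a trivial
bump allocator that skips the x86 MSI window (0xfee00000-0xfeefffff).
The allocator itself is hypothetical and far simpler than the driver's
address allocator; it only shows that no handle inside the window is
ever returned.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096ULL
#define MSI_RANGE_START  0xfee00000ULL
#define MSI_RANGE_END    0xfeefffffULL

int in_msi_range(uint64_t addr)
{
    return addr >= MSI_RANGE_START && addr <= MSI_RANGE_END;
}

/* Hand out the next page-sized DMA address, jumping over the reserved
 * MSI window. *next is the allocator cursor and is advanced in place. */
uint64_t alloc_dma_page(uint64_t *next)
{
    uint64_t addr;

    if (in_msi_range(*next))
        *next = MSI_RANGE_END + 1;

    addr = *next;
    *next += SKETCH_PAGE_SIZE;
    return addr;
}
```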
A few parts of the driver were missing in drivers/iommu.
Move them there to have the complete driver in that
directory.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
This should ease finding similarities with different platforms,
with the intention of solving problems once in a generic framework
which everyone can use.
Compile-tested on x86_64.
Signed-off-by: Ohad Ben-Cohen <ohad@wizery.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>