linux/drivers
Alexey Kardashevskiy 7f92891778 vfio_pci: Add NVIDIA GV100GL [Tesla V100 SXM2] subdriver
POWER9 Witherspoon machines come with 4 or 6 V100 GPUs which are not
pluggable PCIe devices but still have PCIe links which are used
for config space and MMIO. In addition to that the GPUs have 6 NVLinks
which are connected to other GPUs and the POWER9 CPU. POWER9 chips
have a special unit on a die called an NPU which is an NVLink2 host bus
adapter with p2p connections to 2 to 3 GPUs, 3 or 2 NVLinks to each.
These systems also support ATS (address translation services) which is
a part of the NVLink2 protocol. Such GPUs also share on-board RAM
(16GB or 32GB) to the system via the same NVLink2 so a CPU has
cache-coherent access to a GPU RAM.

This exports GPU RAM to the userspace as a new VFIO device region. This
preregisters the new memory as device memory as it might be used for DMA.
This inserts pfns from the fault handler as the GPU memory is not onlined
until the vendor driver is loaded and trained the NVLinks so doing this
earlier causes low level errors which we fence in the firmware so
it does not hurt the host system but still better be avoided; for the same
reason this does not map GPU RAM into the host kernel (usual thing for
emulated access otherwise).

This exports an ATSD (Address Translation Shootdown) register of NPU which
allows TLB invalidations inside GPU for an operating system. The register
conveniently occupies a single 64k page. It is also presented to
the userspace as a new VFIO device region. One NPU has 8 ATSD registers,
each of them can be used for TLB invalidation in a GPU linked to this NPU.
This allocates one ATSD register per an NVLink bridge allowing passing
up to 6 registers. Due to the host firmware bug (just recently fixed),
only 1 ATSD register per NPU was actually advertised to the host system
so this passes that alone register via the first NVLink bridge device in
the group which is still enough as QEMU collects them all back and
presents to the guest via vPHB to mimic the emulated NPU PHB on the host.

In order to provide the userspace with the information about GPU-to-NVLink
connections, this exports an additional capability called "tgt"
(which is an abbreviated host system bus address). The "tgt" property
tells the GPU its own system address and allows the guest driver to
conglomerate the routing information so each GPU knows how to get directly
to the other GPUs.

For ATS to work, the nest MMU (an NVIDIA block in a P9 CPU) needs to
know LPID (a logical partition ID or a KVM guest hardware ID in other
words) and PID (a memory context ID of a userspace process, not to be
confused with a linux pid). This assigns a GPU to LPID in the NPU and
this is why this adds a listener for KVM on an IOMMU group. A PID comes
via NVLink from a GPU and NPU uses a PID wildcard to pass it through.

This requires coherent memory and ATSD to be available on the host as
the GPU vendor only supports configurations with both features enabled
and other configurations are known not to work. Because of this and
because of the ways the features are advertised to the host system
(which is a device tree with very platform specific properties),
this requires enabled POWERNV platform.

The V100 GPUs do not advertise any of these capabilities via the config
space and there are more than just one device ID so this relies on
the platform to tell whether these GPUs have special abilities such as
NVLinks.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-21 16:20:47 +11:00
..
accessibility
acpi acpi/nfit, x86/mce: Validate a MCE's address before using it 2018-11-06 19:13:26 +01:00
amba
android
ata sata_rcar: convert to SPDX identifiers 2018-11-08 06:32:16 -07:00
atm atm: zatm: Fix empty body Clang warnings 2018-10-18 15:39:10 -07:00
auxdisplay The Compiler Attributes series 2018-11-01 18:34:46 -07:00
base mm/memory_hotplug: fix online/offline_pages called w.o. mem_hotplug_lock 2018-10-31 08:54:17 -07:00
bcma
block xen: fixes for 4.20-rc2 2018-11-10 08:58:48 -06:00
bluetooth Merge branch 'work.tty-ioctl' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2018-10-24 14:43:41 +01:00
bus ARM: SoC driver updates for 4.17 2018-10-29 15:16:01 -07:00
cdrom gdrom: fix mistake in assignment of error 2018-10-25 11:17:40 -06:00
char RTC for 4.20 2018-10-27 09:24:24 -07:00
clk clk: qcom: gcc: Fix board clock node name 2018-11-09 14:13:55 -08:00
clocksource Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2018-11-11 16:41:50 -06:00
connector
cpufreq drivers/cpufreq: change CONFIG_6xx to CONFIG_PPC_BOOK3S_32 2018-11-26 22:33:37 +11:00
cpuidle powerpc/pseries/cpuidle: Fix preempt warning 2018-12-04 19:45:01 +11:00
crypto crypto4xx_core: don't abuse __dma_sync_page 2018-12-20 22:21:20 +11:00
dax
dca
devfreq
dio
dma pci-v4.20-changes 2018-10-25 06:50:48 -07:00
dma-buf
edac * skx_edac: Address translation for NVDIMMs (Tony Luck and Qiuxu Zhuo) 2018-11-02 11:17:22 -07:00
eisa
extcon
firewire
firmware Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2018-11-03 18:25:17 -07:00
fmc
fpga fpga: add devm_fpga_region_create 2018-10-16 11:13:50 +02:00
fsi iov_iter: Separate type from direction and use accessor functions 2018-10-24 00:41:07 +01:00
gnss
gpio pci-v4.20-changes 2018-10-25 06:50:48 -07:00
gpu drm: i915, amdgpu, sun4i, exynos and etnaviv fixes 2018-11-10 13:29:47 -06:00
hid HID: asus: fix build warning wiht CONFIG_ASUS_WMI disabled 2018-11-06 13:57:42 +01:00
hsi
hv hv_balloon: Replace spin_is_locked() with lockdep 2018-10-15 20:54:17 +02:00
hwmon hwmon: (ibmpowernv) Remove bogus __init annotations 2018-11-04 15:55:12 -08:00
hwspinlock
hwtracing stm class: Use memcat_p() 2018-10-11 12:12:55 +02:00
i2c i2c: nvidia-gpu: make pm_ops static 2018-11-09 17:56:44 +01:00
ide
idle Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2018-10-23 13:32:18 +01:00
iio Staging/IIO patches for 4.20-rc1 2018-10-29 10:38:10 -07:00
infiniband Revert "mm, mmu_notifier: annotate mmu notifiers with blockable invalidate callbacks" 2018-10-26 16:25:19 -07:00
input Merge branch 'xarray' of git://git.infradead.org/users/willy/linux-dax 2018-10-28 11:35:40 -07:00
iommu mm: remove include/linux/bootmem.h 2018-10-31 08:54:16 -07:00
ipack
irqchip irqchip/irq-mvebu-sei: Fix a NULL vs IS_ERR() bug in probe function 2018-11-01 12:38:48 +01:00
isdn Merge branch 'work.afs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2018-11-01 19:58:52 -07:00
leds LED fixes for 4.20-rc2 2018-11-08 17:49:04 -06:00
lightnvm
macintosh macintosh: Use device_type helpers to access the node type 2018-11-26 22:33:37 +11:00
mailbox - Convert print users to use the %pOFn format specifier 2018-10-29 10:30:44 -07:00
mcb
md for-linus-20181102 2018-11-02 11:25:48 -07:00
media media updates for v4.20-rc1 2018-10-31 10:53:29 -07:00
memory
memstick
message
mfd chrome-platform for v4.20 2018-10-31 16:47:55 -07:00
misc ocxl: Fix endiannes bug in read_afu_name() 2018-12-21 14:46:50 +11:00
mmc Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2018-10-23 13:32:18 +01:00
mtd mtd: sa1100: avoid VLA in sa1100_setup_mtd 2018-11-06 10:38:17 +01:00
mux This is the bulk of GPIO changes for the v4.20 series: 2018-10-23 08:45:05 +01:00
net net: dsa: mv88e6xxx: Fix clearing of stats counters 2018-11-11 10:19:10 -08:00
nfc NFC: nfcmrvl_uart: fix OF child-node lookup 2018-10-23 13:28:53 -05:00
ntb ntb: idt: Alter the driver info comments 2018-11-01 10:33:12 -04:00
nubus
nvdimm libnvdimm for 4.20 2018-10-25 06:31:56 -07:00
nvme for-linus-20181109 2018-11-09 16:31:51 -06:00
nvmem nvmem: hide unused nvmem_find_cell_by_index function 2018-10-15 15:56:15 +02:00
of Devicetree fixes for 4.20-rc: 2018-11-09 16:41:58 -06:00
opp
oprofile
parisc parisc: Add alternative coding infrastructure 2018-10-17 17:22:26 +02:00
parport
pci Merge branch 'xarray' of git://git.infradead.org/users/willy/linux-dax 2018-10-28 11:35:40 -07:00
pcmcia powerpc updates for 4.20 2018-10-26 14:36:21 -07:00
perf arm64 updates for 4.20: 2018-10-22 17:30:06 +01:00
phy USB/PHY patches for 4.20-rc1 2018-10-26 08:14:13 -07:00
pinctrl This is the bulk of GPIO changes for the v4.20 series: 2018-10-23 08:45:05 +01:00
platform platform-drivers-x86 for v4.20-1 2018-11-01 08:42:21 -07:00
pnp
power Devicetree updates for 4.20: 2018-10-26 12:09:58 -07:00
powercap Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2018-10-23 13:32:18 +01:00
pps
ps3
ptp ptp: drop redundant kasprintf() to create worker name 2018-10-28 19:20:06 -07:00
pwm pwm: lpss: Only set update bit if we are actually changing the settings 2018-10-16 13:16:15 +02:00
rapidio
ras
regulator regulator: Regulator updates for next release 2018-10-23 01:54:44 +01:00
remoteproc remoteproc: qcom: q6v5-mss: Register segments/dumpfn for coredump 2018-10-19 12:54:03 -07:00
reset ARM: SoC driver updates for 4.17 2018-10-29 15:16:01 -07:00
rpmsg
rtc rtc: sc27xx: Always read normal alarm when registering RTC device 2018-10-25 02:35:42 +02:00
s390 s390/qeth: report 25Gbit link speed 2018-11-03 10:44:06 -07:00
sbus
scsi Kbuild updates for v4.20 (2nd) 2018-11-03 10:47:33 -07:00
sfi mm: remove include/linux/bootmem.h 2018-10-31 08:54:16 -07:00
sh
siox
slimbus
sn
soc soc: ti: QMSS: Fix usage of irq_set_affinity_hint 2018-11-02 11:22:09 -07:00
soundwire
spi - New Drivers 2018-10-25 06:19:15 -07:00
spmi
ssb
staging media updates for v4.20-rc1 2018-10-31 10:53:29 -07:00
target SCSI misc on 20181102 2018-11-03 10:34:03 -07:00
tc TC: Set DMA masks for devices 2018-10-11 09:16:44 -07:00
tee
thermal Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux 2018-10-31 11:28:12 -07:00
thunderbolt
tty TTY/Serial fixes for 4.20-rc2 2018-11-10 13:32:14 -06:00
uio
usb usb: typec: ucsi: add support for Cypress CCGx 2018-11-09 18:49:59 +01:00
uwb
vfio vfio_pci: Add NVIDIA GV100GL [Tesla V100 SXM2] subdriver 2018-12-21 16:20:47 +11:00
vhost virtio, vhost: fixes, tweaks 2018-11-01 14:42:49 -07:00
video fbdev changes for v4.20: 2018-10-31 11:41:37 -07:00
virt
virtio virtio-balloon: VIRTIO_BALLOON_F_PAGE_POISON 2018-10-24 20:57:55 -04:00
visorbus
vlynq
vme
w1 w1: IAD Register is yet readable trough iad sys file. Fix snprintf (%u for unsigned, count for max size). 2018-10-15 20:50:32 +02:00
watchdog watchdog: ts4800: release syscon device node in ts4800_wdt_probe() 2018-10-22 10:16:28 +02:00
xen xen: fixes for 4.20-rc2 2018-11-10 08:58:48 -06:00
zorro
Kconfig
Makefile