linux/include
Joerg Roedel 1897bdc4d3 mmu_notifier: add mmu_notifier_invalidate_range()
This notifier closes an important gap in the current mmu_notifier
implementation, the existing callbacks are called too early or too late to
reliably manage a non-CPU TLB.  Specifically, invalidate_range_start() is
called when all pages are still mapped and invalidate_range_end() when all
pages are unmapped and potentially freed.

This is fine when the users of the mmu_notifiers manage their own SoftTLB,
like KVM does.  When the TLB is managed in software it is easy to wipe out
entries for a given range and prevent new entries to be established until
invalidate_range_end is called.

But when the user of mmu_notifiers has to manage a hardware TLB it can
still wipe out TLB entries in invalidate_range_start, but it can't make
sure that no new TLB entries in the given range are established between
invalidate_range_start and invalidate_range_end.

To avoid silent data corruption the entries in the non-CPU TLB need to be
flushed when the pages are unmapped (at this point in time no _new_ TLB
entries can be established in the non-CPU TLB) but not yet freed (as the
non-CPU TLB may still have _existing_ entries pointing to the pages about
to be freed).

To fix this problem we need to catch the moment when the Linux VMM flushes
remote TLBs (as a non-CPU TLB is not very CPU TLB), as this is the point
in time when the pages are unmapped but _not_ yet freed.

The mmu_notifier_invalidate_range() function aims to catch that moment.

IOMMU code will be one user of the notifier-callback.  Currently this is
only the AMD IOMMUv2 driver, but its code is about to be more generalized
and converted to a generic IOMMU-API extension to fit the needs of similar
functionality in other IOMMUs as well.

The current attempt in the AMD IOMMUv2 driver to work around the
invalidate_range_start/end() shortcoming is to assign an empty page table
to the non-CPU TLB between any invalidata_range_start/end calls.  With the
empty page-table assigned, every page-table walk to re-fill the non-CPU
TLB will cause a page-fault reported to the IOMMU driver via an interrupt,
possibly causing interrupt storms.

The page-fault handler in the AMD IOMMUv2 driver doesn't handle the fault
if an invalidate_range_start/end pair is active, it just reports back
SUCCESS to the device and let it refault the page.  But existing hardware
(newer Radeon GPUs) that makes use of this feature don't re-fault
indefinitly, after a certain number of faults for the same address the
device enters a failure state and needs to be resetted.

To avoid the GPUs entering a failure state we need to get rid of the
empty-page-table workaround and use the mmu_notifier_invalidate_range()
function introduced with this patch.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Reviewed-by: Jérôme Glisse <jglisse@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Rik van Riel <riel@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Johannes Weiner <jweiner@redhat.com>
Cc: Jay Cornwall <Jay.Cornwall@amd.com>
Cc: Oded Gabbay <Oded.Gabbay@amd.com>
Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
2014-11-13 13:46:09 +11:00
..
acpi ACPI and power management updates for 3.18-rc2 2014-10-24 11:29:31 -07:00
asm-generic Merge git://git.infradead.org/users/eparis/audit 2014-10-19 16:25:56 -07:00
clocksource
crypto crypto: LLVMLinux: Add macro to remove use of VLAIS in crypto code 2014-10-14 10:51:22 +02:00
drm drm: Per-plane locking 2014-11-12 17:56:12 +10:00
dt-bindings ARM: imx: clk-vf610: define PLL's clock tree 2014-11-04 13:40:14 +08:00
keys KEYS: Restore partial ID matching functionality for asymmetric keys 2014-10-06 15:21:05 +01:00
kvm arm/arm64: KVM: Fix BE accesses to GICv2 EISR and ELRSR regs 2014-10-16 10:57:41 +02:00
linux mmu_notifier: add mmu_notifier_invalidate_range() 2014-11-13 13:46:09 +11:00
math-emu
media Merge branch 'patchwork' into v4l_for_linus 2014-10-09 14:00:54 -03:00
memory
misc cxl: Add new header for call backs and structs 2014-10-08 20:15:43 +11:00
net Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf 2014-10-31 12:29:42 -04:00
pcmcia
ras PCI/AER: Rename PCI_ERR_UNC_TRAIN to PCI_ERR_UNC_UND 2014-09-25 09:42:40 -06:00
rdma IB/mlx5, iser, isert: Add Signature API additions 2014-10-09 00:10:53 -07:00
rxrpc
scsi scsi: set REQ_QUEUE for the blk-mq case 2014-10-28 09:53:43 +01:00
soc/tegra
sound Merge branch 'for-linus' of git://git.infradead.org/users/vkoul/slave-dma 2014-10-18 18:11:04 -07:00
target target: Add force_pr_aptpl device attribute 2014-10-04 05:41:20 +00:00
trace Merge branch 'urgent-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/urgent 2014-10-30 07:37:37 +01:00
uapi Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2014-10-31 14:05:35 -07:00
video fbdev changes for 3.18 2014-10-18 18:03:02 -07:00
xen xen: remove DEFINE_XENBUS_DRIVER() macro 2014-10-06 10:27:57 +01:00
Kbuild