linux/arch/x86/mm
Andy Lutomirski 94b1b03b51 x86/mm: Rework lazy TLB mode and TLB freshness tracking
x86's lazy TLB mode used to be fairly weak -- it would switch to
init_mm the first time it tried to flush a lazy TLB.  This meant an
unnecessary CR3 write and, if the flush was remote, an unnecessary
IPI.

Rewrite it entirely.  When we enter lazy mode, we simply remove the
CPU from mm_cpumask.  This means that we need a way to figure out
whether we've missed a flush when we switch back out of lazy mode.
I use the tlb_gen machinery to track whether a context is up to
date.

Note to reviewers: this patch, my itself, looks a bit odd.  I'm
using an array of length 1 containing (ctx_id, tlb_gen) rather than
just storing tlb_gen, and making it at array isn't necessary yet.
I'm doing this because the next few patches add PCID support, and,
with PCID, we need ctx_id, and the array will end up with a length
greater than 1.  Making it an array now means that there will be
less churn and therefore less stress on your eyeballs.

NB: This is dubious but, AFAICT, still correct on Xen and UV.
xen_exit_mmap() uses mm_cpumask() for nefarious purposes and this
patch changes the way that mm_cpumask() works.  This should be okay,
since Xen *also* iterates all online CPUs to find all the CPUs it
needs to twiddle.

The UV tlbflush code is rather dated and should be changed.

Here are some benchmark results, done on a Skylake laptop at 2.3 GHz
(turbo off, intel_pstate requesting max performance) under KVM with
the guest using idle=poll (to avoid artifacts when bouncing between
CPUs).  I haven't done any real statistics here -- I just ran them
in a loop and picked the fastest results that didn't look like
outliers.  Unpatched means commit a4eb8b9935, so all the
bookkeeping overhead is gone.

MADV_DONTNEED; touch the page; switch CPUs using sched_setaffinity.  In
an unpatched kernel, MADV_DONTNEED will send an IPI to the previous CPU.
This is intended to be a nearly worst-case test.

  patched:         13.4µs
  unpatched:       21.6µs

Vitaly's pthread_mmap microbenchmark with 8 threads (on four cores),
nrounds = 100, 256M data

  patched:         1.1 seconds or so
  unpatched:       1.9 seconds or so

The sleepup on Vitaly's test appearss to be because it spends a lot
of time blocked on mmap_sem, and this patch avoids sending IPIs to
blocked CPUs.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Nadav Amit <nadav.amit@gmail.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Banman <abanman@sgi.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Dimitri Sivanich <sivanich@sgi.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Mike Travis <travis@sgi.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/ddf2c92962339f4ba39d8fc41b853936ec0b44f1.1498751203.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-07-05 10:52:57 +02:00
..
kmemcheck x86/mm: Audit and remove any unnecessary uses of module.h 2016-07-14 13:04:20 +02:00
Makefile x86/mm/gup: Switch GUP to the generic get_user_page_fast() implementation 2017-06-13 08:56:50 +02:00
amdtopology.c x86/boot/e820: Move asm/e820.h to asm/e820/api.h 2017-01-28 09:31:13 +01:00
debug_pagetables.c x86/mm/ptdump: Make (debugfs)/kernel_page_tables read-only 2015-12-04 12:55:01 +01:00
dump_pagetables.c x86/boot/64: Rename init_level4_pgt and early_level4_pgt 2017-06-13 08:56:55 +02:00
extable.c x86/debug: Handle early WARN_ONs proper 2017-06-12 21:17:48 +02:00
fault.c x86/mm: Split read_cr3() into read_cr3_pa() and __read_cr3() 2017-06-13 08:48:09 +02:00
highmem_32.c x86/mm: Audit and remove any unnecessary uses of module.h 2016-07-14 13:04:20 +02:00
hugetlbpage.c mm: larger stack guard gap, between vmas 2017-06-19 21:50:20 +08:00
ident_map.c x86/mm: Add support for gbpages to kernel_ident_mapping_init() 2017-05-08 08:28:40 +02:00
init.c x86/mm: Rework lazy TLB mode and TLB freshness tracking 2017-07-05 10:52:57 +02:00
init_32.c x86: use set_memory.h header 2017-05-08 17:15:13 -07:00
init_64.c Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2017-07-03 14:45:09 -07:00
iomap_32.c x86/mm: Audit and remove any unnecessary uses of module.h 2016-07-14 13:04:20 +02:00
ioremap.c x86/mm: Split read_cr3() into read_cr3_pa() and __read_cr3() 2017-06-13 08:48:09 +02:00
kasan_init_64.c x86/boot/64: Rename init_level4_pgt and early_level4_pgt 2017-06-13 08:56:55 +02:00
kaslr.c x86/mm: Add support for 5-level paging for KASLR 2017-06-13 08:56:58 +02:00
kmmio.c x86/mm: Audit and remove any unnecessary uses of module.h 2016-07-14 13:04:20 +02:00
mm_internal.h x86: Enable PAT to use cache mode translation tables 2014-11-16 11:04:26 +01:00
mmap.c x86/mmap, ASLR: Do not treat unlimited-stack tasks as legacy mmap 2017-06-24 08:39:16 +02:00
mmio-mod.c x86/boot/e820: Move asm/e820.h to asm/e820/api.h 2017-01-28 09:31:13 +01:00
mpx.c x86/mpx: Correctly report do_mpx_bt_fault() failures to user-space 2017-04-12 08:40:58 +02:00
numa.c Merge branch 'x86/boot' into x86/mm, to avoid conflict 2017-04-11 08:56:05 +02:00
numa_32.c x86/mm/32: Set the '__vmalloc_start_set' flag in initmem_init() 2017-05-09 08:12:27 +02:00
numa_64.c x86, mm: kill numa_free_all_bootmem() 2012-11-17 11:59:47 -08:00
numa_emulation.c x86: delete __cpuinit usage from all x86 files 2013-07-14 19:36:56 -04:00
numa_internal.h x86-32, mm: Rip out x86_32 NUMA remapping code 2013-01-31 14:12:30 -08:00
pageattr-test.c x86/mm/pat: Make mm/pageattr[-test].c explicitly non-modular 2015-08-25 09:48:38 +02:00
pageattr.c x86/mm/ftrace: Do not bug in early boot on irqs_disabled in cpu_flush_range() 2017-05-26 22:37:01 -04:00
pat.c Revert "x86/PAT: Fix Xorg regression on CPUs that don't support PAT" 2017-06-01 15:52:23 +02:00
pat_internal.h x86/mm/pat: Convert to pr_*() usage 2015-05-27 14:40:59 +02:00
pat_rbtree.c x86/mm/pat: Use rb_entry() 2017-02-04 17:18:00 +01:00
pf_in.c x86/mm: Audit and remove any unnecessary uses of module.h 2016-07-14 13:04:20 +02:00
pf_in.h
pgtable.c x86/mm: Add basic defines/helpers for CONFIG_X86_5LEVEL=y 2017-04-04 08:22:34 +02:00
pgtable_32.c Merge branch 'x86/boot' into x86/mm, to avoid conflict 2017-04-11 08:56:05 +02:00
physaddr.c x86/mm: Audit and remove any unnecessary uses of module.h 2016-07-14 13:04:20 +02:00
physaddr.h
pkeys.c x86/fpu: Finish excising 'eagerfpu' 2016-10-18 09:56:03 +02:00
setup_nx.c Revert "x86/mm/32: Set NX in __supported_pte_mask before enabling paging" 2016-04-26 19:52:57 +02:00
srat.c x86/boot/e820: Move asm/e820.h to asm/e820/api.h 2017-01-28 09:31:13 +01:00
testmmiotrace.c Annotate hardware config module parameters in arch/x86/mm/ 2017-04-04 16:54:21 +01:00
tlb.c x86/mm: Rework lazy TLB mode and TLB freshness tracking 2017-07-05 10:52:57 +02:00