linux/kernel
Aaron Lu 6fcb52a56f thp: reduce usage of huge zero page's atomic counter
The global zero page is used to satisfy an anonymous read fault.  If
THP(Transparent HugePage) is enabled then the global huge zero page is
used.  The global huge zero page uses an atomic counter for reference
counting and is allocated/freed dynamically according to its counter
value.

CPU time spent on that counter will greatly increase if there are a lot
of processes doing anonymous read faults.  This patch proposes a way to
reduce the access to the global counter so that the CPU load can be
reduced accordingly.

To do this, a new flag of the mm_struct is introduced:
MMF_USED_HUGE_ZERO_PAGE.  With this flag, the process only need to touch
the global counter in two cases:

 1 The first time it uses the global huge zero page;
 2 The time when mm_user of its mm_struct reaches zero.

Note that right now, the huge zero page is eligible to be freed as soon
as its last use goes away.  With this patch, the page will not be
eligible to be freed until the exit of the last process from which it
was ever used.

And with the use of mm_user, the kthread is not eligible to use huge
zero page either.  Since no kthread is using huge zero page today, there
is no difference after applying this patch.  But if that is not desired,
I can change it to when mm_count reaches zero.

Case used for test on Haswell EP:

  usemem -n 72 --readonly -j 0x200000 100G

Which spawns 72 processes and each will mmap 100G anonymous space and
then do read only access to that space sequentially with a step of 2MB.

  CPU cycles from perf report for base commit:
      54.03%  usemem   [kernel.kallsyms]   [k] get_huge_zero_page
  CPU cycles from perf report for this commit:
       0.11%  usemem   [kernel.kallsyms]   [k] mm_get_huge_zero_page

Performance(throughput) of the workload for base commit: 1784430792
Performance(throughput) of the workload for this commit: 4726928591
164% increase.

Runtime of the workload for base commit: 707592 us
Runtime of the workload for this commit: 303970 us
50% drop.

Link: http://lkml.kernel.org/r/fe51a88f-446a-4622-1363-ad1282d71385@intel.com
Signed-off-by: Aaron Lu <aaron.lu@intel.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Ebru Akagunduz <ebru.akagunduz@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-07 18:46:28 -07:00
..
bpf bpf: allow access into map value arrays 2016-09-29 01:35:35 -04:00
configs KVM updates for v4.9-rc1 2016-10-06 10:49:01 -07:00
debug mm/init: Add 'rodata=off' boot cmdline parameter to disable read-only kernel mappings 2016-02-22 08:51:37 +01:00
events Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2016-10-05 10:11:24 -07:00
gcov gcov: add support for gcc version >= 6 2016-07-15 14:54:27 +09:00
irq genirq: Make function __irq_do_set_handler() static 2016-09-25 16:46:52 -04:00
livepatch livepatch/module: make TAINT_LIVEPATCH module-specific 2016-08-26 14:42:08 +02:00
locking locking/lglock: Remove lglock implementation 2016-09-22 15:25:56 +02:00
power oom, suspend: fix oom_killer_disable vs. pm suspend properly 2016-10-07 18:46:27 -07:00
printk printk/nmi: avoid direct printk()-s from __printk_nmi_flush() 2016-09-01 17:52:01 -07:00
rcu Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2016-10-03 12:15:00 -07:00
sched Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2016-10-03 16:13:28 -07:00
time tick/nohz: Prevent stopping the tick on an offline CPU 2016-09-13 17:53:52 +02:00
trace Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2016-10-07 12:24:08 -07:00
.gitignore
Kconfig.freezer
Kconfig.hz
Kconfig.locks
Kconfig.preempt
Makefile userns: Add per user namespace sysctls. 2016-08-08 13:18:58 -05:00
acct.c
async.c
audit.c Merge branch 'stable-4.9' of git://git.infradead.org/users/pcmoore/audit 2016-10-04 14:21:41 -07:00
audit.h Merge branch 'stable-4.8' of git://git.infradead.org/users/pcmoore/audit 2016-07-29 17:54:17 -07:00
audit_fsnotify.c wrappers for ->i_mutex access 2016-01-22 18:04:28 -05:00
audit_tree.c audit: cleanup prune_tree_thread 2016-04-04 09:46:47 -04:00
audit_watch.c Merge branch 'stable-4.8' of git://git.infradead.org/users/pcmoore/audit 2016-09-01 15:55:56 -07:00
auditfilter.c audit: add fields to exclude filter by reusing user filter 2016-06-27 11:01:00 -04:00
auditsc.c Merge branch 'stable-4.9' of git://git.infradead.org/users/pcmoore/audit 2016-10-04 14:21:41 -07:00
backtracetest.c
bounds.c
capability.c kernel: Add noaudit variant of ns_capable() 2016-06-06 20:16:18 +10:00
cgroup.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace 2016-10-06 09:52:23 -07:00
cgroup_freezer.c
cgroup_pids.c cgroup: Use lld instead of ld when printing pids controller events_limit 2016-06-21 15:03:36 -04:00
compat.c
configs.c
context_tracking.c
cpu.c Merge branch 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2016-10-03 19:43:08 -07:00
cpu_pm.c
cpuset.c cpuset: fix non static symbol warning 2016-09-16 11:31:17 -04:00
crash_dump.c
cred.c cred: Reject inodes with invalid ids in set_create_file_as() 2016-06-30 18:05:09 -05:00
delayacct.c
dma.c
elfcore.c
exec_domain.c
exit.c mm, oom: enforce exit_oom_victim on current task 2016-10-07 18:46:28 -07:00
extable.c
fork.c thp: reduce usage of huge zero page's atomic counter 2016-10-07 18:46:28 -07:00
freezer.c freezer, oom: check TIF_MEMDIE on the correct task 2016-07-28 16:07:41 -07:00
futex.c futex: Add some more function commentry 2016-09-05 17:20:18 +02:00
futex_compat.c ptrace: use fsuid, fsgid, effective creds for fs access checks 2016-01-20 17:09:18 -08:00
groups.c
hung_task.c locking/hung_task: Show all locks 2016-08-24 12:16:13 +02:00
irq_work.c
jump_label.c powerpc updates for 4.8 #2 2016-08-05 09:00:54 -04:00
kallsyms.c kallsyms: add support for relative offsets in kallsyms address table 2016-03-15 16:55:16 -07:00
kcmp.c ptrace: use fsuid, fsgid, effective creds for fs access checks 2016-01-20 17:09:18 -08:00
kcov.c kernel/kcov: unproxify debugfs file's fops 2016-06-15 04:56:35 -07:00
kexec.c kexec: allow architectures to override boot mapping 2016-08-02 19:35:27 -04:00
kexec_core.c kexec: add restriction on kexec_load() segment sizes 2016-08-02 19:35:31 -04:00
kexec_file.c kexec: fix double-free when failing to relocate the purgatory 2016-09-01 17:52:01 -07:00
kexec_internal.h kexec: move some memembers and definitions within the scope of CONFIG_KEXEC_FILE 2016-01-20 17:09:18 -08:00
kmod.c
kprobes.c
ksysfs.c kexec: add a kexec_crash_loaded() function 2016-08-02 19:35:30 -04:00
kthread.c kthread: Pin the stack via try_get_task_stack()/put_task_stack() in to_live_kthread() function 2016-09-16 09:18:53 +02:00
latencytop.c sched/debug: Make schedstats a runtime tunable that is disabled by default 2016-02-09 11:54:23 +01:00
membarrier.c
memremap.c mm: fix cache mode of dax pmd mappings 2016-09-09 17:34:46 -07:00
module-internal.h
module.c livepatch/module: make TAINT_LIVEPATCH module-specific 2016-08-26 14:42:08 +02:00
module_signing.c KEYS: Move the point of trust determination to __key_link() 2016-04-11 22:43:43 +01:00
notifier.c
nsproxy.c cgroup: introduce cgroup namespaces 2016-02-16 13:04:58 -05:00
padata.c padata: Convert to hotplug state machine 2016-09-19 21:44:30 +02:00
panic.c kexec: use core_param for crash_kexec_post_notifiers boot option 2016-08-02 19:35:29 -04:00
params.c
pid.c remove lots of IS_ERR_VALUE abuses 2016-05-27 15:26:11 -07:00
pid_namespace.c Merge branch 'nsfs-ioctls' into HEAD 2016-09-22 20:00:36 -05:00
profile.c profile: Convert to hotplug state machine 2016-07-15 10:41:42 +02:00
ptrace.c tree-wide: replace config_enabled() with IS_ENABLED() 2016-08-04 08:50:07 -04:00
range.c
reboot.c
relay.c relayfs: Convert to hotplug state machine 2016-09-06 18:30:20 +02:00
resource.c /proc/iomem: only expose physical resource addresses to privileged users 2016-04-14 12:56:09 -07:00
seccomp.c seccomp: Fix tracer exit notifications during fatal signals 2016-08-30 16:12:46 -07:00
signal.c x86/signal: Add SA_{X32,IA32}_ABI sa_flags 2016-09-14 21:28:11 +02:00
smp.c smp: Allocate smp_call_on_cpu() workqueue on stack too 2016-09-22 14:49:10 +02:00
smpboot.c sched/core: Do not use smp_processor_id() with preempt enabled in smpboot_thread_fn() 2016-09-22 12:28:00 +02:00
smpboot.h cpu/hotplug: Create hotplug threads 2016-03-01 20:36:56 +01:00
softirq.c Merge branch 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2016-10-03 19:43:08 -07:00
stacktrace.c
stop_machine.c Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2016-10-03 13:39:00 -07:00
sys.c prctl: make PR_SET_THP_DISABLE wait for mmap_sem killable 2016-05-23 17:04:14 -07:00
sys_ni.c
sysctl.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace 2016-10-06 09:52:23 -07:00
sysctl_binary.c kernel/sysctl_binary.c: use generic UUID library 2016-05-20 17:58:30 -07:00
task_work.c task_work: use READ_ONCE/lockless_dereference, avoid pi_lock if !task_works 2016-08-02 19:35:02 -04:00
taskstats.c taskstats: use the libnl API to align nlattr on 64-bit 2016-04-23 20:13:25 -04:00
test_kprobes.c
torture.c torture: Convert torture_shutdown() to hrtimer 2016-08-22 10:01:49 -07:00
tracepoint.c kernel/...: convert pr_warning to pr_warn 2016-03-22 15:36:02 -07:00
tsacct.c time, acct: Drop irq save & restore from __acct_update_integrals() 2016-02-29 09:53:09 +01:00
ucount.c mntns: Add a limit on the number of mount namespaces. 2016-08-31 07:28:35 -05:00
uid16.c
up.c smp: Add function to execute a function synchronously on a CPU 2016-09-05 13:52:39 +02:00
user-return-notifier.c
user.c
user_namespace.c Merge branch 'nsfs-ioctls' into HEAD 2016-09-22 20:00:36 -05:00
utsname.c Merge branch 'nsfs-ioctls' into HEAD 2016-09-22 20:00:36 -05:00
utsname_sysctl.c
watchdog.c Revert "perf/x86/intel, watchdog: Switch NMI watchdog to ref cycles on x86" 2016-07-10 20:58:36 +02:00
workqueue.c Merge branch 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2016-07-29 13:55:30 -07:00
workqueue_internal.h sched/core: Get rid of 'cpu' argument in wq_worker_sleeping() 2016-03-02 10:28:47 -05:00