linux/include/trace/events
Vlastimil Babka 698b1b3064 mm, compaction: introduce kcompactd
Memory compaction can be currently performed in several contexts:

 - kswapd balancing a zone after a high-order allocation failure
 - direct compaction to satisfy a high-order allocation, including THP
   page fault attemps
 - khugepaged trying to collapse a hugepage
 - manually from /proc

The purpose of compaction is two-fold.  The obvious purpose is to
satisfy a (pending or future) high-order allocation, and is easy to
evaluate.  The other purpose is to keep overal memory fragmentation low
and help the anti-fragmentation mechanism.  The success wrt the latter
purpose is more

The current situation wrt the purposes has a few drawbacks:

 - compaction is invoked only when a high-order page or hugepage is not
   available (or manually).  This might be too late for the purposes of
   keeping memory fragmentation low.
 - direct compaction increases latency of allocations.  Again, it would
   be better if compaction was performed asynchronously to keep
   fragmentation low, before the allocation itself comes.
 - (a special case of the previous) the cost of compaction during THP
   page faults can easily offset the benefits of THP.
 - kswapd compaction appears to be complex, fragile and not working in
   some scenarios.  It could also end up compacting for a high-order
   allocation request when it should be reclaiming memory for a later
   order-0 request.

To improve the situation, we should be able to benefit from an
equivalent of kswapd, but for compaction - i.e. a background thread
which responds to fragmentation and the need for high-order allocations
(including hugepages) somewhat proactively.

One possibility is to extend the responsibilities of kswapd, which could
however complicate its design too much.  It should be better to let
kswapd handle reclaim, as order-0 allocations are often more critical
than high-order ones.

Another possibility is to extend khugepaged, but this kthread is a
single instance and tied to THP configs.

This patch goes with the option of a new set of per-node kthreads called
kcompactd, and lays the foundations, without introducing any new
tunables.  The lifecycle mimics kswapd kthreads, including the memory
hotplug hooks.

For compaction, kcompactd uses the standard compaction_suitable() and
ompact_finished() criteria and the deferred compaction functionality.
Unlike direct compaction, it uses only sync compaction, as there's no
allocation latency to minimize.

This patch doesn't yet add a call to wakeup_kcompactd.  The kswapd
compact/reclaim loop for high-order pages will be replaced by waking up
kcompactd in the next patch with the description of what's wrong with
the old approach.

Waking up of the kcompactd threads is also tied to kswapd activity and
follows these rules:
 - we don't want to affect any fastpaths, so wake up kcompactd only from
   the slowpath, as it's done for kswapd
 - if kswapd is doing reclaim, it's more important than compaction, so
   don't invoke kcompactd until kswapd goes to sleep
 - the target order used for kswapd is passed to kcompactd

Future possible future uses for kcompactd include the ability to wake up
kcompactd on demand in special situations, such as when hugepages are
not available (currently not done due to __GFP_NO_KSWAPD) or when a
fragmentation event (i.e.  __rmqueue_fallback()) occurs.  It's also
possible to perform periodic compaction with kcompactd.

[arnd@arndb.de: fix build errors with kcompactd]
[paul.gortmaker@windriver.com: don't use modular references for non modular code]
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: David Rientjes <rientjes@google.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-17 15:09:34 -07:00
..
9p.h net/9p/tracing: Export enums in tracepoints to userspace 2015-04-08 09:39:59 -04:00
asoc.h ASoC: trace: fix printing jack name 2016-02-26 10:52:48 +09:00
bcache.h bcache: fix crash in bcache_btree_node_alloc_fail tracepoint 2014-08-04 15:23:03 -07:00
block.h blktrace: fix accounting of partially completed requests 2014-03-05 16:11:21 -07:00
btrfs.h mm, tracing: unify mm flags handling in tracepoints and printk 2016-03-15 16:55:16 -07:00
clk.h clk: Add tracepoints for hardware operations 2015-03-12 12:18:51 -07:00
cma.h mm: cma: add trace events for CMA allocations and freeings 2015-04-15 16:35:19 -07:00
compaction.h mm, compaction: introduce kcompactd 2016-03-17 15:09:34 -07:00
context_tracking.h context_tracking: User/kernel broundary cross trace events 2013-08-14 17:14:48 +02:00
cpuhp.h cpu/hotplug: Add tracepoints 2016-03-01 20:36:54 +01:00
ext4.h ext4: implement allocation of pre-zeroed blocks 2015-12-07 15:10:26 -05:00
f2fs.h f2fs: add a tracepoint for sync_dirty_inodes 2015-12-17 09:55:27 -08:00
fence.h tracing/dma-buf/fence: Fix timeline str value on fence_annotate_wait_on 2016-01-25 11:03:17 -05:00
fib.h net: Make table id type u32 2015-09-01 14:32:44 -07:00
fib6.h net: IPv6 fib lookup tracepoint 2015-11-22 11:54:10 -05:00
filelock.h locks: sprinkle some tracepoints around the file locking code 2016-01-08 11:38:13 -05:00
filemap.h tracing, mm: Record pfn instead of pointer to struct page 2015-04-13 11:44:52 -03:00
gpio.h tracing: gpio: Add Kconfig option for enabling/disabling trace events 2015-10-20 21:56:10 -04:00
host1x.h gpu: host1x: Use struct host1x_bo pointers in traces 2014-11-13 16:11:32 +01:00
hswadsp.h Merge remote-tracking branches 'asoc/topic/ml26124', 'asoc/topic/of', 'asoc/topic/omap', 'asoc/topic/pxa' and 'asoc/topic/rcar' into asoc-next 2014-03-12 23:04:35 +00:00
huge_memory.h mm, tracing: unify mm flags handling in tracepoints and printk 2016-03-15 16:55:16 -07:00
i2c.h i2c: Add message transfer tracepoints for SMBUS [ver #2] 2014-03-13 22:15:07 +01:00
intel-sst.h tracing: Add TRACE_SYSTEM_VAR to intel-sst 2015-04-07 12:31:12 -04:00
iommu.h iommu: Change trace unmap api to report unmapped size 2015-01-19 15:19:31 +01:00
ipi.h tracepoint: add generic tracepoint definitions for IPI tracing 2014-08-07 20:40:40 -04:00
irq.h irq_poll: make blk-iopoll available outside the block layer 2015-12-11 11:52:24 -08:00
jbd2.h jbd2: trace when lock_buffer in do_get_write_access takes a long time 2013-04-21 16:47:54 -04:00
kmem.h mm, tracing: unify mm flags handling in tracepoints and printk 2016-03-15 16:55:16 -07:00
kvm.h KVM: halt_polling: improve grow/shrink settings 2016-02-16 18:48:29 +01:00
libata.h libata: Add tracepoints 2015-03-27 11:59:22 -04:00
lock.h
mce.h
migrate.h mm: tracing: Export enums in tracepoints to user space 2015-04-08 09:40:01 -04:00
mmflags.h mm, tracing: unify mm flags handling in tracepoints and printk 2016-03-15 16:55:16 -07:00
module.h tracing: %pF is only for function pointers 2015-03-25 08:57:22 -04:00
napi.h
net.h net: rename vlan_tx_* helpers since "tx" is misleading there 2015-01-13 17:51:08 -05:00
nilfs2.h nilfs2: add tracepoints for analyzing reading and writing metadata files 2015-11-06 17:50:42 -08:00
nmi.h x86: Add NMI duration tracepoints 2013-06-23 11:52:58 +02:00
oom.h mm, oom: change type of oom_score_adj to short 2012-12-11 17:22:27 -08:00
page_isolation.h mm/page_isolation.c: add new tracepoint, test_pages_isolated 2016-01-14 16:00:49 -08:00
pagemap.h mm: pagemap: avoid unnecessary overhead when tracepoints are deactivated 2014-08-06 18:01:20 -07:00
power.h cpufreq: powernv/tracing: Add powernv_throttle tracepoint 2016-02-05 02:38:02 +01:00
power_cpu_migrate.h ARM: bL_switcher/trace: Add trace trigger for trace bootstrapping 2013-09-23 18:47:30 -04:00
printk.h printk/tracing: rework console tracing 2013-04-29 18:28:13 -07:00
random.h tracing: %pF is only for function pointers 2015-03-25 08:57:22 -04:00
rcu.h rcu: Apply rcu_seq operations to _rcu_barrier() 2015-07-17 14:58:57 -07:00
regulator.h
rpm.h
sched.h sched/core: Fix trace_sched_switch() 2015-10-06 17:08:15 +02:00
scsi.h scsi: rename SERVICE_ACTION_IN to SERVICE_ACTION_IN_16 2014-11-24 20:01:40 +01:00
signal.h
skb.h
sock.h
spi.h spi: Provide common spi_message processing loop 2013-10-11 20:09:50 +01:00
spmi.h spmi: add command tracepoints for SPMI 2015-08-05 12:27:09 -07:00
sunrpc.h net: sunrpc: fix tracepoint Warning: unknown op '->' 2015-08-31 16:32:15 -04:00
swiotlb.h tracing/events: Add bounce tracing to swiotbl 2013-10-02 12:53:26 -04:00
syscalls.h tracepoint: Fix sparse warnings in tracepoint.c 2014-04-09 10:12:11 -04:00
target.h target: Minimize SCSI header #include directives 2015-06-02 08:03:25 -07:00
task.h tracing: Don't make assumptions about length of string on task rename 2015-08-31 10:47:14 -04:00
thermal.h devfreq_cooling: add trace information 2015-10-30 10:41:38 -07:00
thermal_power_allocator.h thermal: consistently use int for temperatures 2015-08-03 23:15:50 +08:00
thp.h powerpc/thp: Add tracepoints to track hugepage invalidate 2014-08-13 18:20:42 +10:00
timer.h nohz: Use enum code for tick stop failure tracing message 2016-03-02 16:42:15 +01:00
tlb.h x86, mm: trace when an IPI is about to be sent 2015-09-04 16:54:41 -07:00
udp.h
v4l2.h [media] media: videobuf2: Move timestamp to vb2_buffer 2015-12-18 13:53:31 -02:00
vb2.h [media] media: videobuf2: Move timestamp to vb2_buffer 2015-12-18 13:53:31 -02:00
vmscan.h mm, tracing: unify mm flags handling in tracepoints and printk 2016-03-15 16:55:16 -07:00
workqueue.h workqueue: rename cpu_workqueue to pool_workqueue 2013-02-13 19:29:12 -08:00
writeback.h writeback: update writeback tracepoints to report cgroup 2015-08-18 15:49:15 -07:00
xen.h x86: expose number of page table levels on Kconfig level 2015-04-14 16:49:02 -07:00