linux_old1/include
Michal Hocko d379f01de0 oom, trace: add oom detection tracepoints
should_reclaim_retry is the central decision point for declaring the
OOM.  It might be really useful to expose data used for this decision
making when debugging an unexpected oom situations.

Say we have an OOM report:
[   52.264001] mem_eater invoked oom-killer: gfp_mask=0x24280ca(GFP_HIGHUSER_MOVABLE|__GFP_ZERO), nodemask=0, order=0, oom_score_adj=0
[   52.267549] CPU: 3 PID: 3148 Comm: mem_eater Tainted: G        W       4.8.0-oomtrace3-00006-gb21338b386d2 #1024

Now we can check the tracepoint data to see how we have ended up in this
situation:
       mem_eater-3148  [003] ....    52.432801: reclaim_retry_zone: node=0 zone=DMA32 order=0 reclaimable=51 available=11134 min_wmark=11084 no_progress_loops=1 wmark_check=1
       mem_eater-3148  [003] ....    52.433269: reclaim_retry_zone: node=0 zone=DMA32 order=0 reclaimable=51 available=11103 min_wmark=11084 no_progress_loops=1 wmark_check=1
       mem_eater-3148  [003] ....    52.433712: reclaim_retry_zone: node=0 zone=DMA32 order=0 reclaimable=51 available=11100 min_wmark=11084 no_progress_loops=2 wmark_check=1
       mem_eater-3148  [003] ....    52.434067: reclaim_retry_zone: node=0 zone=DMA32 order=0 reclaimable=51 available=11097 min_wmark=11084 no_progress_loops=3 wmark_check=1
       mem_eater-3148  [003] ....    52.434414: reclaim_retry_zone: node=0 zone=DMA32 order=0 reclaimable=51 available=11094 min_wmark=11084 no_progress_loops=4 wmark_check=1
       mem_eater-3148  [003] ....    52.434761: reclaim_retry_zone: node=0 zone=DMA32 order=0 reclaimable=51 available=11091 min_wmark=11084 no_progress_loops=5 wmark_check=1
       mem_eater-3148  [003] ....    52.435108: reclaim_retry_zone: node=0 zone=DMA32 order=0 reclaimable=51 available=11087 min_wmark=11084 no_progress_loops=6 wmark_check=1
       mem_eater-3148  [003] ....    52.435478: reclaim_retry_zone: node=0 zone=DMA32 order=0 reclaimable=51 available=11084 min_wmark=11084 no_progress_loops=7 wmark_check=0
       mem_eater-3148  [003] ....    52.435478: reclaim_retry_zone: node=0 zone=DMA order=0 reclaimable=0 available=1126 min_wmark=179 no_progress_loops=7 wmark_check=0

The above shows that we can quickly deduce that the reclaim stopped
making any progress (see no_progress_loops increased in each round) and
while there were still some 51 reclaimable pages they couldn't be
dropped for some reason (vmscan trace points would tell us more about
that part).  available will represent reclaimable + free_pages scaled
down per no_progress_loops factor.  This is essentially an optimistic
estimate of how much memory we would have when reclaiming everything.
This can be compared to min_wmark to get a rought idea but the
wmark_check tells the result of the watermark check which is more
precise (includes lowmem reserves, considers the order etc.).  As we can
see no zone is eligible in the end and that is why we have triggered the
oom in this situation.

Please note that higher order requests might fail on the wmark_check
even when there is much more memory available than min_wmark - e.g.
when the memory is fragmented.  A follow up tracepoint will help to
debug those situations.

Link: http://lkml.kernel.org/r/20161220130135.15719-3-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: David Rientjes <rientjes@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-22 16:41:27 -08:00
..
acpi Merge branches 'acpi-bus', 'acpi-sleep' and 'acpi-processor' 2017-02-20 14:28:03 +01:00
asm-generic Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2017-02-20 13:23:30 -08:00
clocksource
crypto
drm Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2017-02-20 13:23:30 -08:00
dt-bindings Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2017-02-22 10:15:09 -08:00
keys
kvm KVM: arm64: Access CNTHCTL_EL2 bit fields correctly on VHE systems 2017-01-13 11:19:25 +00:00
linux slab: use memcg_kmem_cache_wq for slab destruction operations 2017-02-22 16:41:27 -08:00
math-emu
media [media] v4l2-subdev.h: fix v4l2_subdev_pad_config documentation 2017-02-03 12:00:37 -02:00
memory
misc
net Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2017-02-22 10:15:09 -08:00
pcmcia
ras
rdma net-next: treewide use is_vlan_dev() helper function. 2017-02-06 16:33:29 -05:00
rxrpc
scsi SCSI misc on 20170220 2017-02-21 11:51:42 -08:00
soc ARC updates for 4.11 rc1 2017-02-22 10:33:53 -08:00
sound Merge remote-tracking branches 'asoc/fix/arizona', 'asoc/fix/dpcm', 'asoc/fix/dwc', 'asoc/fix/fsl-ssi' and 'asoc/fix/hdmi-codec' into asoc-linus 2017-01-10 10:47:50 +00:00
target target: Fix multi-session dynamic se_node_acl double free OOPs 2017-02-08 08:25:23 -08:00
trace oom, trace: add oom detection tracepoints 2017-02-22 16:41:27 -08:00
uapi TTY/Serial driver patches for 4.11-rc1 2017-02-22 12:17:25 -08:00
video
xen xen/privcmd: Add IOCTL_PRIVCMD_DM_OP 2017-02-14 15:13:43 -05:00
Kbuild