Go to file
Chris Down b3ff92916a mm, memcg: reclaim more aggressively before high allocator throttling
Patch series "mm, memcg: reclaim harder before high throttling", v2.

This patch (of 2):

In Facebook production, we've seen cases where cgroups have been put into
allocator throttling even when they appear to have a lot of slack file
caches which should be trivially reclaimable.

Looking more closely, the problem is that we only try a single cgroup
reclaim walk for each return to usermode before calculating whether or not
we should throttle.  This single attempt doesn't produce enough pressure
to shrink for cgroups with a rapidly growing amount of file caches prior
to entering allocator throttling.

As an example, we see that threads in an affected cgroup are stuck in
allocator throttling:

    # for i in $(cat cgroup.threads); do
    >     grep over_high "/proc/$i/stack"
    > done
    [<0>] mem_cgroup_handle_over_high+0x10b/0x150
    [<0>] mem_cgroup_handle_over_high+0x10b/0x150
    [<0>] mem_cgroup_handle_over_high+0x10b/0x150

...however, there is no I/O pressure reported by PSI, despite a lot of
slack file pages:

    # cat memory.pressure
    some avg10=78.50 avg60=84.99 avg300=84.53 total=5702440903
    full avg10=78.50 avg60=84.99 avg300=84.53 total=5702116959
    # cat io.pressure
    some avg10=0.00 avg60=0.00 avg300=0.00 total=78051391
    full avg10=0.00 avg60=0.00 avg300=0.00 total=78049640
    # grep _file memory.stat
    inactive_file 1370939392
    active_file 661635072

This patch changes the behaviour to retry reclaim either until the current
task goes below the 10ms grace period, or we are making no reclaim
progress at all.  In the latter case, we enter reclaim throttling as
before.

To a user, there's no intuitive reason for the reclaim behaviour to differ
from hitting memory.high as part of a new allocation, as opposed to
hitting memory.high because someone lowered its value.  As such this also
brings an added benefit: it unifies the reclaim behaviour between the two.

There's precedent for this behaviour: we already do reclaim retries when
writing to memory.{high,max}, in max reclaim, and in the page allocator
itself.

Signed-off-by: Chris Down <chris@chrisdown.name>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Roman Gushchin <guro@fb.com>
Link: http://lkml.kernel.org/r/cover.1594640214.git.chris@chrisdown.name
Link: http://lkml.kernel.org/r/a4e23b59e9ef499b575ae73a8120ee089b7d3373.1594640214.git.chris@chrisdown.name
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-07 11:33:25 -07:00
Documentation tmpfs: support 64-bit inums per-sb 2020-08-07 11:33:24 -07:00
LICENSES LICENSES: Rename other to deprecated 2019-05-03 06:34:32 -06:00
arch mm, treewide: rename kzfree() to kfree_sensitive() 2020-08-07 11:33:22 -07:00
block for-5.9/block-merge-20200804 2020-08-05 11:12:34 -07:00
certs .gitignore: add SPDX License Identifier 2020-03-25 11:50:48 +01:00
crypto mm, treewide: rename kzfree() to kfree_sensitive() 2020-08-07 11:33:22 -07:00
drivers mm: memcontrol: account kernel stack per node 2020-08-07 11:33:25 -07:00
fs mm: memcontrol: account kernel stack per node 2020-08-07 11:33:25 -07:00
include mm: kmem: switch to static_branch_likely() in memcg_kmem_enabled() 2020-08-07 11:33:25 -07:00
init mm/slab: expand CONFIG_SLAB_FREELIST_HARDENED to include SLAB 2020-08-07 11:33:22 -07:00
ipc mmap locking API: use coccinelle to convert mmap_sem rwsem call sites 2020-06-09 09:39:14 -07:00
kernel mm: memcontrol: account kernel stack per node 2020-08-07 11:33:25 -07:00
lib mm, treewide: rename kzfree() to kfree_sensitive() 2020-08-07 11:33:22 -07:00
mm mm, memcg: reclaim more aggressively before high allocator throttling 2020-08-07 11:33:25 -07:00
net mm, treewide: rename kzfree() to kfree_sensitive() 2020-08-07 11:33:22 -07:00
samples Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next 2020-08-05 20:13:21 -07:00
scripts mm, treewide: rename kzfree() to kfree_sensitive() 2020-08-07 11:33:22 -07:00
security mm, treewide: rename kzfree() to kfree_sensitive() 2020-08-07 11:33:22 -07:00
sound sound updates for 5.9 2020-08-06 14:27:31 -07:00
tools tools/cgroup: add memcg_slabinfo.py tool 2020-08-07 11:33:25 -07:00
usr usr: Add support for zstd compressed initramfs 2020-07-31 11:49:08 +02:00
virt s390: implement diag318 2020-08-06 12:59:31 -07:00
.clang-format block: add bio_for_each_bvec_all() 2020-05-25 11:25:24 +02:00
.cocciconfig scripts: add Linux .cocciconfig for coccinelle 2016-07-22 12:13:39 +02:00
.get_maintainer.ignore Opt out of scripts/get_maintainer.pl 2019-05-16 10:53:40 -07:00
.gitattributes .gitattributes: use 'dts' diff driver for dts files 2019-12-04 19:44:11 -08:00
.gitignore .gitignore: Add ZSTD-compressed files 2020-07-31 11:50:49 +02:00
.mailmap It's been a busy cycle for documentation - hopefully the busiest for a 2020-08-04 22:47:54 -07:00
COPYING COPYING: state that all contributions really are covered by this file 2020-02-10 13:32:20 -08:00
CREDITS CREDITS: Replace HTTP links with HTTPS ones 2020-07-23 14:53:58 -06:00
Kbuild kbuild: rename hostprogs-y/always to hostprogs/always-y 2020-02-04 01:53:07 +09:00
Kconfig kbuild: ensure full rebuild when the compiler is updated 2020-05-12 13:28:33 +09:00
MAINTAINERS TTY/Serial patches for 5.9-rc1 2020-08-06 14:56:11 -07:00
Makefile Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next 2020-08-05 20:13:21 -07:00
README Drop all 00-INDEX files from Documentation/ 2018-09-09 15:08:58 -06:00

README

Linux kernel
============

There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.  The formatted documentation can also be read online at:

    https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory,
several of them using the Restructured Text markup notation.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result by upgrading your kernel.