Go to file
David Hildenbrand 3fcebf9020 mm/memory_hotplug: improved dynamic memory group aware "auto-movable" online policy
Currently, the "auto-movable" online policy does not allow for hotplugged
KERNEL (ZONE_NORMAL) memory to increase the amount of MOVABLE memory we
can have, primarily, because there is no coordiantion across memory
devices and we don't want to create zone-imbalances accidentially when
unplugging memory.

However, within a single memory device it's different.  Let's allow for
KERNEL memory within a dynamic memory group to allow for more MOVABLE
within the same memory group.  The only thing we have to take care of is
that the managing driver avoids zone imbalances by unplugging MOVABLE
memory first, otherwise there can be corner cases where unplug of memory
could result in (accidential) zone imbalances.

virtio-mem is the only user of dynamic memory groups and recently added
support for prioritizing unplug of ZONE_MOVABLE over ZONE_NORMAL, so we
don't need a new toggle to enable it for dynamic memory groups.

We limit this handling to dynamic memory groups, because:

* We want to keep the runtime overhead for collecting stats when
  onlining a single memory block small.  We tend to have only a handful of
  dynamic memory groups, but we can have quite some static memory groups
  (e.g., 256 DIMMs).

* It doesn't make too much sense for static memory groups, as we try
  onlining all applicable memory blocks either completely to ZONE_MOVABLE
  or not.  In ordinary operation, we won't have a mixture of zones within
  a static memory group.

When adding memory to a dynamic memory group, we'll first online memory to
ZONE_MOVABLE as long as early KERNEL memory allows for it.  Then, we'll
online the next unit(s) to ZONE_NORMAL, until we can online the next
unit(s) to ZONE_MOVABLE.

For a simple virtio-mem device with a MOVABLE:KERNEL ratio of 3:1, it will
result in a layout like:

  [M][M][M][M][M][M][M][M][N][M][M][M][N][M][M][M]...
  ^ movable memory due to early kernel memory
			   ^ allows for more movable memory ...
			      ^-----^ ... here
				       ^ allows for more movable memory ...
				          ^-----^ ... here

While the created layout is sub-optimal when it comes to contiguous zones,
it gives us the maximum flexibility when dynamically growing/shrinking a
device; we can grow small VMs really big in small steps, and still shrink
reliably to e.g., 1/4 of the maximum VM size in this example, removing
full memory blocks along with meta data more reliably.

Mark dynamic memory groups in the xarray such that we can efficiently
iterate over them when collecting stats.  In usual setups, we have one
virtio-mem device per NUMA node, and usually only a small number of NUMA
nodes.

Note: for now, there seems to be no compelling reason to make this
behavior configurable.

Link: https://lkml.kernel.org/r/20210806124715.17090-10-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Hui Zhu <teawater@gmail.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Marek Kedzierski <mkedzier@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-08 11:50:23 -07:00
Documentation memory-hotplug.rst: complete admin-guide overhaul 2021-09-08 11:50:22 -07:00
LICENSES LICENSES/dual/CC-BY-4.0: Git rid of "smart quotes" 2021-07-15 06:31:24 -06:00
arch mm/memory_hotplug: remove nid parameter from remove_memory() and friends 2021-09-08 11:50:23 -07:00
block block-5.14-2021-08-27 2021-08-27 16:08:29 -07:00
certs Kbuild updates for v5.13 (2nd) 2021-05-08 10:00:11 -07:00
crypto crypto: drbg - select SHA512 2021-07-16 15:49:31 +08:00
drivers mm/memory_hotplug: improved dynamic memory group aware "auto-movable" online policy 2021-09-08 11:50:23 -07:00
fs Two memory management fixes for the filesystem. 2021-08-26 11:18:30 -07:00
include mm/memory_hotplug: improved dynamic memory group aware "auto-movable" online policy 2021-09-08 11:50:23 -07:00
init init: Suppress wrong warning for bootconfig cmdline parameter 2021-08-12 13:35:57 -04:00
ipc Merge branch 'akpm' (patches from Andrew) 2021-07-02 12:08:10 -07:00
kernel - Have get_push_task() check whether current has migration disabled and 2021-08-29 10:54:14 -07:00
lib lib: use PFN_PHYS() in devmem_is_allowed() 2021-08-13 14:09:32 -10:00
mm mm/memory_hotplug: improved dynamic memory group aware "auto-movable" online policy 2021-09-08 11:50:23 -07:00
net This is a one-liner fix for a serious bug that can cause the server to 2021-08-26 13:26:40 -07:00
samples Networking fixes for 5.14-rc2, including fixes from bpf and netfilter. 2021-07-14 09:24:32 -07:00
scripts Kbuild fixes for v5.14 (2nd) 2021-08-07 10:03:02 -07:00
security bpf: Add lockdown check for probe_write_user helper 2021-08-10 10:10:10 +02:00
sound another sound-fixes for 5.14-rc7 2021-08-20 12:31:10 -07:00
tools Networking fixes for 5.14-rc7, including fixes from bpf, wireless and 2021-08-19 12:33:43 -07:00
usr .gitignore: prefix local generated files with a slash 2021-05-02 00:43:35 +09:00
virt KVM: Do not leak memory for duplicate debugfs directories 2021-08-04 06:02:03 -04:00
.clang-format clang-format: Update with the latest for_each macro list 2021-05-12 23:32:39 +02:00
.cocciconfig
.get_maintainer.ignore Opt out of scripts/get_maintainer.pl 2019-05-16 10:53:40 -07:00
.gitattributes .gitattributes: use 'dts' diff driver for dts files 2019-12-04 19:44:11 -08:00
.gitignore .gitignore: ignore only top-level modules.builtin 2021-05-02 00:43:35 +09:00
.mailmap m68k updates for v5.14 2021-06-28 14:01:03 -07:00
COPYING COPYING: state that all contributions really are covered by this file 2020-02-10 13:32:20 -08:00
CREDITS MAINTAINERS: move Murali Karicheri to credits 2021-04-29 15:47:30 -07:00
Kbuild kbuild: rename hostprogs-y/always to hostprogs/always-y 2020-02-04 01:53:07 +09:00
Kconfig kbuild: ensure full rebuild when the compiler is updated 2020-05-12 13:28:33 +09:00
MAINTAINERS MAINTAINERS: exfat: update my email address 2021-08-25 12:25:12 -07:00
Makefile Linux 5.14 2021-08-29 15:04:50 -07:00
README Drop all 00-INDEX files from Documentation/ 2018-09-09 15:08:58 -06:00

README

Linux kernel
============

There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.  The formatted documentation can also be read online at:

    https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory,
several of them using the Restructured Text markup notation.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result by upgrading your kernel.