thp: update Documentation/{vm/transhuge,filesystems/proc}.txt
Add info about tmpfs/shmem with huge pages.

Link: http://lkml.kernel.org/r/1466021202-61880-38-git-send-email-kirill.shutemov@linux.intel.com
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This commit is contained in:
parent 779750d20b
commit 1b5946a84d
Documentation/filesystems/proc.txt

@@ -436,6 +436,7 @@ Private_Dirty:         0 kB
 Referenced:          892 kB
 Anonymous:             0 kB
 AnonHugePages:         0 kB
+ShmemPmdMapped:        0 kB
 Shared_Hugetlb:        0 kB
 Private_Hugetlb:       0 kB
 Swap:                  0 kB
@@ -464,6 +465,8 @@ accessed.
 a mapping associated with a file may contain anonymous pages: when MAP_PRIVATE
 and a page is modified, the file page is replaced by a private anonymous copy.
 "AnonHugePages" shows the amount of memory backed by transparent hugepages.
+"ShmemPmdMapped" shows the amount of shared (shmem/tmpfs) memory backed by
+huge pages.
 "Shared_Hugetlb" and "Private_Hugetlb" show the amounts of memory backed by
 hugetlbfs pages which are *not* counted in the "RSS" or "PSS" fields for
 historical reasons. They are also not included in the
 {Shared,Private}_{Clean,Dirty} fields.
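The new per-mapping field can be checked directly from a process's smaps file. A minimal sketch (a field appears only on kernels built with the corresponding support, so absent fields simply produce no matches):

```shell
# List the hugepage-related counters for every mapping of the current shell.
# Fields such as ShmemPmdMapped appear only on kernels that report them.
grep -E '^(AnonHugePages|ShmemPmdMapped|Shared_Hugetlb|Private_Hugetlb):' \
    /proc/self/smaps || echo "no hugepage fields reported"
```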
@@ -868,6 +871,9 @@ VmallocTotal:   112216 kB
 VmallocUsed:       428 kB
 VmallocChunk:   111088 kB
+AnonHugePages:   49152 kB
+ShmemHugePages:      0 kB
+ShmemPmdMapped:      0 kB
 
 
 MemTotal: Total usable ram (i.e. physical ram minus a few reserved
           bits and the kernel binary code)
@@ -912,6 +918,9 @@ MemAvailable: An estimate of how much memory is available for starting new
 AnonHugePages: Non-file backed huge pages mapped into userspace page tables
        Mapped: files which have been mmaped, such as libraries
         Shmem: Total memory used by shared memory (shmem) and tmpfs
+ShmemHugePages: Memory used by shared memory (shmem) and tmpfs allocated
+                with huge pages
+ShmemPmdMapped: Shared memory mapped into userspace with huge pages
          Slab: in-kernel data structures cache
  SReclaimable: Part of Slab, that might be reclaimed, such as caches
    SUnreclaim: Part of Slab, that cannot be reclaimed on memory pressure
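The meminfo fields above can be inspected with a one-liner. A sketch (the Shmem* huge page fields exist only on kernels that include this change, hence the fallback):

```shell
# Show shmem/THP accounting from /proc/meminfo; older kernels omit the
# ShmemHugePages/ShmemPmdMapped fields, so fall back gracefully.
grep -E '^(AnonHugePages|Shmem|ShmemHugePages|ShmemPmdMapped):' /proc/meminfo \
    || echo "no shmem/THP fields reported"
```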
Documentation/vm/transhuge.txt

@@ -9,8 +9,8 @@ using huge pages for the backing of virtual memory with huge pages
 that supports the automatic promotion and demotion of page sizes and
 without the shortcomings of hugetlbfs.
 
-Currently it only works for anonymous memory mappings but in the
-future it can expand over the pagecache layer starting with tmpfs.
+Currently it only works for anonymous memory mappings and tmpfs/shmem.
+But in the future it can expand to other filesystems.
 
 The reason applications are running faster is because of two
 factors. The first factor is almost completely irrelevant and it's not
@@ -57,10 +57,6 @@ miss is going to run faster.
   feature that applies to all dynamic high order allocations in the
   kernel)
 
-- this initial support only offers the feature in the anonymous memory
-  regions but it'd be ideal to move it to tmpfs and the pagecache
-  later
-
 Transparent Hugepage Support maximizes the usefulness of free memory
 if compared to the reservation approach of hugetlbfs by allowing all
 unused memory to be used as cache or other movable (or even unmovable
@@ -94,21 +90,21 @@ madvise(MADV_HUGEPAGE) on their critical mmapped regions.
 
 == sysfs ==
 
-Transparent Hugepage Support can be entirely disabled (mostly for
-debugging purposes) or only enabled inside MADV_HUGEPAGE regions (to
-avoid the risk of consuming more memory resources) or enabled system
-wide. This can be achieved with one of:
+Transparent Hugepage Support for anonymous memory can be entirely disabled
+(mostly for debugging purposes) or only enabled inside MADV_HUGEPAGE
+regions (to avoid the risk of consuming more memory resources) or enabled
+system wide. This can be achieved with one of:
 
 echo always >/sys/kernel/mm/transparent_hugepage/enabled
 echo madvise >/sys/kernel/mm/transparent_hugepage/enabled
 echo never >/sys/kernel/mm/transparent_hugepage/enabled
 
 It's also possible to limit defrag efforts in the VM to generate
-hugepages in case they're not immediately free to madvise regions or
-to never try to defrag memory and simply fallback to regular pages
-unless hugepages are immediately available. Clearly if we spend CPU
-time to defrag memory, we would expect to gain even more by the fact
-we use hugepages later instead of regular pages. This isn't always
+anonymous hugepages in case they're not immediately free to madvise
+regions or to never try to defrag memory and simply fall back to regular
+pages unless hugepages are immediately available. Clearly if we spend CPU
+time to defrag memory, we would expect to gain even more by the fact we
+use hugepages later instead of regular pages. This isn't always
 guaranteed, but it may be more likely in case the allocation is for a
 MADV_HUGEPAGE region.
 
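The currently active policy is the bracketed word when reading the `enabled` file back. A sketch that extracts it, assuming the standard "always [madvise] never" layout and falling back when THP is not built into the kernel:

```shell
# Read the current THP policy; the active value is shown in brackets,
# e.g. "always [madvise] never" means madvise is in effect.
f=/sys/kernel/mm/transparent_hugepage/enabled
if [ -r "$f" ]; then
    sed 's/.*\[\(.*\)\].*/\1/' "$f"
else
    echo "thp-not-available"
fi
```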
@@ -133,9 +129,9 @@ that have used madvise(MADV_HUGEPAGE). This is the default behaviour.
 
 "never" should be self-explanatory.
 
-By default kernel tries to use huge zero page on read page fault.
-It's possible to disable huge zero page by writing 0 or enable it
-back by writing 1:
+By default the kernel tries to use the huge zero page on a read page fault
+to an anonymous mapping. It's possible to disable the huge zero page by
+writing 0 or enable it back by writing 1:
 
 echo 0 >/sys/kernel/mm/transparent_hugepage/use_zero_page
 echo 1 >/sys/kernel/mm/transparent_hugepage/use_zero_page
@@ -204,21 +200,67 @@ Support by passing the parameter "transparent_hugepage=always" or
 "transparent_hugepage=madvise" or "transparent_hugepage=never"
 (without "") to the kernel command line.
 
+== Hugepages in tmpfs/shmem ==
+
+You can control the hugepage allocation policy in tmpfs with the mount
+option "huge=". It can have the following values:
+
+  - "always":
+    Attempt to allocate huge pages every time we need a new page;
+
+  - "never":
+    Do not allocate huge pages;
+
+  - "within_size":
+    Only allocate a huge page if it will be fully within i_size.
+    Also respect fadvise()/madvise() hints;
+
+  - "advise":
+    Only allocate huge pages if requested with fadvise()/madvise();
+
+The default policy is "never".
+
+"mount -o remount,huge= /mountpoint" works fine after mount: remounting
+huge=never will not attempt to break up huge pages at all, just stop more
+from being allocated.
+
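The mount-time and remount behaviour above can be exercised as follows. This is a sketch only: it requires root, and /mnt/hugetmp is a hypothetical mount point chosen for illustration:

```shell
# Requires root. /mnt/hugetmp is a hypothetical mount point.
mkdir -p /mnt/hugetmp
mount -t tmpfs -o huge=within_size tmpfs /mnt/hugetmp
# Later, stop further huge allocations; existing huge pages are not split:
mount -o remount,huge=never /mnt/hugetmp
umount /mnt/hugetmp
```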
+There's also a sysfs knob to control the hugepage allocation policy for
+the internal shmem mount: /sys/kernel/mm/transparent_hugepage/shmem_enabled.
+This mount is used for SysV SHM, memfds, shared anonymous mmaps (of
+/dev/zero or MAP_ANONYMOUS), GPU drivers' DRM objects, and Ashmem.
+
+In addition to the policies listed above, shmem_enabled allows two further
+values:
+
+  - "deny":
+    For use in emergencies, to force the huge option off from
+    all mounts;
+  - "force":
+    Force the huge option on for all - very useful for testing;
+
 == Need of application restart ==
 
-The transparent_hugepage/enabled values only affect future
-behavior. So to make them effective you need to restart any
-application that could have been using hugepages. This also applies to
-the regions registered in khugepaged.
+The transparent_hugepage/enabled values and tmpfs mount option only affect
+future behavior. So to make them effective you need to restart any
+application that could have been using hugepages. This also applies to the
+regions registered in khugepaged.
 
 == Monitoring usage ==
 
-The number of transparent huge pages currently used by the system is
-available by reading the AnonHugePages field in /proc/meminfo. To
-identify what applications are using transparent huge pages, it is
-necessary to read /proc/PID/smaps and count the AnonHugePages fields
-for each mapping. Note that reading the smaps file is expensive and
-reading it frequently will incur overhead.
+The number of anonymous transparent huge pages currently used by the
+system is available by reading the AnonHugePages field in /proc/meminfo.
+To identify what applications are using anonymous transparent huge pages,
+it is necessary to read /proc/PID/smaps and count the AnonHugePages fields
+for each mapping.
+
+The number of file transparent huge pages mapped to userspace is available
+by reading the ShmemPmdMapped and ShmemHugePages fields in /proc/meminfo.
+To identify what applications are mapping file transparent huge pages, it
+is necessary to read /proc/PID/smaps and count the FileHugeMapped fields
+for each mapping.
+
+Note that reading the smaps file is expensive and reading it
+frequently will incur overhead.
 
 There are a number of counters in /proc/vmstat that may be used to
 monitor how successfully the system is providing huge pages for use.
@@ -238,6 +280,12 @@ thp_collapse_alloc_failed is incremented if khugepaged found a range
 	of pages that should be collapsed into one huge page but failed
 	the allocation.
 
+thp_file_alloc is incremented every time a file huge page is successfully
+	allocated.
+
+thp_file_mapped is incremented every time a file huge page is mapped into
+	user address space.
+
 thp_split_page is incremented every time a huge page is split into base
 	pages. This can happen for a variety of reasons but a common
 	reason is that a huge page is old and is being reclaimed.
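All of the counters in this list live in /proc/vmstat. A sketch that dumps them (the thp_file_* entries appear only on kernels with file THP support, and none appear when CONFIG_TRANSPARENT_HUGEPAGE is off, hence the fallback):

```shell
# Dump all THP-related event counters; fall back gracefully when the
# kernel was built without transparent hugepage support.
grep '^thp_' /proc/vmstat || echo "no THP counters reported"
```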
@@ -403,19 +451,27 @@ pages:
   on relevant sub-page of the compound page.
 
 - map/unmap of the whole compound page accounted in compound_mapcount
-  (stored in first tail page).
+  (stored in first tail page). For file huge pages, we also increment
+  ->_mapcount of all sub-pages in order to have race-free detection of
+  last unmap of subpages.
 
-PageDoubleMap() indicates that ->_mapcount in all subpages is offset up by one.
-This additional reference is required to get race-free detection of unmap of
-subpages when we have them mapped with both PMDs and PTEs.
+PageDoubleMap() indicates that the page is *possibly* mapped with PTEs.
+
+For anonymous pages PageDoubleMap() also indicates ->_mapcount in all
+subpages is offset up by one. This additional reference is required to
+get race-free detection of unmap of subpages when we have them mapped with
+both PMDs and PTEs.
 
 This is an optimization required to lower the overhead of per-subpage
 mapcount tracking. The alternative is to alter ->_mapcount in all subpages
 on each map/unmap of the whole compound page.
 
-We set PG_double_map when a PMD of the page got split for the first time,
-but still have PMD mapping. The additional references go away with last
-compound_mapcount.
+For anonymous pages, we set PG_double_map when a PMD of the page got split
+for the first time, but still have PMD mapping. The additional references
+go away with last compound_mapcount.
+
+File pages get PG_double_map set on the first map of the page with PTE,
+and it goes away when the page gets evicted from the page cache.
 
 split_huge_page internally has to distribute the refcounts in the head
 page to the tail pages before clearing all PG_head/tail bits from the page
|
@ -427,7 +483,7 @@ sum of mapcount of all sub-pages plus one (split_huge_page caller must
|
|||
have reference for head page).
|
||||
|
||||
split_huge_page uses migration entries to stabilize page->_refcount and
|
||||
page->_mapcount.
|
||||
page->_mapcount of anonymous pages. File pages just got unmapped.
|
||||
|
||||
We safe against physical memory scanners too: the only legitimate way
|
||||
scanner can get reference to a page is get_page_unless_zero().
|
||||
|