mirror of https://gitee.com/openkylin/linux.git
Merge branch 'akpm' (patches from Andrew)
Merge second set of updates from Andrew Morton:
 "More of MM"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (83 commits)
  mm/nommu.c: fix arithmetic overflow in __vm_enough_memory()
  mm/mmap.c: fix arithmetic overflow in __vm_enough_memory()
  vmstat: Reduce time interval to stat update on idle cpu
  mm/page_owner.c: remove unnecessary stack_trace field
  Documentation/filesystems/proc.txt: describe /proc/<pid>/map_files
  mm: incorporate read-only pages into transparent huge pages
  vmstat: do not use deferrable delayed work for vmstat_update
  mm: more aggressive page stealing for UNMOVABLE allocations
  mm: always steal split buddies in fallback allocations
  mm: when stealing freepages, also take pages created by splitting buddy page
  mincore: apply page table walker on do_mincore()
  mm: /proc/pid/clear_refs: avoid split_huge_page()
  mm: pagewalk: fix misbehavior of walk_page_range for vma(VM_PFNMAP)
  mempolicy: apply page table walker on queue_pages_range()
  arch/powerpc/mm/subpage-prot.c: use walk->vma and walk_page_vma()
  memcg: cleanup preparation for page table walk
  numa_maps: remove numa_maps->vma
  numa_maps: fix typo in gather_hugetbl_stats
  pagemap: use walk->vma instead of calling find_vma()
  clear_refs: remove clear_refs_private->vma and introduce clear_refs_test_walk()
  ...
commit 59d53737a8
@@ -327,6 +327,85 @@ supported and the interface files "release_agent" and
 - use_hierarchy is on by default and the cgroup file for the flag is
   not created.
 
+- The original lower boundary, the soft limit, is defined as a limit
+  that is unset by default. As a result, the set of cgroups that
+  global reclaim prefers is opt-in, rather than opt-out. The costs
+  for optimizing these mostly negative lookups are so high that the
+  implementation, despite its enormous size, does not even provide the
+  basic desirable behavior. First off, the soft limit has no
+  hierarchical meaning. All configured groups are organized in a
+  global rbtree and treated like equal peers, regardless of where they
+  are located in the hierarchy. This makes subtree delegation
+  impossible. Second, the soft limit reclaim pass is so aggressive
+  that it not only introduces high allocation latencies into the
+  system, but also impacts system performance due to overreclaim, to
+  the point where the feature becomes self-defeating.
+
+  The memory.low boundary, on the other hand, is a top-down allocated
+  reserve. A cgroup enjoys reclaim protection when it and all its
+  ancestors are below their low boundaries, which makes delegation of
+  subtrees possible. Secondly, new cgroups have no reserve by
+  default and in the common case most cgroups are eligible for the
+  preferred reclaim pass. This allows the new low boundary to be
+  efficiently implemented with just a minor addition to the generic
+  reclaim code, without the need for out-of-band data structures and
+  reclaim passes. Because the generic reclaim code considers all
+  cgroups except for the ones running low in the preferred first
+  reclaim pass, overreclaim of individual groups is eliminated as
+  well, resulting in much better overall workload performance.
+
+- The original high boundary, the hard limit, is defined as a strict
+  limit that cannot budge, even if the OOM killer has to be called.
+  But this generally goes against the goal of making the most out of
+  the available memory. The memory consumption of workloads varies
+  during runtime, and that requires users to overcommit. But doing
+  that with a strict upper limit requires either a fairly accurate
+  prediction of the working set size or adding slack to the limit.
+  Since working set size estimation is hard and error prone, and
+  getting it wrong results in OOM kills, most users tend to err on the
+  side of a looser limit and end up wasting precious resources.
+
+  The memory.high boundary, on the other hand, can be set much more
+  conservatively. When hit, it throttles allocations by forcing them
+  into direct reclaim to work off the excess, but it never invokes the
+  OOM killer. As a result, a high boundary that is chosen too
+  aggressively will not terminate the processes, but instead it will
+  lead to gradual performance degradation. The user can monitor this
+  and make corrections until the minimal memory footprint that still
+  gives acceptable performance is found.
+
+  In extreme cases, with many concurrent allocations and a complete
+  breakdown of reclaim progress within the group, the high boundary
+  can be exceeded. But even then it's mostly better to satisfy the
+  allocation from the slack available in other groups or the rest of
+  the system than to kill the group. Otherwise, memory.max is there
+  to limit this type of spillover and ultimately contain buggy or even
+  malicious applications.
+
+- The original control file names are unwieldy and inconsistent in
+  many different ways. For example, the upper boundary hit count is
+  exported in the memory.failcnt file, but an OOM event count has to
+  be manually counted by listening to memory.oom_control events, and
+  lower boundary / soft limit events have to be counted by first
+  setting a threshold for that value and then counting those events.
+  Also, usage and limit files encode their units in the filename.
+  That makes the filenames very long, even though this is not
+  information that a user needs to be reminded of every time they type
+  out those names.
+
+  To address these naming issues, as well as to signal clearly that
+  the new interface carries a new configuration model, the naming
+  conventions in it necessarily differ from the old interface.
+
+- The original limit files indicate the state of an unset limit with a
+  Very High Number, and a configured limit can be unset by echoing -1
+  into those files. But that very high number is implementation and
+  architecture dependent and not very descriptive. And while -1 can
+  be understood as an underflow into the highest possible value, -2 or
+  -10M etc. do not work, so it's not consistent.
+
+  memory.low, memory.high, and memory.max will use the string
+  "infinity" to indicate and set the highest possible value.
 
 5. Planned Changes
 
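The memory.low protection rule and the "infinity" convention described in this hunk can be modeled in a few lines of C. This is a minimal user-space sketch, not kernel code: `struct demo_cgroup`, `demo_low_protected()` and `demo_parse_limit()` are hypothetical names, the root group is assumed never to be protected, and the parser deliberately omits any unit suffixes a real interface may accept.

```c
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model of one group in the hierarchy (sizes in bytes). */
struct demo_cgroup {
	unsigned long usage;         /* current memory usage */
	unsigned long low;           /* memory.low; 0 means no reserve */
	struct demo_cgroup *parent;  /* NULL for the root group */
};

/*
 * A group is protected from the preferred reclaim pass only if it and
 * every ancestor below the root are under their low boundaries.
 * Assumption of this sketch: the root group itself is never protected.
 */
static bool demo_low_protected(const struct demo_cgroup *cg)
{
	if (!cg->parent)
		return false;
	for (; cg->parent; cg = cg->parent)
		if (cg->usage >= cg->low)
			return false;
	return true;
}

/*
 * Parse a limit: "infinity" selects the highest possible value, anything
 * else must be a plain decimal byte count (unit suffixes omitted here).
 * Returns 0 on success or -EINVAL, so "-1", "-2" and "-10M" are all
 * rejected instead of some of them underflowing to a huge value.
 */
static int demo_parse_limit(const char *buf, unsigned long *out)
{
	char *end;
	unsigned long val;

	if (strcmp(buf, "infinity") == 0) {
		*out = ULONG_MAX;
		return 0;
	}
	if (buf[0] < '0' || buf[0] > '9')
		return -EINVAL;
	errno = 0;
	val = strtoul(buf, &end, 10);
	if (errno != 0 || *end != '\0')
		return -EINVAL;
	*out = val;
	return 0;
}
```

Leaving `low` at 0 makes a group ineligible for protection, which models the opt-in default described in the text.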
@@ -42,6 +42,7 @@ Table of Contents
 3.6 /proc/<pid>/comm & /proc/<pid>/task/<tid>/comm
 3.7 /proc/<pid>/task/<tid>/children - Information about task children
 3.8 /proc/<pid>/fdinfo/<fd> - Information about opened file
+3.9 /proc/<pid>/map_files - Information about memory mapped files
 
 4 Configuring procfs
 4.1 Mount options
@@ -1763,6 +1764,28 @@ pair provide additional information particular to the objects they represent.
 with TIMER_ABSTIME option which will be shown in 'settime flags', but 'it_value'
 still exhibits timer's remaining time.
 
+3.9 /proc/<pid>/map_files - Information about memory mapped files
+---------------------------------------------------------------------
+This directory contains symbolic links which represent memory mapped files
+the process is maintaining. Example output:
+
+     | lr-------- 1 root root 64 Jan 27 11:24 333c600000-333c620000 -> /usr/lib64/ld-2.18.so
+     | lr-------- 1 root root 64 Jan 27 11:24 333c81f000-333c820000 -> /usr/lib64/ld-2.18.so
+     | lr-------- 1 root root 64 Jan 27 11:24 333c820000-333c821000 -> /usr/lib64/ld-2.18.so
+     | ...
+     | lr-------- 1 root root 64 Jan 27 11:24 35d0421000-35d0422000 -> /usr/lib64/libselinux.so.1
+     | lr-------- 1 root root 64 Jan 27 11:24 400000-41a000 -> /usr/bin/ls
+
+The name of a link represents the virtual memory bounds of a mapping, i.e.
+vm_area_struct::vm_start-vm_area_struct::vm_end.
+
+The main purpose of the map_files is to retrieve a set of memory mapped
+files in a fast way instead of parsing /proc/<pid>/maps or
+/proc/<pid>/smaps, both of which contain many more records. At the same
+time one can open(2) mappings from the listings of two processes and
+compare their inode numbers to figure out which anonymous memory areas
+are actually shared.
+
 ------------------------------------------------------------------------------
 Configuring procfs
 ------------------------------------------------------------------------------
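The two uses of map_files described in this hunk, recovering a mapping's bounds from the link name and comparing inodes across processes, can be sketched in C. The helper names are hypothetical; the sketch uses stat(2) on the link rather than open(2) plus fstat(2), and assumes that following a map_files link yields the mapped object. Reading these links may require elevated privileges depending on the kernel version.

```c
#include <stdio.h>
#include <sys/stat.h>

/*
 * Split a map_files entry name such as "333c600000-333c620000" into the
 * hexadecimal vm_start and vm_end values it encodes.
 * Returns 0 on success, -1 if the name is malformed.
 */
static int demo_parse_map_name(const char *name,
			       unsigned long *start, unsigned long *end)
{
	char trailing;

	/* A third successful conversion means junk follows vm_end. */
	if (sscanf(name, "%lx-%lx%c", start, end, &trailing) != 2)
		return -1;
	return *start < *end ? 0 : -1;
}

/*
 * Decide whether two paths, e.g. map_files links taken from two different
 * processes, refer to the same inode on the same device.  stat() follows
 * the link, so the comparison is between the mapped objects themselves.
 * Returns 1 if they match, 0 if not, -1 if either stat() fails.
 */
static int demo_same_object(const char *a, const char *b)
{
	struct stat sa, sb;

	if (stat(a, &sa) != 0 || stat(b, &sb) != 0)
		return -1;
	return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}
```

For example, calling `demo_same_object()` on `/proc/<pid1>/map_files/<range>` and `/proc/<pid2>/map_files/<range>` would report whether the two mappings share a backing inode.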
@@ -555,12 +555,12 @@ this is causing problems for your system/application.
 
 oom_dump_tasks
 
-Enables a system-wide task dump (excluding kernel threads) to be
-produced when the kernel performs an OOM-killing and includes such
-information as pid, uid, tgid, vm size, rss, nr_ptes, swapents,
-oom_score_adj score, and name. This is helpful to determine why the
-OOM killer was invoked, to identify the rogue task that caused it,
-and to determine why the OOM killer chose the task it did to kill.
+Enables a system-wide task dump (excluding kernel threads) to be produced
+when the kernel performs an OOM-killing and includes such information as
+pid, uid, tgid, vm size, rss, nr_ptes, nr_pmds, swapents, oom_score_adj
+score, and name. This is helpful to determine why the OOM killer was
+invoked, to identify the rogue task that caused it, and to determine why
+the OOM killer chose the task it did to kill.
 
 If this is set to zero, this information is suppressed. On very
 large systems with thousands of tasks it may not be feasible to dump
@@ -62,6 +62,8 @@ There are three components to pagemap:
 20. NOPAGE
 21. KSM
 22. THP
+23. BALLOON
+24. ZERO_PAGE
 
 Short descriptions to the page flags:
 
@@ -102,6 +104,12 @@ Short descriptions to the page flags:
 22. THP
     contiguous pages which construct transparent hugepages
 
+23. BALLOON
+    balloon compaction page
+
+24. ZERO_PAGE
+    zero page for pfn_zero or huge_zero page
+
 [IO related page flags]
  1. ERROR     IO error occurred
  3. UPTODATE  page has up-to-date data
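The new BALLOON and ZERO_PAGE entries are bit numbers in the 64-bit per-page flag words that /proc/kpageflags exposes (one word per PFN; the file is normally readable only by root). A minimal decoding sketch follows; the `DEMO_KPF_*` macros and helper functions are hypothetical names, with bit values taken from the list in this hunk.

```c
#include <stdint.h>

/* Bit numbers from the page-flag list; names are local to this sketch. */
#define DEMO_KPF_THP       22
#define DEMO_KPF_BALLOON   23
#define DEMO_KPF_ZERO_PAGE 24

/* Test one flag bit in a 64-bit word read from /proc/kpageflags. */
static inline int demo_kpf_test(uint64_t flags, unsigned int bit)
{
	return (int)((flags >> bit) & 1);
}

/*
 * Classify a page using only the three flags shown above.  ZERO_PAGE is
 * checked first because a huge zero page may also carry the THP bit.
 */
static const char *demo_kpf_kind(uint64_t flags)
{
	if (demo_kpf_test(flags, DEMO_KPF_ZERO_PAGE))
		return "zero page";
	if (demo_kpf_test(flags, DEMO_KPF_BALLOON))
		return "balloon";
	if (demo_kpf_test(flags, DEMO_KPF_THP))
		return "thp";
	return "other";
}
```

A reader would typically pread() 8 bytes at offset `pfn * 8` from /proc/kpageflags and pass the resulting word to `demo_kpf_kind()`.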
@@ -45,7 +45,7 @@ struct vm_area_struct;
 #define PTRS_PER_PMD (1UL << (PAGE_SHIFT-3))
 #define PTRS_PER_PGD (1UL << (PAGE_SHIFT-3))
 #define USER_PTRS_PER_PGD (TASK_SIZE / PGDIR_SIZE)
-#define FIRST_USER_ADDRESS 0
+#define FIRST_USER_ADDRESS 0UL
 
 /* Number of pointers that fit on a page: this will go away. */
 #define PTRS_PER_PAGE (1UL << (PAGE_SHIFT-3))
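This hunk, like the many identical ones that follow for other architectures, only changes the literal from `int` to `unsigned long`. The sketch below is an illustration written for this page, not taken from the patch: the macro's type propagates into every expression built from it, which changes signedness and which printf() conversion matches. `OLD_FIRST_USER_ADDRESS`, `NEW_FIRST_USER_ADDRESS` and the helpers are hypothetical names; it needs a C11 compiler for `_Generic`.

```c
#include <stdio.h>

/* The macro as spelled before and after this change (names are local). */
#define OLD_FIRST_USER_ADDRESS 0
#define NEW_FIRST_USER_ADDRESS 0UL

/* C11 _Generic reports the type an expression ends up with. */
#define demo_type_name(x) \
	_Generic((x), int: "int", unsigned long: "unsigned long", default: "other")

/* Address arithmetic inherits the macro's type and signedness. */
static const char *demo_old_type(void)
{
	return demo_type_name(OLD_FIRST_USER_ADDRESS - 1);	/* int, value -1 */
}

static const char *demo_new_type(void)
{
	return demo_type_name(NEW_FIRST_USER_ADDRESS - 1);	/* unsigned long */
}

/* "%lu" is the matching printf() conversion only for the 0UL spelling. */
static int demo_format(char *buf, size_t len)
{
	return snprintf(buf, len, "%lu", NEW_FIRST_USER_ADDRESS);
}
```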
@@ -211,7 +211,7 @@
  * No special requirements for lowest virtual address we permit any user space
  * mapping to be mapped at.
  */
-#define FIRST_USER_ADDRESS 0
+#define FIRST_USER_ADDRESS 0UL
 
 
 /****************************************************************
@@ -10,6 +10,8 @@
 #ifndef _ASM_PGTABLE_2LEVEL_H
 #define _ASM_PGTABLE_2LEVEL_H
 
+#define __PAGETABLE_PMD_FOLDED
+
 /*
  * Hardware-wise, we have a two level page table structure, where the first
  * level has 4096 entries, and the second level has 256 entries. Each entry
@@ -85,7 +85,7 @@ extern unsigned int kobjsize(const void *objp);
 #define VMALLOC_START 0UL
 #define VMALLOC_END 0xffffffffUL
 
-#define FIRST_USER_ADDRESS (0)
+#define FIRST_USER_ADDRESS 0UL
 
 #include <asm-generic/pgtable.h>
 
@@ -36,12 +36,6 @@
  * of type casting from pmd_t * to pte_t *.
  */
 
-struct page *follow_huge_addr(struct mm_struct *mm, unsigned long address,
-			      int write)
-{
-	return ERR_PTR(-EINVAL);
-}
-
 int pud_huge(pud_t pud)
 {
 	return 0;
@@ -97,6 +97,7 @@ pgd_t *pgd_alloc(struct mm_struct *mm)
 
 no_pte:
 	pmd_free(mm, new_pmd);
+	mm_dec_nr_pmds(mm);
 no_pmd:
 	pud_free(mm, new_pud);
 no_pud:
@@ -130,9 +131,11 @@ void pgd_free(struct mm_struct *mm, pgd_t *pgd_base)
 	pte = pmd_pgtable(*pmd);
 	pmd_clear(pmd);
 	pte_free(mm, pte);
+	atomic_long_dec(&mm->nr_ptes);
 no_pmd:
 	pud_clear(pud);
 	pmd_free(mm, pmd);
+	mm_dec_nr_pmds(mm);
 no_pud:
 	pgd_clear(pgd);
 	pud_free(mm, pud);
@@ -152,6 +155,7 @@ void pgd_free(struct mm_struct *mm, pgd_t *pgd_base)
 		pmd = pmd_offset(pud, 0);
 		pud_clear(pud);
 		pmd_free(mm, pmd);
+		mm_dec_nr_pmds(mm);
 		pgd_clear(pgd);
 		pud_free(mm, pud);
 	}
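The added mm_dec_nr_pmds() and atomic_long_dec() calls keep the per-mm page-table counters balanced on every path that frees a table, including error unwinding. The pattern can be sketched in user space as below; `struct demo_mm`, the `demo_*` helpers and the use of malloc() in place of page-table allocation are all assumptions for illustration, not the kernel's implementation.

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for the per-mm pmd counter. */
struct demo_mm {
	atomic_long nr_pmds;
};

static void demo_inc_nr_pmds(struct demo_mm *mm) { atomic_fetch_add(&mm->nr_pmds, 1); }
static void demo_dec_nr_pmds(struct demo_mm *mm) { atomic_fetch_sub(&mm->nr_pmds, 1); }

/*
 * Allocate a pmd-level table, then a pte-level table.  Every path that
 * frees the pmd table, including the error path, also drops the counter.
 * fail_pte lets a caller simulate the second allocation failing.
 */
static int demo_alloc_tables(struct demo_mm *mm, void **pmd, void **pte,
			     int fail_pte)
{
	*pmd = malloc(64);
	if (!*pmd)
		goto no_pmd;
	demo_inc_nr_pmds(mm);

	*pte = fail_pte ? NULL : malloc(64);
	if (!*pte)
		goto no_pte;
	return 0;

no_pte:
	free(*pmd);
	*pmd = NULL;
	demo_dec_nr_pmds(mm);	/* without this, the count leaks */
no_pmd:
	return -1;
}

/* Normal teardown: free both tables and drop the pmd count. */
static void demo_free_tables(struct demo_mm *mm, void *pmd, void *pte)
{
	free(pte);
	free(pmd);
	demo_dec_nr_pmds(mm);
}
```

Dropping the `demo_dec_nr_pmds()` call under `no_pte:` would leave the counter at 1 after a failed allocation, the kind of imbalance the added calls prevent.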
@@ -45,7 +45,7 @@
 
 #define vmemmap ((struct page *)(VMALLOC_END + SZ_64K))
 
-#define FIRST_USER_ADDRESS 0
+#define FIRST_USER_ADDRESS 0UL
 
 #ifndef __ASSEMBLY__
 extern void __pte_error(const char *file, int line, unsigned long val);
@@ -38,12 +38,6 @@ int huge_pmd_unshare(struct mm_struct *mm, unsigned long *addr, pte_t *ptep)
 }
 #endif
 
-struct page *follow_huge_addr(struct mm_struct *mm, unsigned long address,
-			      int write)
-{
-	return ERR_PTR(-EINVAL);
-}
-
 int pmd_huge(pmd_t pmd)
 {
 	return !(pmd_val(pmd) & PMD_TABLE_BIT);
@@ -30,7 +30,7 @@
 #define PGDIR_MASK (~(PGDIR_SIZE-1))
 
 #define USER_PTRS_PER_PGD (TASK_SIZE / PGDIR_SIZE)
-#define FIRST_USER_ADDRESS 0
+#define FIRST_USER_ADDRESS 0UL
 
 #ifndef __ASSEMBLY__
 extern pgd_t swapper_pg_dir[PTRS_PER_PGD];
@@ -67,7 +67,7 @@ extern void paging_init(void);
  */
 
 #define USER_PTRS_PER_PGD (TASK_SIZE/PGDIR_SIZE)
-#define FIRST_USER_ADDRESS 0
+#define FIRST_USER_ADDRESS 0UL
 
 /* zero page used for uninitialized stuff */
 #ifndef __ASSEMBLY__
@@ -140,7 +140,7 @@ extern unsigned long empty_zero_page;
 #define PTRS_PER_PTE 4096
 
 #define USER_PGDS_IN_LAST_PML4 (TASK_SIZE / PGDIR_SIZE)
-#define FIRST_USER_ADDRESS 0
+#define FIRST_USER_ADDRESS 0UL
 
 #define USER_PGD_PTRS (PAGE_OFFSET >> PGDIR_SHIFT)
 #define KERNEL_PGD_PTRS (PTRS_PER_PGD - USER_PGD_PTRS)
@@ -171,7 +171,7 @@ extern unsigned long _dflt_cache_att;
 extern pgd_t swapper_pg_dir[PTRS_PER_PGD]; /* located in head.S */
 
 /* Seems to be zero even in architectures where the zero page is firewalled? */
-#define FIRST_USER_ADDRESS 0
+#define FIRST_USER_ADDRESS 0UL
 #define pte_special(pte) 0
 #define pte_mkspecial(pte) (pte)
 
@@ -127,7 +127,7 @@
 #define PTRS_PER_PGD_SHIFT PTRS_PER_PTD_SHIFT
 #define PTRS_PER_PGD (1UL << PTRS_PER_PGD_SHIFT)
 #define USER_PTRS_PER_PGD (5*PTRS_PER_PGD/8) /* regions 0-4 are user regions */
-#define FIRST_USER_ADDRESS 0
+#define FIRST_USER_ADDRESS 0UL
 
 /*
  * All the normal masks have the "page accessed" bits on, as any time
@@ -114,12 +114,6 @@ int pud_huge(pud_t pud)
 	return 0;
 }
 
-struct page *
-follow_huge_pmd(struct mm_struct *mm, unsigned long address, pmd_t *pmd, int write)
-{
-	return NULL;
-}
-
 void hugetlb_free_pgd_range(struct mmu_gather *tlb,
 			unsigned long addr, unsigned long end,
 			unsigned long floor, unsigned long ceiling)
@@ -53,7 +53,7 @@ extern unsigned long empty_zero_page[1024];
 #define PGDIR_MASK (~(PGDIR_SIZE - 1))
 
 #define USER_PTRS_PER_PGD (TASK_SIZE / PGDIR_SIZE)
-#define FIRST_USER_ADDRESS 0
+#define FIRST_USER_ADDRESS 0UL
 
 #ifndef __ASSEMBLY__
 /* Just any arbitrary offset to the start of the vmalloc VM area: the
@@ -66,7 +66,7 @@
 #define PTRS_PER_PGD 128
 #endif
 #define USER_PTRS_PER_PGD (TASK_SIZE/PGDIR_SIZE)
-#define FIRST_USER_ADDRESS 0
+#define FIRST_USER_ADDRESS 0UL
 
 /* Virtual address region for use by kernel_map() */
 #ifdef CONFIG_SUN3
@@ -94,12 +94,6 @@ int huge_pmd_unshare(struct mm_struct *mm, unsigned long *addr, pte_t *ptep)
 	return 0;
 }
 
-struct page *follow_huge_addr(struct mm_struct *mm,
-			      unsigned long address, int write)
-{
-	return ERR_PTR(-EINVAL);
-}
-
 int pmd_huge(pmd_t pmd)
 {
 	return pmd_page_shift(pmd) > PAGE_SHIFT;
@@ -61,6 +61,8 @@ extern int mem_init_done;
 
 #include <asm-generic/4level-fixup.h>
 
+#define __PAGETABLE_PMD_FOLDED
+
 #ifdef __KERNEL__
 #ifndef __ASSEMBLY__
 
@@ -70,7 +72,7 @@ extern int mem_init_done;
 #include <asm/mmu.h>
 #include <asm/page.h>
 
-#define FIRST_USER_ADDRESS 0
+#define FIRST_USER_ADDRESS 0UL
 
 extern unsigned long va_to_phys(unsigned long address);
 extern pte_t *va_to_pte(unsigned long address);
@@ -57,7 +57,7 @@ extern int add_temporary_entry(unsigned long entrylo0, unsigned long entrylo1,
 #define PTRS_PER_PTE ((PAGE_SIZE << PTE_ORDER) / sizeof(pte_t))
 
 #define USER_PTRS_PER_PGD (0x80000000UL/PGDIR_SIZE)
-#define FIRST_USER_ADDRESS 0
+#define FIRST_USER_ADDRESS 0UL
 
 #define VMALLOC_START MAP_BASE
 
@@ -301,11 +301,9 @@ int get_user_pages_fast(unsigned long start, int nr_pages, int write,
 		start += nr << PAGE_SHIFT;
 		pages += nr;
 
-		down_read(&mm->mmap_sem);
-		ret = get_user_pages(current, mm, start,
-				(end - start) >> PAGE_SHIFT,
-				write, 0, pages, NULL);
-		up_read(&mm->mmap_sem);
+		ret = get_user_pages_unlocked(current, mm, start,
+					      (end - start) >> PAGE_SHIFT,
+					      write, 0, pages);
 
 		/* Have to be a bit careful with return values */
 		if (nr > 0) {
@@ -68,12 +68,6 @@ int is_aligned_hugepage_range(unsigned long addr, unsigned long len)
 	return 0;
 }
 
-struct page *
-follow_huge_addr(struct mm_struct *mm, unsigned long address, int write)
-{
-	return ERR_PTR(-EINVAL);
-}
-
 int pmd_huge(pmd_t pmd)
 {
 	return (pmd_val(pmd) & _PAGE_HUGE) != 0;
@@ -83,15 +77,3 @@ int pud_huge(pud_t pud)
 {
 	return (pud_val(pud) & _PAGE_HUGE) != 0;
 }
-
-struct page *
-follow_huge_pmd(struct mm_struct *mm, unsigned long address,
-		pmd_t *pmd, int write)
-{
-	struct page *page;
-
-	page = pte_page(*(pte_t *)pmd);
-	if (page)
-		page += ((address & ~HPAGE_MASK) >> PAGE_SHIFT);
-	return page;
-}
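The follow_huge_pmd() bodies removed in this and several later hunks locate the base page for an address inside a huge page by adding `(address & ~HPAGE_MASK) >> PAGE_SHIFT` to the head page. The arithmetic alone is sketched below, assuming 4 KiB base pages and 2 MiB huge pages; the `DEMO_*` constants and `demo_subpage_index()` are hypothetical.

```c
/* Assumed geometry for this sketch: 4 KiB base pages, 2 MiB huge pages. */
#define DEMO_PAGE_SHIFT  12
#define DEMO_HPAGE_SHIFT 21
#define DEMO_HPAGE_SIZE  (1UL << DEMO_HPAGE_SHIFT)
#define DEMO_HPAGE_MASK  (~(DEMO_HPAGE_SIZE - 1))

/*
 * Position of the base page containing `address`, counted from the first
 * base page of the enclosing huge page.  Adding this to the head page's
 * struct page pointer gives the tail page, as the removed helpers did.
 */
static unsigned long demo_subpage_index(unsigned long address)
{
	return (address & ~DEMO_HPAGE_MASK) >> DEMO_PAGE_SHIFT;
}
```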
@@ -65,7 +65,7 @@ extern void paging_init(void);
 #define PGDIR_MASK (~(PGDIR_SIZE - 1))
 
 #define USER_PTRS_PER_PGD (TASK_SIZE / PGDIR_SIZE)
-#define FIRST_USER_ADDRESS 0
+#define FIRST_USER_ADDRESS 0UL
 
 #define USER_PGD_PTRS (PAGE_OFFSET >> PGDIR_SHIFT)
 #define KERNEL_PGD_PTRS (PTRS_PER_PGD - USER_PGD_PTRS)
@@ -24,7 +24,7 @@
 #include <asm/pgtable-bits.h>
 #include <asm-generic/pgtable-nopmd.h>
 
-#define FIRST_USER_ADDRESS 0
+#define FIRST_USER_ADDRESS 0UL
 
 #define VMALLOC_START CONFIG_NIOS2_KERNEL_MMU_REGION_BASE
 #define VMALLOC_END (CONFIG_NIOS2_KERNEL_REGION_BASE - 1)
@@ -77,7 +77,7 @@ extern void paging_init(void);
  */
 
 #define USER_PTRS_PER_PGD (TASK_SIZE/PGDIR_SIZE)
-#define FIRST_USER_ADDRESS 0
+#define FIRST_USER_ADDRESS 0UL
 
 /*
  * Kernels own virtual memory area.
@@ -134,7 +134,7 @@ extern void purge_tlb_entries(struct mm_struct *, unsigned long);
  * pgd entries used up by user/kernel:
  */
 
-#define FIRST_USER_ADDRESS 0
+#define FIRST_USER_ADDRESS 0UL
 
 /* NB: The tlb miss handlers make certain assumptions about the order */
 /* of the following bits, so be careful (One example, bits 25-31 */
@@ -45,7 +45,7 @@ extern int icache_44x_need_flush;
 #define PTRS_PER_PGD (1 << (32 - PGDIR_SHIFT))
 
 #define USER_PTRS_PER_PGD (TASK_SIZE / PGDIR_SIZE)
-#define FIRST_USER_ADDRESS 0
+#define FIRST_USER_ADDRESS 0UL
 
 #define pte_ERROR(e) \
 	pr_err("%s:%d: bad pte %llx.\n", __FILE__, __LINE__, \
@@ -12,7 +12,7 @@
 #endif
 #include <asm/barrier.h>
 
-#define FIRST_USER_ADDRESS 0
+#define FIRST_USER_ADDRESS 0UL
 
 /*
  * Size of EA range mapped by our pagetables.
@@ -714,6 +714,14 @@ follow_huge_pmd(struct mm_struct *mm, unsigned long address,
 	return NULL;
 }
 
+struct page *
+follow_huge_pud(struct mm_struct *mm, unsigned long address,
+		pud_t *pud, int write)
+{
+	BUG();
+	return NULL;
+}
+
 static unsigned long hugepte_addr_end(unsigned long addr, unsigned long end,
 				      unsigned long sz)
 {
@@ -134,7 +134,7 @@ static void subpage_prot_clear(unsigned long addr, unsigned long len)
 static int subpage_walk_pmd_entry(pmd_t *pmd, unsigned long addr,
 				  unsigned long end, struct mm_walk *walk)
 {
-	struct vm_area_struct *vma = walk->private;
+	struct vm_area_struct *vma = walk->vma;
 	split_huge_page_pmd(vma, addr, pmd);
 	return 0;
 }
@@ -163,9 +163,7 @@ static void subpage_mark_vma_nohuge(struct mm_struct *mm, unsigned long addr,
 		if (vma->vm_start >= (addr + len))
 			break;
 		vma->vm_flags |= VM_NOHUGEPAGE;
-		subpage_proto_walk.private = vma;
-		walk_page_range(vma->vm_start, vma->vm_end,
-				&subpage_proto_walk);
+		walk_page_vma(vma, &subpage_proto_walk);
 		vma = vma->vm_next;
 	}
 }
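The subpage-prot change relies on the page walker recording the current VMA in `walk->vma`, so the callback no longer needs the caller to stash it in `walk->private`. A hypothetical miniature of that arrangement follows; the `demo_*` types and the fixed-step walk are invented for illustration and are much simpler than the kernel's struct mm_walk and walk_page_vma().

```c
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the kernel types. */
struct demo_vma {
	unsigned long vm_start, vm_end;
	unsigned long vm_flags;
};

struct demo_walk {
	/* Callback invoked once per step; it reads the VMA from walk->vma. */
	int (*entry)(unsigned long addr, unsigned long end,
		     struct demo_walk *walk);
	struct demo_vma *vma;	/* filled in by the walker, not the caller */
	void *private;		/* left free for caller-specific data */
};

/*
 * Walk one VMA in fixed-size steps.  Because the walker records the VMA
 * in walk->vma itself, callers no longer pass it through walk->private.
 */
static int demo_walk_vma(struct demo_vma *vma, struct demo_walk *walk,
			 unsigned long step)
{
	unsigned long addr, next;
	int err;

	walk->vma = vma;
	for (addr = vma->vm_start; addr < vma->vm_end; addr = next) {
		next = addr + step < vma->vm_end ? addr + step : vma->vm_end;
		err = walk->entry(addr, next, walk);
		if (err)
			return err;
	}
	return 0;
}
```

With this split, `private` stays available for per-walk state such as a counter, while the VMA always comes from the walker.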
@@ -99,7 +99,7 @@ extern unsigned long zero_page_mask;
 #endif /* CONFIG_64BIT */
 #define PTRS_PER_PGD 2048
 
-#define FIRST_USER_ADDRESS 0
+#define FIRST_USER_ADDRESS 0UL
 
 #define pte_ERROR(e) \
 	printk("%s:%d: bad pte %p.\n", __FILE__, __LINE__, (void *) pte_val(e))
@@ -235,10 +235,8 @@ int get_user_pages_fast(unsigned long start, int nr_pages, int write,
 	/* Try to get the remaining pages with get_user_pages */
 	start += nr << PAGE_SHIFT;
 	pages += nr;
-	down_read(&mm->mmap_sem);
-	ret = get_user_pages(current, mm, start,
-			     nr_pages - nr, write, 0, pages, NULL);
-	up_read(&mm->mmap_sem);
+	ret = get_user_pages_unlocked(current, mm, start,
+			     nr_pages - nr, write, 0, pages);
 	/* Have to be a bit careful with return values */
 	if (nr > 0)
 		ret = (ret < 0) ? nr : ret + nr;
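After the lockless fast path pins `nr` pages, the slow path (now get_user_pages_unlocked(), which takes and releases mmap_sem itself) handles the rest, and the two results are merged by the "Have to be a bit careful with return values" logic shown in this hunk. That merge is sketched below as a standalone function; `demo_gup_combine()` is a hypothetical name.

```c
/*
 * Merge the result of the lockless fast path with the slow path.
 *   nr        pages already pinned by the fast path (>= 0)
 *   slow_ret  pages pinned by the slow path, or a negative errno
 * Mirrors: if (nr > 0) ret = (ret < 0) ? nr : ret + nr;
 */
static int demo_gup_combine(int nr, int slow_ret)
{
	if (nr > 0)
		return (slow_ret < 0) ? nr : slow_ret + nr;
	return slow_ret;
}
```

A slow-path error is therefore only reported when the fast path pinned nothing; otherwise the caller sees a short count.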
@@ -192,12 +192,6 @@ int huge_pmd_unshare(struct mm_struct *mm, unsigned long *addr, pte_t *ptep)
 	return 0;
 }
 
-struct page *follow_huge_addr(struct mm_struct *mm, unsigned long address,
-			      int write)
-{
-	return ERR_PTR(-EINVAL);
-}
-
 int pmd_huge(pmd_t pmd)
 {
 	if (!MACHINE_HAS_HPAGE)
@@ -210,17 +204,3 @@ int pud_huge(pud_t pud)
 {
 	return 0;
 }
-
-struct page *follow_huge_pmd(struct mm_struct *mm, unsigned long address,
-			     pmd_t *pmdp, int write)
-{
-	struct page *page;
-
-	if (!MACHINE_HAS_HPAGE)
-		return NULL;
-
-	page = pmd_page(*pmdp);
-	if (page)
-		page += ((address & ~HPAGE_MASK) >> PAGE_SHIFT);
-	return page;
-}
@@ -27,7 +27,7 @@ extern pte_t invalid_pte_table[PAGE_SIZE/sizeof(pte_t)];
 #define PTRS_PER_PTE 1024
 
 #define USER_PTRS_PER_PGD (0x80000000UL/PGDIR_SIZE)
-#define FIRST_USER_ADDRESS 0
+#define FIRST_USER_ADDRESS 0UL
 
 #define VMALLOC_START (0xc0000000UL)
 
@@ -62,7 +62,7 @@ static inline unsigned long long neff_sign_extend(unsigned long val)
 /* Entries per level */
 #define PTRS_PER_PTE (PAGE_SIZE / (1 << PTE_MAGNITUDE))
 
-#define FIRST_USER_ADDRESS 0
+#define FIRST_USER_ADDRESS 0UL
 
 #define PHYS_ADDR_MASK29 0x1fffffff
 #define PHYS_ADDR_MASK32 0xffffffff
@@ -257,10 +257,8 @@ int get_user_pages_fast(unsigned long start, int nr_pages, int write,
 		start += nr << PAGE_SHIFT;
 		pages += nr;
 
-		down_read(&mm->mmap_sem);
-		ret = get_user_pages(current, mm, start,
-			(end - start) >> PAGE_SHIFT, write, 0, pages, NULL);
-		up_read(&mm->mmap_sem);
+		ret = get_user_pages_unlocked(current, mm, start,
+			(end - start) >> PAGE_SHIFT, write, 0, pages);
 
 		/* Have to be a bit careful with return values */
 		if (nr > 0) {
@@ -67,12 +67,6 @@ int huge_pmd_unshare(struct mm_struct *mm, unsigned long *addr, pte_t *ptep)
 	return 0;
 }
 
-struct page *follow_huge_addr(struct mm_struct *mm,
-			      unsigned long address, int write)
-{
-	return ERR_PTR(-EINVAL);
-}
-
 int pmd_huge(pmd_t pmd)
 {
 	return 0;
@@ -82,9 +76,3 @@ int pud_huge(pud_t pud)
 {
 	return 0;
 }
-
-struct page *follow_huge_pmd(struct mm_struct *mm, unsigned long address,
-			     pmd_t *pmd, int write)
-{
-	return NULL;
-}
@@ -44,7 +44,7 @@ unsigned long __init bootmem_init(unsigned long *pages_avail);
 #define PTRS_PER_PMD SRMMU_PTRS_PER_PMD
 #define PTRS_PER_PGD SRMMU_PTRS_PER_PGD
 #define USER_PTRS_PER_PGD PAGE_OFFSET / SRMMU_PGDIR_SIZE
-#define FIRST_USER_ADDRESS 0
+#define FIRST_USER_ADDRESS 0UL
 #define PTE_SIZE (PTRS_PER_PTE*4)
 
 #define PAGE_NONE SRMMU_PAGE_NONE
@@ -102,7 +102,8 @@ extern unsigned long empty_zero_page;
  */
 static inline unsigned long srmmu_swap(unsigned long *addr, unsigned long value)
 {
-	__asm__ __volatile__("swap [%2], %0" : "=&r" (value) : "0" (value), "r" (addr));
+	__asm__ __volatile__("swap [%2], %0" :
+			"=&r" (value) : "0" (value), "r" (addr) : "memory");
 	return value;
 }
 
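The added `"memory"` clobber tells the compiler that the asm statement may read or write memory not named in its operands, so it must not keep memory values cached in registers across the swap or move memory accesses around it. The SPARC `swap` instruction cannot run elsewhere, so the sketch below shows a portable analogue using the GCC/Clang builtin `__atomic_exchange_n()` plus a plain compiler barrier. `demo_swap()` and `demo_compiler_barrier()` are hypothetical names, and this reasoning comes from GCC's documented clobber semantics rather than from the patch itself.

```c
/*
 * Portable analogue of srmmu_swap(): atomically store `value` at *addr
 * and return the previous contents.  With __ATOMIC_SEQ_CST the compiler
 * may not reorder other memory accesses across the exchange.
 */
static inline unsigned long demo_swap(unsigned long *addr, unsigned long value)
{
	return __atomic_exchange_n(addr, value, __ATOMIC_SEQ_CST);
}

/*
 * A pure compiler barrier: emits no instruction, but the "memory"
 * clobber forces values to be reloaded from memory after this point.
 */
static inline void demo_compiler_barrier(void)
{
	__asm__ __volatile__("" ::: "memory");
}
```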
@@ -93,7 +93,7 @@ bool kern_addr_valid(unsigned long addr);
 #define PTRS_PER_PGD (1UL << PGDIR_BITS)
 
 /* Kernel has a separate 44bit address space. */
-#define FIRST_USER_ADDRESS 0
+#define FIRST_USER_ADDRESS 0UL
 
 #define pmd_ERROR(e) \
 	pr_err("%s:%d: bad pmd %p(%016lx) seen at (%pS)\n", \
@@ -249,10 +249,8 @@ int get_user_pages_fast(unsigned long start, int nr_pages, int write,
 		start += nr << PAGE_SHIFT;
 		pages += nr;
 
-		down_read(&mm->mmap_sem);
-		ret = get_user_pages(current, mm, start,
-			(end - start) >> PAGE_SHIFT, write, 0, pages, NULL);
-		up_read(&mm->mmap_sem);
+		ret = get_user_pages_unlocked(current, mm, start,
+			(end - start) >> PAGE_SHIFT, write, 0, pages);
 
 		/* Have to be a bit careful with return values */
 		if (nr > 0) {
@@ -215,12 +215,6 @@ pte_t huge_ptep_get_and_clear(struct mm_struct *mm, unsigned long addr,
 	return entry;
 }
 
-struct page *follow_huge_addr(struct mm_struct *mm,
-			      unsigned long address, int write)
-{
-	return ERR_PTR(-EINVAL);
-}
-
 int pmd_huge(pmd_t pmd)
 {
 	return 0;
@@ -230,9 +224,3 @@ int pud_huge(pud_t pud)
 {
 	return 0;
 }
-
-struct page *follow_huge_pmd(struct mm_struct *mm, unsigned long address,
-			     pmd_t *pmd, int write)
-{
-	return NULL;
-}
@@ -67,7 +67,7 @@ extern void pgtable_cache_init(void);
 extern void paging_init(void);
 extern void set_page_homes(void);
 
-#define FIRST_USER_ADDRESS 0
+#define FIRST_USER_ADDRESS 0UL
 
 #define _PAGE_PRESENT HV_PTE_PRESENT
 #define _PAGE_HUGE_PAGE HV_PTE_PAGE
@@ -150,12 +150,6 @@ pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr)
 	return NULL;
 }
 
-struct page *follow_huge_addr(struct mm_struct *mm, unsigned long address,
-			      int write)
-{
-	return ERR_PTR(-EINVAL);
-}
-
 int pmd_huge(pmd_t pmd)
 {
 	return !!(pmd_val(pmd) & _PAGE_HUGE_PAGE);
@@ -166,28 +160,6 @@ int pud_huge(pud_t pud)
 	return !!(pud_val(pud) & _PAGE_HUGE_PAGE);
 }
 
-struct page *follow_huge_pmd(struct mm_struct *mm, unsigned long address,
-			     pmd_t *pmd, int write)
-{
-	struct page *page;
-
-	page = pte_page(*(pte_t *)pmd);
-	if (page)
-		page += ((address & ~PMD_MASK) >> PAGE_SHIFT);
-	return page;
-}
-
-struct page *follow_huge_pud(struct mm_struct *mm, unsigned long address,
-			     pud_t *pud, int write)
-{
-	struct page *page;
-
-	page = pte_page(*(pte_t *)pud);
-	if (page)
-		page += ((address & ~PUD_MASK) >> PAGE_SHIFT);
-	return page;
-}
-
 int huge_pmd_unshare(struct mm_struct *mm, unsigned long *addr, pte_t *ptep)
 {
 	return 0;
@@ -23,7 +23,7 @@
 #define PTRS_PER_PTE 1024
 #define USER_PTRS_PER_PGD ((TASK_SIZE + (PGDIR_SIZE - 1)) / PGDIR_SIZE)
 #define PTRS_PER_PGD 1024
-#define FIRST_USER_ADDRESS 0
+#define FIRST_USER_ADDRESS 0UL
 
 #define pte_ERROR(e) \
 	printk("%s:%d: bad pte %p(%08lx).\n", __FILE__, __LINE__, &(e), \
@@ -41,7 +41,7 @@
 #endif
 
 #define USER_PTRS_PER_PGD ((TASK_SIZE + (PGDIR_SIZE - 1)) / PGDIR_SIZE)
-#define FIRST_USER_ADDRESS 0
+#define FIRST_USER_ADDRESS 0UL
 
 #define pte_ERROR(e) \
 	printk("%s:%d: bad pte %p(%016lx).\n", __FILE__, __LINE__, &(e), \
@@ -69,6 +69,7 @@ pgd_t *get_pgd_slow(struct mm_struct *mm)
 
 no_pte:
 	pmd_free(mm, new_pmd);
+	mm_dec_nr_pmds(mm);
 no_pmd:
 	free_pages((unsigned long)new_pgd, 0);
 no_pgd:
@@ -96,7 +97,9 @@ void free_pgd_slow(struct mm_struct *mm, pgd_t *pgd)
 	pte = pmd_pgtable(*pmd);
 	pmd_clear(pmd);
 	pte_free(mm, pte);
+	atomic_long_dec(&mm->nr_ptes);
 	pmd_free(mm, pmd);
+	mm_dec_nr_pmds(mm);
 free:
 	free_pages((unsigned long) pgd, 0);
 }
@@ -4,7 +4,7 @@
 #include <linux/const.h>
 #include <asm/page_types.h>
 
-#define FIRST_USER_ADDRESS 0
+#define FIRST_USER_ADDRESS 0UL
 
 #define _PAGE_BIT_PRESENT 0 /* is present */
 #define _PAGE_BIT_RW 1 /* writeable */
@@ -172,7 +172,7 @@ static int gup_pmd_range(pud_t pud, unsigned long addr, unsigned long end,
 		 */
 		if (pmd_none(pmd) || pmd_trans_splitting(pmd))
 			return 0;
-		if (unlikely(pmd_large(pmd))) {
+		if (unlikely(pmd_large(pmd) || !pmd_present(pmd))) {
 			/*
 			 * NUMA hinting faults need to be handled in the GUP
 			 * slowpath for accounting purposes and so that they
@@ -388,10 +388,9 @@ int get_user_pages_fast(unsigned long start, int nr_pages, int write,
 		start += nr << PAGE_SHIFT;
 		pages += nr;
 
-		down_read(&mm->mmap_sem);
-		ret = get_user_pages(current, mm, start,
-			(end - start) >> PAGE_SHIFT, write, 0, pages, NULL);
-		up_read(&mm->mmap_sem);
+		ret = get_user_pages_unlocked(current, mm, start,
+					      (end - start) >> PAGE_SHIFT,
+					      write, 0, pages);
 
 		/* Have to be a bit careful with return values */
 		if (nr > 0) {
@@ -52,23 +52,17 @@ int pud_huge(pud_t pud)
 	return 0;
 }

-struct page *
-follow_huge_pmd(struct mm_struct *mm, unsigned long address,
-		pmd_t *pmd, int write)
-{
-	return NULL;
-}
 #else

 struct page *
 follow_huge_addr(struct mm_struct *mm, unsigned long address, int write)
 {
 	return ERR_PTR(-EINVAL);
 }

+/*
+ * pmd_huge() returns 1 if @pmd is hugetlb related entry, that is normal
+ * hugetlb entry or non-present (migration or hwpoisoned) hugetlb entry.
+ * Otherwise, returns 0.
+ */
 int pmd_huge(pmd_t pmd)
 {
-	return !!(pmd_val(pmd) & _PAGE_PSE);
+	return !pmd_none(pmd) &&
+		(pmd_val(pmd) & (_PAGE_PRESENT|_PAGE_PSE)) != _PAGE_PRESENT;
 }

 int pud_huge(pud_t pud)
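The new x86 `pmd_huge()` treats any non-empty entry that is not a plain "present, non-PSE" pointer to a PTE table as hugetlb-related. A minimal userspace sketch of that predicate follows. It assumes the x86 bit positions `_PAGE_PRESENT` = bit 0 and `_PAGE_PSE` = bit 7, and it models `pmd_none()` as "the raw value is zero", which is a simplification. The `sketch_*` names and the raw `uint64_t` entry are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed x86 bit values: bit 0 (present) and bit 7 (page size extension). */
#define SKETCH_PAGE_PRESENT (UINT64_C(1) << 0)
#define SKETCH_PAGE_PSE     (UINT64_C(1) << 7)

/* Simplification: an entry is "none" only when it is all zero bits. */
bool sketch_pmd_none(uint64_t pmd)
{
	return pmd == 0;
}

/*
 * Same shape as the new pmd_huge(): true for a present huge entry
 * (PRESENT|PSE) and for a non-present, non-empty entry such as a
 * migration or hwpoison entry; false for an empty entry or for a
 * present pointer to a PTE table (PRESENT without PSE).
 */
bool sketch_pmd_huge(uint64_t pmd)
{
	return !sketch_pmd_none(pmd) &&
	       (pmd & (SKETCH_PAGE_PRESENT | SKETCH_PAGE_PSE)) != SKETCH_PAGE_PRESENT;
}
```

For a non-present entry such as `0x3000`, the old `!!(pmd_val(pmd) & _PAGE_PSE)` form returns 0. The new form returns true for that entry.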
@@ -190,7 +190,7 @@ void pud_populate(struct mm_struct *mm, pud_t *pudp, pmd_t *pmd)

 #endif /* CONFIG_X86_PAE */

-static void free_pmds(pmd_t *pmds[])
+static void free_pmds(struct mm_struct *mm, pmd_t *pmds[])
 {
 	int i;

@@ -198,10 +198,11 @@ static void free_pmds(pmd_t *pmds[])
 		if (pmds[i]) {
 			pgtable_pmd_page_dtor(virt_to_page(pmds[i]));
 			free_page((unsigned long)pmds[i]);
+			mm_dec_nr_pmds(mm);
 		}
 }

-static int preallocate_pmds(pmd_t *pmds[])
+static int preallocate_pmds(struct mm_struct *mm, pmd_t *pmds[])
 {
 	int i;
 	bool failed = false;
@@ -215,11 +216,13 @@ static int preallocate_pmds(pmd_t *pmds[])
 			pmd = NULL;
 			failed = true;
 		}
+		if (pmd)
+			mm_inc_nr_pmds(mm);
 		pmds[i] = pmd;
 	}

 	if (failed) {
-		free_pmds(pmds);
+		free_pmds(mm, pmds);
 		return -ENOMEM;
 	}

@@ -246,6 +249,7 @@ static void pgd_mop_up_pmds(struct mm_struct *mm, pgd_t *pgdp)

 			paravirt_release_pmd(pgd_val(pgd) >> PAGE_SHIFT);
 			pmd_free(mm, pmd);
+			mm_dec_nr_pmds(mm);
 		}
 	}
 }
@@ -283,7 +287,7 @@ pgd_t *pgd_alloc(struct mm_struct *mm)

 	mm->pgd = pgd;

-	if (preallocate_pmds(pmds) != 0)
+	if (preallocate_pmds(mm, pmds) != 0)
 		goto out_free_pgd;

 	if (paravirt_pgd_alloc(mm) != 0)
@@ -304,7 +308,7 @@ pgd_t *pgd_alloc(struct mm_struct *mm)
 	return pgd;

 out_free_pmds:
-	free_pmds(pmds);
+	free_pmds(mm, pmds);
 out_free_pgd:
 	free_page((unsigned long)pgd);
 out:
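`preallocate_pmds()` and `free_pmds()` now take the `mm` so that the per-mm PMD counter stays balanced. Every PMD page that is actually allocated increments the counter, and every page freed decrements it, including on the error path. The minimal userspace sketch below illustrates that invariant under several assumptions: a hypothetical `struct sketch_mm` with a plain `long nr_pmds` stands in for `mm_inc_nr_pmds()`/`mm_dec_nr_pmds()`, `malloc()` stands in for the page allocator, and `SKETCH_PREALLOCATED_PMDS` is set to an arbitrary 4.

```c
#include <stdbool.h>
#include <stdlib.h>

#define SKETCH_PREALLOCATED_PMDS 4	/* arbitrary for illustration */

struct sketch_mm {
	long nr_pmds;	/* stands in for the mm's PMD page-table counter */
};

/* Free every allocated slot and drop the counter once per freed page. */
void sketch_free_pmds(struct sketch_mm *mm, void *pmds[])
{
	for (int i = 0; i < SKETCH_PREALLOCATED_PMDS; i++) {
		if (pmds[i]) {
			free(pmds[i]);
			pmds[i] = NULL;
			mm->nr_pmds--;
		}
	}
}

/*
 * Fill every slot, like preallocate_pmds(). fail_at is a hypothetical knob
 * that simulates an allocation failure at that index (-1 for none).
 * Returns 0 on success, -1 on failure.
 */
int sketch_preallocate_pmds(struct sketch_mm *mm, void *pmds[], int fail_at)
{
	bool failed = false;

	for (int i = 0; i < SKETCH_PREALLOCATED_PMDS; i++) {
		void *pmd = (i == fail_at) ? NULL : malloc(64);

		if (!pmd)
			failed = true;
		else
			mm->nr_pmds++;	/* count only pages we really hold */
		pmds[i] = pmd;
	}

	if (failed) {
		sketch_free_pmds(mm, pmds);	/* undoes every increment */
		return -1;
	}
	return 0;
}
```

As in the patch, the loop keeps going after a failure and then cleans up everything at once. That is why the free path has to decrement the counter for exactly the slots that were filled.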
@@ -57,7 +57,7 @@
 #define PTRS_PER_PGD 1024
 #define PGD_ORDER 0
 #define USER_PTRS_PER_PGD (TASK_SIZE/PGDIR_SIZE)
-#define FIRST_USER_ADDRESS 0
+#define FIRST_USER_ADDRESS 0UL
 #define FIRST_USER_PGD_NR (FIRST_USER_ADDRESS >> PGDIR_SHIFT)

 /*
@@ -124,10 +124,8 @@ int ivtv_udma_setup(struct ivtv *itv, unsigned long ivtv_dest_addr,
 	}

 	/* Get user pages for DMA Xfer */
-	down_read(&current->mm->mmap_sem);
-	err = get_user_pages(current, current->mm,
-			user_dma.uaddr, user_dma.page_count, 0, 1, dma->map, NULL);
-	up_read(&current->mm->mmap_sem);
+	err = get_user_pages_unlocked(current, current->mm,
+			user_dma.uaddr, user_dma.page_count, 0, 1, dma->map);

 	if (user_dma.page_count != err) {
 		IVTV_DEBUG_WARN("failed to map user pages, returned %d instead of %d\n",
@@ -4551,18 +4551,15 @@ static int sgl_map_user_pages(struct st_buffer *STbp,
 		return -ENOMEM;

 	/* Try to fault in all of the necessary pages */
-	down_read(&current->mm->mmap_sem);
 	/* rw==READ means read from drive, write into memory area */
-	res = get_user_pages(
+	res = get_user_pages_unlocked(
 		current,
 		current->mm,
 		uaddr,
 		nr_pages,
 		rw == READ,
 		0, /* don't force */
-		pages,
-		NULL);
-	up_read(&current->mm->mmap_sem);
+		pages);

 	/* Errors and no page mapped should return here */
 	if (res < nr_pages)
@@ -160,7 +160,12 @@ static unsigned long lowmem_scan(struct shrinker *s, struct shrink_control *sc)
 			     selected->pid, selected->comm,
 			     selected_oom_score_adj, selected_tasksize);
 		lowmem_deathpending_timeout = jiffies + HZ;
-		set_tsk_thread_flag(selected, TIF_MEMDIE);
+		/*
+		 * FIXME: lowmemorykiller shouldn't abuse global OOM killer
+		 * infrastructure. There is no real reason why the selected
+		 * task should have access to the memory reserves.
+		 */
+		mark_tsk_oom_victim(selected);
 		send_sig(SIGKILL, selected, 0);
 		rem += selected_tasksize;
 	}
@@ -90,7 +90,7 @@ static void sysrq_handle_loglevel(int key)

 	i = key - '0';
 	console_loglevel = CONSOLE_LOGLEVEL_DEFAULT;
-	printk("Loglevel set to %d\n", i);
+	pr_info("Loglevel set to %d\n", i);
 	console_loglevel = i;
 }
 static struct sysrq_key_op sysrq_loglevel_op = {
@@ -220,7 +220,7 @@ static void showacpu(void *dummy)
 		return;

 	spin_lock_irqsave(&show_lock, flags);
-	printk(KERN_INFO "CPU%d:\n", smp_processor_id());
+	pr_info("CPU%d:\n", smp_processor_id());
 	show_stack(NULL, NULL);
 	spin_unlock_irqrestore(&show_lock, flags);
 }
@@ -243,7 +243,7 @@ static void sysrq_handle_showallcpus(int key)
 		struct pt_regs *regs = get_irq_regs();

 		if (regs) {
-			printk(KERN_INFO "CPU%d:\n", smp_processor_id());
+			pr_info("CPU%d:\n", smp_processor_id());
 			show_regs(regs);
 		}
 		schedule_work(&sysrq_showallcpus);
@@ -355,8 +355,9 @@ static struct sysrq_key_op sysrq_term_op = {

 static void moom_callback(struct work_struct *ignored)
 {
-	out_of_memory(node_zonelist(first_memory_node, GFP_KERNEL), GFP_KERNEL,
-		      0, NULL, true);
+	if (!out_of_memory(node_zonelist(first_memory_node, GFP_KERNEL),
+			   GFP_KERNEL, 0, NULL, true))
+		pr_info("OOM request ignored because killer is disabled\n");
 }

 static DECLARE_WORK(moom_work, moom_callback);
@@ -522,7 +523,7 @@ void __handle_sysrq(int key, bool check_mask)
 	 */
 	orig_log_level = console_loglevel;
 	console_loglevel = CONSOLE_LOGLEVEL_DEFAULT;
-	printk(KERN_INFO "SysRq : ");
+	pr_info("SysRq : ");

 	op_p = __sysrq_get_key_op(key);
 	if (op_p) {
@@ -531,14 +532,14 @@ void __handle_sysrq(int key, bool check_mask)
 		 * should not) and is the invoked operation enabled?
 		 */
 		if (!check_mask || sysrq_on_mask(op_p->enable_mask)) {
-			printk("%s\n", op_p->action_msg);
+			pr_cont("%s\n", op_p->action_msg);
 			console_loglevel = orig_log_level;
 			op_p->handler(key);
 		} else {
-			printk("This sysrq operation is disabled.\n");
+			pr_cont("This sysrq operation is disabled.\n");
 		}
 	} else {
-		printk("HELP : ");
+		pr_cont("HELP : ");
 		/* Only print the help msg once per handler */
 		for (i = 0; i < ARRAY_SIZE(sysrq_key_table); i++) {
 			if (sysrq_key_table[i]) {
@@ -549,10 +550,10 @@ void __handle_sysrq(int key, bool check_mask)
 					;
 				if (j != i)
 					continue;
-				printk("%s ", sysrq_key_table[i]->help_msg);
+				pr_cont("%s ", sysrq_key_table[i]->help_msg);
 			}
 		}
-		printk("\n");
+		pr_cont("\n");
 		console_loglevel = orig_log_level;
 	}
 	rcu_read_unlock();
@@ -686,10 +686,8 @@ static ssize_t pvr2fb_write(struct fb_info *info, const char *buf,
 	if (!pages)
 		return -ENOMEM;

-	down_read(&current->mm->mmap_sem);
-	ret = get_user_pages(current, current->mm, (unsigned long)buf,
-			     nr_pages, WRITE, 0, pages, NULL);
-	up_read(&current->mm->mmap_sem);
+	ret = get_user_pages_unlocked(current, current->mm, (unsigned long)buf,
+				      nr_pages, WRITE, 0, pages);

 	if (ret < nr_pages) {
 		nr_pages = ret;
@@ -1407,8 +1407,8 @@ int extent_range_redirty_for_io(struct inode *inode, u64 start, u64 end)
 	while (index <= end_index) {
 		page = find_get_page(inode->i_mapping, index);
 		BUG_ON(!page); /* Pages should be in the extent_io_tree */
-		account_page_redirty(page);
 		__set_page_dirty_nobuffers(page);
+		account_page_redirty(page);
 		page_cache_release(page);
 		index++;
 	}
@@ -5,6 +5,7 @@
 #include <linux/ksm.h>
 #include <linux/mm.h>
 #include <linux/mmzone.h>
+#include <linux/huge_mm.h>
 #include <linux/proc_fs.h>
 #include <linux/seq_file.h>
 #include <linux/hugetlb.h>
@@ -121,9 +122,18 @@ u64 stable_page_flags(struct page *page)
 	 * just checks PG_head/PG_tail, so we need to check PageLRU/PageAnon
 	 * to make sure a given page is a thp, not a non-huge compound page.
 	 */
-	else if (PageTransCompound(page) && (PageLRU(compound_head(page)) ||
-					     PageAnon(compound_head(page))))
-		u |= 1 << KPF_THP;
+	else if (PageTransCompound(page)) {
+		struct page *head = compound_head(page);
+
+		if (PageLRU(head) || PageAnon(head))
+			u |= 1 << KPF_THP;
+		else if (is_huge_zero_page(head)) {
+			u |= 1 << KPF_ZERO_PAGE;
+			u |= 1 << KPF_THP;
+		}
+	} else if (is_zero_pfn(page_to_pfn(page)))
+		u |= 1 << KPF_ZERO_PAGE;
+

 	/*
 	 * Caveats on high order pages: page->_count will only be set
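The reworked THP branch of `stable_page_flags()` reports the huge zero page as both a zero page and a THP, and reports the ordinary zero page as a zero page only. A pure-function sketch of just that decision follows. It assumes the bit numbers `KPF_THP` = 22 and `KPF_ZERO_PAGE` = 24 from `include/uapi/linux/kernel-page-flags.h`. The `struct sketch_page`, which carries the few page properties the branch inspects, is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit numbers assumed from include/uapi/linux/kernel-page-flags.h. */
#define SKETCH_KPF_THP        22
#define SKETCH_KPF_ZERO_PAGE  24

/* Hypothetical summary of the properties the kernel branch tests. */
struct sketch_page {
	bool trans_compound;	/* PageTransCompound(page) */
	bool head_lru;		/* PageLRU(compound_head(page)) */
	bool head_anon;		/* PageAnon(compound_head(page)) */
	bool head_is_huge_zero;	/* is_huge_zero_page(head) */
	bool is_zero_pfn;	/* is_zero_pfn(page_to_pfn(page)) */
};

uint64_t sketch_thp_zero_flags(const struct sketch_page *p)
{
	uint64_t u = 0;

	if (p->trans_compound) {
		if (p->head_lru || p->head_anon)
			u |= UINT64_C(1) << SKETCH_KPF_THP;
		else if (p->head_is_huge_zero) {
			/* the huge zero page is reported as both */
			u |= UINT64_C(1) << SKETCH_KPF_ZERO_PAGE;
			u |= UINT64_C(1) << SKETCH_KPF_THP;
		}
		/* other compound pages (slab, __GFP_COMP) get neither bit */
	} else if (p->is_zero_pfn) {
		u |= UINT64_C(1) << SKETCH_KPF_ZERO_PAGE;
	}
	return u;
}
```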
@@ -21,7 +21,7 @@

 void task_mem(struct seq_file *m, struct mm_struct *mm)
 {
-	unsigned long data, text, lib, swap;
+	unsigned long data, text, lib, swap, ptes, pmds;
 	unsigned long hiwater_vm, total_vm, hiwater_rss, total_rss;

 	/*
@@ -42,6 +42,8 @@ void task_mem(struct seq_file *m, struct mm_struct *mm)
 	text = (PAGE_ALIGN(mm->end_code) - (mm->start_code & PAGE_MASK)) >> 10;
 	lib = (mm->exec_vm << (PAGE_SHIFT-10)) - text;
 	swap = get_mm_counter(mm, MM_SWAPENTS);
+	ptes = PTRS_PER_PTE * sizeof(pte_t) * atomic_long_read(&mm->nr_ptes);
+	pmds = PTRS_PER_PMD * sizeof(pmd_t) * mm_nr_pmds(mm);
 	seq_printf(m,
 		"VmPeak:\t%8lu kB\n"
 		"VmSize:\t%8lu kB\n"
@@ -54,6 +56,7 @@ void task_mem(struct seq_file *m, struct mm_struct *mm)
 		"VmExe:\t%8lu kB\n"
 		"VmLib:\t%8lu kB\n"
 		"VmPTE:\t%8lu kB\n"
+		"VmPMD:\t%8lu kB\n"
 		"VmSwap:\t%8lu kB\n",
 		hiwater_vm << (PAGE_SHIFT-10),
 		total_vm << (PAGE_SHIFT-10),
@@ -63,8 +66,8 @@ void task_mem(struct seq_file *m, struct mm_struct *mm)
 		total_rss << (PAGE_SHIFT-10),
 		data << (PAGE_SHIFT-10),
 		mm->stack_vm << (PAGE_SHIFT-10), text, lib,
-		(PTRS_PER_PTE * sizeof(pte_t) *
-		 atomic_long_read(&mm->nr_ptes)) >> 10,
+		ptes >> 10,
+		pmds >> 10,
 		swap << (PAGE_SHIFT-10));
 }

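`VmPTE` and the new `VmPMD` line both compute page-table memory the same way: entries per table, times entry size, times number of tables, shifted right by 10 to convert bytes to kB. A worked sketch of that arithmetic follows. It assumes x86-64 geometry, where each PTE or PMD table holds 512 entries of 8 bytes. The function name is hypothetical.

```c
/* Assumed x86-64 geometry: 512 eight-byte entries per table level. */
#define SKETCH_PTRS_PER_TABLE 512UL
#define SKETCH_ENTRY_SIZE     8UL

/* Same shape as "PTRS_PER_PTE * sizeof(pte_t) * nr_ptes" followed by ">> 10". */
unsigned long sketch_table_kb(unsigned long nr_tables)
{
	unsigned long bytes = SKETCH_PTRS_PER_TABLE * SKETCH_ENTRY_SIZE * nr_tables;

	return bytes >> 10;	/* bytes -> kB, as task_mem() prints */
}
```

Under these assumptions each table costs 4096 bytes, or 4 kB. A process with 3 PTE tables and 1 PMD table would therefore show a VmPTE value of 12 kB and a VmPMD value of 4 kB.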
@@ -433,7 +436,6 @@ const struct file_operations proc_tid_maps_operations = {

 #ifdef CONFIG_PROC_PAGE_MONITOR
 struct mem_size_stats {
-	struct vm_area_struct *vma;
 	unsigned long resident;
 	unsigned long shared_clean;
 	unsigned long shared_dirty;
@@ -482,7 +484,7 @@ static void smaps_pte_entry(pte_t *pte, unsigned long addr,
 		struct mm_walk *walk)
 {
 	struct mem_size_stats *mss = walk->private;
-	struct vm_area_struct *vma = mss->vma;
+	struct vm_area_struct *vma = walk->vma;
 	struct page *page = NULL;

 	if (pte_present(*pte)) {
@@ -506,7 +508,7 @@ static void smaps_pmd_entry(pmd_t *pmd, unsigned long addr,
 		struct mm_walk *walk)
 {
 	struct mem_size_stats *mss = walk->private;
-	struct vm_area_struct *vma = mss->vma;
+	struct vm_area_struct *vma = walk->vma;
 	struct page *page;

 	/* FOLL_DUMP will return -EFAULT on huge zero page */
@@ -527,8 +529,7 @@ static void smaps_pmd_entry(pmd_t *pmd, unsigned long addr,
 static int smaps_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
 			   struct mm_walk *walk)
 {
-	struct mem_size_stats *mss = walk->private;
-	struct vm_area_struct *vma = mss->vma;
+	struct vm_area_struct *vma = walk->vma;
 	pte_t *pte;
 	spinlock_t *ptl;

@@ -620,10 +621,8 @@ static int show_smap(struct seq_file *m, void *v, int is_pid)
 	};

 	memset(&mss, 0, sizeof mss);
-	mss.vma = vma;
 	/* mmap_sem is held in m_start */
-	if (vma->vm_mm && !is_vm_hugetlb_page(vma))
-		walk_page_range(vma->vm_start, vma->vm_end, &smaps_walk);
+	walk_page_vma(vma, &smaps_walk);

 	show_map_vma(m, vma, is_pid);

@@ -737,14 +736,13 @@ enum clear_refs_types {
 };

 struct clear_refs_private {
-	struct vm_area_struct *vma;
 	enum clear_refs_types type;
 };

+#ifdef CONFIG_MEM_SOFT_DIRTY
 static inline void clear_soft_dirty(struct vm_area_struct *vma,
 		unsigned long addr, pte_t *pte)
 {
-#ifdef CONFIG_MEM_SOFT_DIRTY
 	/*
 	 * The soft-dirty tracker uses #PF-s to catch writes
 	 * to pages, so write-protect the pte as well. See the
@@ -761,19 +759,60 @@ static inline void clear_soft_dirty(struct vm_area_struct *vma,
 	}

 	set_pte_at(vma->vm_mm, addr, pte, ptent);
-#endif
 }

+static inline void clear_soft_dirty_pmd(struct vm_area_struct *vma,
+		unsigned long addr, pmd_t *pmdp)
+{
+	pmd_t pmd = *pmdp;
+
+	pmd = pmd_wrprotect(pmd);
+	pmd = pmd_clear_flags(pmd, _PAGE_SOFT_DIRTY);
+
+	if (vma->vm_flags & VM_SOFTDIRTY)
+		vma->vm_flags &= ~VM_SOFTDIRTY;
+
+	set_pmd_at(vma->vm_mm, addr, pmdp, pmd);
+}
+
+#else
+
+static inline void clear_soft_dirty(struct vm_area_struct *vma,
+		unsigned long addr, pte_t *pte)
+{
+}
+
+static inline void clear_soft_dirty_pmd(struct vm_area_struct *vma,
+		unsigned long addr, pmd_t *pmdp)
+{
+}
+#endif
+
 static int clear_refs_pte_range(pmd_t *pmd, unsigned long addr,
 				unsigned long end, struct mm_walk *walk)
 {
 	struct clear_refs_private *cp = walk->private;
-	struct vm_area_struct *vma = cp->vma;
+	struct vm_area_struct *vma = walk->vma;
 	pte_t *pte, ptent;
 	spinlock_t *ptl;
 	struct page *page;

-	split_huge_page_pmd(vma, addr, pmd);
+	if (pmd_trans_huge_lock(pmd, vma, &ptl) == 1) {
+		if (cp->type == CLEAR_REFS_SOFT_DIRTY) {
+			clear_soft_dirty_pmd(vma, addr, pmd);
+			goto out;
+		}
+
+		page = pmd_page(*pmd);
+
+		/* Clear accessed and referenced bits. */
+		pmdp_test_and_clear_young(vma, addr, pmd);
+		ClearPageReferenced(page);
+out:
+		spin_unlock(ptl);
+		return 0;
+	}
+
 	if (pmd_trans_unstable(pmd))
 		return 0;

@@ -802,6 +841,28 @@ static int clear_refs_pte_range(pmd_t *pmd, unsigned long addr,
 	return 0;
 }

+static int clear_refs_test_walk(unsigned long start, unsigned long end,
+				struct mm_walk *walk)
+{
+	struct clear_refs_private *cp = walk->private;
+	struct vm_area_struct *vma = walk->vma;
+
+	if (vma->vm_flags & VM_PFNMAP)
+		return 1;
+
+	/*
+	 * Writing 1 to /proc/pid/clear_refs affects all pages.
+	 * Writing 2 to /proc/pid/clear_refs only affects anonymous pages.
+	 * Writing 3 to /proc/pid/clear_refs only affects file mapped pages.
+	 * Writing 4 to /proc/pid/clear_refs affects all pages.
+	 */
+	if (cp->type == CLEAR_REFS_ANON && vma->vm_file)
+		return 1;
+	if (cp->type == CLEAR_REFS_MAPPED && !vma->vm_file)
+		return 1;
+	return 0;
+}
+
 static ssize_t clear_refs_write(struct file *file, const char __user *buf,
 				size_t count, loff_t *ppos)
 {
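`clear_refs_test_walk()` replaces the old per-VMA `continue` checks with a `test_walk` callback: returning 1 skips the VMA and returning 0 walks it. A pure-function sketch of that filter follows. It assumes the `clear_refs_types` values 1 to 4 listed in the comment. In the hypothetical `struct sketch_vma`, `has_file` stands in for `vma->vm_file != NULL` and `pfnmap` stands in for `vma->vm_flags & VM_PFNMAP`.

```c
#include <stdbool.h>

/* Values follow the "Writing N to /proc/pid/clear_refs" comment. */
enum sketch_clear_refs_type {
	SKETCH_CLEAR_REFS_ALL = 1,
	SKETCH_CLEAR_REFS_ANON = 2,
	SKETCH_CLEAR_REFS_MAPPED = 3,
	SKETCH_CLEAR_REFS_SOFT_DIRTY = 4,
};

struct sketch_vma {
	bool pfnmap;	/* vma->vm_flags & VM_PFNMAP */
	bool has_file;	/* vma->vm_file != NULL */
};

/* Returns 1 to skip the VMA, 0 to walk it. */
int sketch_clear_refs_test_walk(enum sketch_clear_refs_type type,
				const struct sketch_vma *vma)
{
	if (vma->pfnmap)
		return 1;	/* pure-PFN range, managed without struct page */
	if (type == SKETCH_CLEAR_REFS_ANON && vma->has_file)
		return 1;	/* "2" only touches anonymous VMAs */
	if (type == SKETCH_CLEAR_REFS_MAPPED && !vma->has_file)
		return 1;	/* "3" only touches file-backed VMAs */
	return 0;
}
```

With the filter in the callback, `clear_refs_write()` can issue a single `walk_page_range(0, ~0UL, ...)` instead of iterating `mm->mmap` itself.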
@@ -842,6 +903,7 @@ static ssize_t clear_refs_write(struct file *file, const char __user *buf,
 		};
 		struct mm_walk clear_refs_walk = {
 			.pmd_entry = clear_refs_pte_range,
+			.test_walk = clear_refs_test_walk,
 			.mm = mm,
 			.private = &cp,
 		};
@@ -861,28 +923,7 @@ static ssize_t clear_refs_write(struct file *file, const char __user *buf,
 			}
 			mmu_notifier_invalidate_range_start(mm, 0, -1);
 		}
-		for (vma = mm->mmap; vma; vma = vma->vm_next) {
-			cp.vma = vma;
-			if (is_vm_hugetlb_page(vma))
-				continue;
-			/*
-			 * Writing 1 to /proc/pid/clear_refs affects all pages.
-			 *
-			 * Writing 2 to /proc/pid/clear_refs only affects
-			 * Anonymous pages.
-			 *
-			 * Writing 3 to /proc/pid/clear_refs only affects file
-			 * mapped pages.
-			 *
-			 * Writing 4 to /proc/pid/clear_refs affects all pages.
-			 */
-			if (type == CLEAR_REFS_ANON && vma->vm_file)
-				continue;
-			if (type == CLEAR_REFS_MAPPED && !vma->vm_file)
-				continue;
-			walk_page_range(vma->vm_start, vma->vm_end,
-					&clear_refs_walk);
-		}
+		walk_page_range(0, ~0UL, &clear_refs_walk);
 		if (type == CLEAR_REFS_SOFT_DIRTY)
 			mmu_notifier_invalidate_range_end(mm, 0, -1);
 		flush_tlb_mm(mm);
@@ -1050,15 +1091,13 @@ static inline void thp_pmd_to_pagemap_entry(pagemap_entry_t *pme, struct pagemap
 static int pagemap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
 			     struct mm_walk *walk)
 {
-	struct vm_area_struct *vma;
+	struct vm_area_struct *vma = walk->vma;
 	struct pagemapread *pm = walk->private;
 	spinlock_t *ptl;
-	pte_t *pte;
+	pte_t *pte, *orig_pte;
 	int err = 0;

-	/* find the first VMA at or above 'addr' */
-	vma = find_vma(walk->mm, addr);
-	if (vma && pmd_trans_huge_lock(pmd, vma, &ptl) == 1) {
+	if (pmd_trans_huge_lock(pmd, vma, &ptl) == 1) {
 		int pmd_flags2;

 		if ((vma->vm_flags & VM_SOFTDIRTY) || pmd_soft_dirty(*pmd))
@@ -1084,51 +1123,20 @@ static int pagemap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
 	if (pmd_trans_unstable(pmd))
 		return 0;

-	while (1) {
-		/* End of address space hole, which we mark as non-present. */
-		unsigned long hole_end;
+	/*
+	 * We can assume that @vma always points to a valid one and @end never
+	 * goes beyond vma->vm_end.
+	 */
+	orig_pte = pte = pte_offset_map_lock(walk->mm, pmd, addr, &ptl);
+	for (; addr < end; pte++, addr += PAGE_SIZE) {
+		pagemap_entry_t pme;

-		if (vma)
-			hole_end = min(end, vma->vm_start);
-		else
-			hole_end = end;
-
-		for (; addr < hole_end; addr += PAGE_SIZE) {
-			pagemap_entry_t pme = make_pme(PM_NOT_PRESENT(pm->v2));
-
-			err = add_to_pagemap(addr, &pme, pm);
-			if (err)
-				return err;
-		}
-
-		if (!vma || vma->vm_start >= end)
+		pte_to_pagemap_entry(&pme, pm, vma, addr, *pte);
+		err = add_to_pagemap(addr, &pme, pm);
+		if (err)
 			break;
-		/*
-		 * We can't possibly be in a hugetlb VMA. In general,
-		 * for a mm_walk with a pmd_entry and a hugetlb_entry,
-		 * the pmd_entry can only be called on addresses in a
-		 * hugetlb if the walk starts in a non-hugetlb VMA and
-		 * spans a hugepage VMA. Since pagemap_read walks are
-		 * PMD-sized and PMD-aligned, this will never be true.
-		 */
-		BUG_ON(is_vm_hugetlb_page(vma));
-
-		/* Addresses in the VMA. */
-		for (; addr < min(end, vma->vm_end); addr += PAGE_SIZE) {
-			pagemap_entry_t pme;
-			pte = pte_offset_map(pmd, addr);
-			pte_to_pagemap_entry(&pme, pm, vma, addr, *pte);
-			pte_unmap(pte);
-			err = add_to_pagemap(addr, &pme, pm);
-			if (err)
-				return err;
-		}
-
-		if (addr == end)
-			break;
-
-		vma = find_vma(walk->mm, addr);
 	}
+	pte_unmap_unlock(orig_pte, ptl);

 	cond_resched();

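The rewritten loop relies on the walker guarantee stated in its new comment: `@vma` is valid and `@end` never goes beyond `vma->vm_end`. It maps and locks the PTE page once with `pte_offset_map_lock()`, and on error it uses `break` instead of `return err`, so control always reaches `pte_unmap_unlock()`. A userspace sketch of that single-exit shape follows. In this sketch a hypothetical `lock_depth` counter stands in for the PTE lock, and a hypothetical `sketch_emit()` plays the role of `add_to_pagemap()` by returning non-zero once the output buffer is full.

```c
struct sketch_ctx {
	int lock_depth;		/* stands in for the PTE page-table lock */
	unsigned long *buf;	/* output buffer, in the spirit of pm->buffer */
	int pos, len;
};

/* Hypothetical add_to_pagemap() stand-in: non-zero once the buffer is full. */
static int sketch_emit(struct sketch_ctx *c, unsigned long entry)
{
	if (c->pos >= c->len)
		return 1;
	c->buf[c->pos++] = entry;
	return 0;
}

/* Walk [addr, end) one page at a time with exactly one unlock on every path. */
int sketch_pte_range(struct sketch_ctx *c, unsigned long addr,
		     unsigned long end, unsigned long page_size)
{
	int err = 0;

	c->lock_depth++;			/* like pte_offset_map_lock() */
	for (; addr < end; addr += page_size) {
		err = sketch_emit(c, addr);
		if (err)
			break;			/* not "return err": still unlock below */
	}
	c->lock_depth--;			/* like pte_unmap_unlock() */
	return err;
}
```

A `return err` inside the loop would leave `lock_depth` at 1 here. In the kernel version it would leave the PTE lock held.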
@@ -1154,15 +1162,12 @@ static int pagemap_hugetlb_range(pte_t *pte, unsigned long hmask,
 				 struct mm_walk *walk)
 {
 	struct pagemapread *pm = walk->private;
-	struct vm_area_struct *vma;
+	struct vm_area_struct *vma = walk->vma;
 	int err = 0;
 	int flags2;
 	pagemap_entry_t pme;

-	vma = find_vma(walk->mm, addr);
-	WARN_ON_ONCE(!vma);
-
-	if (vma && (vma->vm_flags & VM_SOFTDIRTY))
+	if (vma->vm_flags & VM_SOFTDIRTY)
 		flags2 = __PM_SOFT_DIRTY;
 	else
 		flags2 = 0;
@@ -1322,7 +1327,6 @@ const struct file_operations proc_pagemap_operations = {
 #ifdef CONFIG_NUMA

 struct numa_maps {
-	struct vm_area_struct *vma;
 	unsigned long pages;
 	unsigned long anon;
 	unsigned long active;
@@ -1391,18 +1395,17 @@ static struct page *can_gather_numa_stats(pte_t pte, struct vm_area_struct *vma,
 static int gather_pte_stats(pmd_t *pmd, unsigned long addr,
 		unsigned long end, struct mm_walk *walk)
 {
-	struct numa_maps *md;
+	struct numa_maps *md = walk->private;
+	struct vm_area_struct *vma = walk->vma;
 	spinlock_t *ptl;
 	pte_t *orig_pte;
 	pte_t *pte;

-	md = walk->private;
-
-	if (pmd_trans_huge_lock(pmd, md->vma, &ptl) == 1) {
+	if (pmd_trans_huge_lock(pmd, vma, &ptl) == 1) {
 		pte_t huge_pte = *(pte_t *)pmd;
 		struct page *page;

-		page = can_gather_numa_stats(huge_pte, md->vma, addr);
+		page = can_gather_numa_stats(huge_pte, vma, addr);
 		if (page)
 			gather_stats(page, md, pte_dirty(huge_pte),
 				     HPAGE_PMD_SIZE/PAGE_SIZE);
@@ -1414,7 +1417,7 @@ static int gather_pte_stats(pmd_t *pmd, unsigned long addr,
 		return 0;
 	orig_pte = pte = pte_offset_map_lock(walk->mm, pmd, addr, &ptl);
 	do {
-		struct page *page = can_gather_numa_stats(*pte, md->vma, addr);
+		struct page *page = can_gather_numa_stats(*pte, vma, addr);
 		if (!page)
 			continue;
 		gather_stats(page, md, pte_dirty(*pte), 1);
@@ -1424,7 +1427,7 @@ static int gather_pte_stats(pmd_t *pmd, unsigned long addr,
 	return 0;
 }
 #ifdef CONFIG_HUGETLB_PAGE
-static int gather_hugetbl_stats(pte_t *pte, unsigned long hmask,
+static int gather_hugetlb_stats(pte_t *pte, unsigned long hmask,
 		unsigned long addr, unsigned long end, struct mm_walk *walk)
 {
 	struct numa_maps *md;
@@ -1443,7 +1446,7 @@ static int gather_hugetbl_stats(pte_t *pte, unsigned long hmask,
 }

 #else
-static int gather_hugetbl_stats(pte_t *pte, unsigned long hmask,
+static int gather_hugetlb_stats(pte_t *pte, unsigned long hmask,
 		unsigned long addr, unsigned long end, struct mm_walk *walk)
 {
 	return 0;
@@ -1461,7 +1464,12 @@ static int show_numa_map(struct seq_file *m, void *v, int is_pid)
 	struct numa_maps *md = &numa_priv->md;
 	struct file *file = vma->vm_file;
 	struct mm_struct *mm = vma->vm_mm;
-	struct mm_walk walk = {};
+	struct mm_walk walk = {
+		.hugetlb_entry = gather_hugetlb_stats,
+		.pmd_entry = gather_pte_stats,
+		.private = md,
+		.mm = mm,
+	};
 	struct mempolicy *pol;
 	char buffer[64];
 	int nid;
@@ -1472,13 +1480,6 @@ static int show_numa_map(struct seq_file *m, void *v, int is_pid)
 	/* Ensure we start with an empty set of numa_maps statistics. */
 	memset(md, 0, sizeof(*md));

-	md->vma = vma;
-
-	walk.hugetlb_entry = gather_hugetbl_stats;
-	walk.pmd_entry = gather_pte_stats;
-	walk.private = md;
-	walk.mm = mm;
-
 	pol = __get_vma_policy(vma, vma->vm_start);
 	if (pol) {
 		mpol_to_str(buffer, sizeof(buffer), pol);
@@ -1512,7 +1513,8 @@ static int show_numa_map(struct seq_file *m, void *v, int is_pid)
 	if (is_vm_hugetlb_page(vma))
 		seq_puts(m, " huge");

-	walk_page_range(vma->vm_start, vma->vm_end, &walk);
+	/* mmap_sem is held by m_start */
+	walk_page_vma(vma, &walk);

 	if (!md->pages)
 		goto out;
@@ -4,6 +4,7 @@
 #define __ARCH_HAS_4LEVEL_HACK
 #define __PAGETABLE_PUD_FOLDED

+#define PUD_SHIFT PGDIR_SHIFT
 #define PUD_SIZE PGDIR_SIZE
 #define PUD_MASK PGDIR_MASK
 #define PTRS_PER_PUD 1
@@ -12,6 +12,10 @@
 #define COMPACT_PARTIAL 3
 /* The full zone was compacted */
 #define COMPACT_COMPLETE 4
+/* For more detailed tracepoint output */
+#define COMPACT_NO_SUITABLE_PAGE 5
+#define COMPACT_NOT_SUITABLE_ZONE 6
+/* When adding new state, please change compaction_status_string, too */

 /* Used to signal whether compaction detected need_sched() or lock contention */
 /* No contention detected */
@@ -21,6 +25,8 @@
 /* Zone lock or lru_lock was contended in async compaction */
 #define COMPACT_CONTENDED_LOCK 2

+struct alloc_context; /* in mm/internal.h */
+
 #ifdef CONFIG_COMPACTION
 extern int sysctl_compact_memory;
 extern int sysctl_compaction_handler(struct ctl_table *table, int write,
@@ -30,81 +36,25 @@ extern int sysctl_extfrag_handler(struct ctl_table *table, int write,
 			void __user *buffer, size_t *length, loff_t *ppos);

 extern int fragmentation_index(struct zone *zone, unsigned int order);
-extern unsigned long try_to_compact_pages(struct zonelist *zonelist,
-			int order, gfp_t gfp_mask, nodemask_t *mask,
-			enum migrate_mode mode, int *contended,
-			int alloc_flags, int classzone_idx);
+extern unsigned long try_to_compact_pages(gfp_t gfp_mask, unsigned int order,
+			int alloc_flags, const struct alloc_context *ac,
+			enum migrate_mode mode, int *contended);
 extern void compact_pgdat(pg_data_t *pgdat, int order);
 extern void reset_isolation_suitable(pg_data_t *pgdat);
 extern unsigned long compaction_suitable(struct zone *zone, int order,
 					int alloc_flags, int classzone_idx);

-/* Do not skip compaction more than 64 times */
-#define COMPACT_MAX_DEFER_SHIFT 6
-
-/*
- * Compaction is deferred when compaction fails to result in a page
- * allocation success. 1 << compact_defer_limit compactions are skipped up
- * to a limit of 1 << COMPACT_MAX_DEFER_SHIFT
- */
-static inline void defer_compaction(struct zone *zone, int order)
-{
-	zone->compact_considered = 0;
-	zone->compact_defer_shift++;
-
-	if (order < zone->compact_order_failed)
-		zone->compact_order_failed = order;
-
-	if (zone->compact_defer_shift > COMPACT_MAX_DEFER_SHIFT)
-		zone->compact_defer_shift = COMPACT_MAX_DEFER_SHIFT;
-}
-
-/* Returns true if compaction should be skipped this time */
-static inline bool compaction_deferred(struct zone *zone, int order)
-{
-	unsigned long defer_limit = 1UL << zone->compact_defer_shift;
-
-	if (order < zone->compact_order_failed)
-		return false;
-
-	/* Avoid possible overflow */
-	if (++zone->compact_considered > defer_limit)
-		zone->compact_considered = defer_limit;
-
-	return zone->compact_considered < defer_limit;
-}
-
-/*
- * Update defer tracking counters after successful compaction of given order,
- * which means an allocation either succeeded (alloc_success == true) or is
- * expected to succeed.
- */
-static inline void compaction_defer_reset(struct zone *zone, int order,
-		bool alloc_success)
-{
-	if (alloc_success) {
-		zone->compact_considered = 0;
-		zone->compact_defer_shift = 0;
-	}
-	if (order >= zone->compact_order_failed)
-		zone->compact_order_failed = order + 1;
-}
-
-/* Returns true if restarting compaction after many failures */
-static inline bool compaction_restarting(struct zone *zone, int order)
-{
-	if (order < zone->compact_order_failed)
-		return false;
-
-	return zone->compact_defer_shift == COMPACT_MAX_DEFER_SHIFT &&
-		zone->compact_considered >= 1UL << zone->compact_defer_shift;
-}
+extern void defer_compaction(struct zone *zone, int order);
+extern bool compaction_deferred(struct zone *zone, int order);
+extern void compaction_defer_reset(struct zone *zone, int order,
+				bool alloc_success);
+extern bool compaction_restarting(struct zone *zone, int order);

 #else
-static inline unsigned long try_to_compact_pages(struct zonelist *zonelist,
-			int order, gfp_t gfp_mask, nodemask_t *nodemask,
-			enum migrate_mode mode, int *contended,
-			int alloc_flags, int classzone_idx)
+static inline unsigned long try_to_compact_pages(gfp_t gfp_mask,
+			unsigned int order, int alloc_flags,
+			const struct alloc_context *ac,
+			enum migrate_mode mode, int *contended)
 {
 	return COMPACT_CONTINUE;
 }
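The deferred-compaction helpers moved out of line here implement an exponential backoff. Each failure doubles the number of attempts that are skipped, up to `1 << COMPACT_MAX_DEFER_SHIFT` (64). Only orders at or above the lowest failed order are deferred. The sketch below is a userspace transcription of three of the removed inline bodies, assuming a hypothetical `struct sketch_zone` that holds only the three counters they touch.

```c
#include <stdbool.h>

#define SKETCH_COMPACT_MAX_DEFER_SHIFT 6	/* same value as COMPACT_MAX_DEFER_SHIFT */

struct sketch_zone {
	unsigned int compact_considered;
	unsigned int compact_defer_shift;
	int compact_order_failed;
};

/* A compaction attempt at @order failed: back off further. */
void sketch_defer_compaction(struct sketch_zone *zone, int order)
{
	zone->compact_considered = 0;
	zone->compact_defer_shift++;

	if (order < zone->compact_order_failed)
		zone->compact_order_failed = order;

	if (zone->compact_defer_shift > SKETCH_COMPACT_MAX_DEFER_SHIFT)
		zone->compact_defer_shift = SKETCH_COMPACT_MAX_DEFER_SHIFT;
}

/* Returns true if compaction at @order should be skipped this time. */
bool sketch_compaction_deferred(struct sketch_zone *zone, int order)
{
	unsigned long defer_limit = 1UL << zone->compact_defer_shift;

	if (order < zone->compact_order_failed)
		return false;

	/* Avoid possible overflow */
	if (++zone->compact_considered > defer_limit)
		zone->compact_considered = defer_limit;

	return zone->compact_considered < defer_limit;
}

/* Compaction at @order succeeded, or an allocation is expected to. */
void sketch_compaction_defer_reset(struct sketch_zone *zone, int order,
				   bool alloc_success)
{
	if (alloc_success) {
		zone->compact_considered = 0;
		zone->compact_defer_shift = 0;
	}
	if (order >= zone->compact_order_failed)
		zone->compact_order_failed = order + 1;
}
```

Take one failure at order 3, which sets the shift to 1 and the limit to 2. The next order-3 check is skipped, and the one after that is allowed to run. Order-2 requests are never deferred, because they fall below `compact_order_failed`.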
@@ -334,18 +334,22 @@ alloc_pages(gfp_t gfp_mask, unsigned int order)
 }
 extern struct page *alloc_pages_vma(gfp_t gfp_mask, int order,
 			struct vm_area_struct *vma, unsigned long addr,
-			int node);
+			int node, bool hugepage);
+#define alloc_hugepage_vma(gfp_mask, vma, addr, order) \
+	alloc_pages_vma(gfp_mask, order, vma, addr, numa_node_id(), true)
 #else
 #define alloc_pages(gfp_mask, order) \
 		alloc_pages_node(numa_node_id(), gfp_mask, order)
-#define alloc_pages_vma(gfp_mask, order, vma, addr, node) \
+#define alloc_pages_vma(gfp_mask, order, vma, addr, node, false)\
 	alloc_pages(gfp_mask, order)
+#define alloc_hugepage_vma(gfp_mask, vma, addr, order) \
+	alloc_pages(gfp_mask, order)
 #endif
 #define alloc_page(gfp_mask) alloc_pages(gfp_mask, 0)
 #define alloc_page_vma(gfp_mask, vma, addr) \
-	alloc_pages_vma(gfp_mask, 0, vma, addr, numa_node_id())
+	alloc_pages_vma(gfp_mask, 0, vma, addr, numa_node_id(), false)
 #define alloc_page_vma_node(gfp_mask, vma, addr, node) \
-	alloc_pages_vma(gfp_mask, 0, vma, addr, node)
+	alloc_pages_vma(gfp_mask, 0, vma, addr, node, false)

 extern struct page *alloc_kmem_pages(gfp_t gfp_mask, unsigned int order);
 extern struct page *alloc_kmem_pages_node(int nid, gfp_t gfp_mask,
@@ -157,6 +157,13 @@ static inline int hpage_nr_pages(struct page *page)
 extern int do_huge_pmd_numa_page(struct mm_struct *mm, struct vm_area_struct *vma,
 				unsigned long addr, pmd_t pmd, pmd_t *pmdp);

+extern struct page *huge_zero_page;
+
+static inline bool is_huge_zero_page(struct page *page)
+{
+	return ACCESS_ONCE(huge_zero_page) == page;
+}
+
 #else /* CONFIG_TRANSPARENT_HUGEPAGE */
 #define HPAGE_PMD_SHIFT ({ BUILD_BUG(); 0; })
 #define HPAGE_PMD_MASK ({ BUILD_BUG(); 0; })
@@ -206,6 +213,11 @@ static inline int do_huge_pmd_numa_page(struct mm_struct *mm, struct vm_area_str
 	return 0;
 }

+static inline bool is_huge_zero_page(struct page *page)
+{
+	return false;
+}
+
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */

 #endif /* _LINUX_HUGE_MM_H */
@@ -99,9 +99,9 @@ int huge_pmd_unshare(struct mm_struct *mm, unsigned long *addr, pte_t *ptep);
 struct page *follow_huge_addr(struct mm_struct *mm, unsigned long address,
 			      int write);
 struct page *follow_huge_pmd(struct mm_struct *mm, unsigned long address,
-				pmd_t *pmd, int write);
+				pmd_t *pmd, int flags);
 struct page *follow_huge_pud(struct mm_struct *mm, unsigned long address,
-				pud_t *pud, int write);
+				pud_t *pud, int flags);
 int pmd_huge(pmd_t pmd);
 int pud_huge(pud_t pmd);
 unsigned long hugetlb_change_protection(struct vm_area_struct *vma,
@@ -133,8 +133,8 @@ static inline void hugetlb_report_meminfo(struct seq_file *m)
 static inline void hugetlb_show_meminfo(void)
 {
 }
-#define follow_huge_pmd(mm, addr, pmd, write) NULL
-#define follow_huge_pud(mm, addr, pud, write) NULL
+#define follow_huge_pmd(mm, addr, pmd, flags) NULL
+#define follow_huge_pud(mm, addr, pud, flags) NULL
 #define prepare_hugepage_range(file, addr, len) (-EINVAL)
 #define pmd_huge(x) 0
 #define pud_huge(x) 0
@@ -200,17 +200,6 @@ int kvm_setup_async_pf(struct kvm_vcpu *vcpu, gva_t gva, unsigned long hva,
 int kvm_async_pf_wakeup_all(struct kvm_vcpu *vcpu);
 #endif
 
-/*
- * Carry out a gup that requires IO. Allow the mm to relinquish the mmap
- * semaphore if the filemap/swap has to wait on a page lock. pagep == NULL
- * controls whether we retry the gup one more time to completion in that case.
- * Typically this is called after a FAULT_FLAG_RETRY_NOWAIT in the main tdp
- * handler.
- */
-int kvm_get_user_page_io(struct task_struct *tsk, struct mm_struct *mm,
-			 unsigned long addr, bool write_fault,
-			 struct page **pagep);
-
 enum {
 	OUTSIDE_GUEST_MODE,
 	IN_GUEST_MODE,
@@ -52,7 +52,27 @@ struct mem_cgroup_reclaim_cookie {
 	unsigned int generation;
 };
 
+enum mem_cgroup_events_index {
+	MEM_CGROUP_EVENTS_PGPGIN, /* # of pages paged in */
+	MEM_CGROUP_EVENTS_PGPGOUT, /* # of pages paged out */
+	MEM_CGROUP_EVENTS_PGFAULT, /* # of page-faults */
+	MEM_CGROUP_EVENTS_PGMAJFAULT, /* # of major page-faults */
+	MEM_CGROUP_EVENTS_NSTATS,
+	/* default hierarchy events */
+	MEMCG_LOW = MEM_CGROUP_EVENTS_NSTATS,
+	MEMCG_HIGH,
+	MEMCG_MAX,
+	MEMCG_OOM,
+	MEMCG_NR_EVENTS,
+};
+
 #ifdef CONFIG_MEMCG
+void mem_cgroup_events(struct mem_cgroup *memcg,
+		       enum mem_cgroup_events_index idx,
+		       unsigned int nr);
+
+bool mem_cgroup_low(struct mem_cgroup *root, struct mem_cgroup *memcg);
+
 int mem_cgroup_try_charge(struct page *page, struct mm_struct *mm,
 			  gfp_t gfp_mask, struct mem_cgroup **memcgp);
 void mem_cgroup_commit_charge(struct page *page, struct mem_cgroup *memcg,
@@ -102,6 +122,7 @@ void mem_cgroup_iter_break(struct mem_cgroup *, struct mem_cgroup *);
  * For memory reclaim.
  */
 int mem_cgroup_inactive_anon_is_low(struct lruvec *lruvec);
+bool mem_cgroup_lruvec_online(struct lruvec *lruvec);
 int mem_cgroup_select_victim_node(struct mem_cgroup *memcg);
 unsigned long mem_cgroup_get_lru_size(struct lruvec *lruvec, enum lru_list);
 void mem_cgroup_update_lru_size(struct lruvec *, enum lru_list, int);
@@ -138,12 +159,10 @@ static inline bool mem_cgroup_disabled(void)
 	return false;
 }
 
-struct mem_cgroup *mem_cgroup_begin_page_stat(struct page *page, bool *locked,
-					      unsigned long *flags);
-void mem_cgroup_end_page_stat(struct mem_cgroup *memcg, bool *locked,
-			      unsigned long *flags);
+struct mem_cgroup *mem_cgroup_begin_page_stat(struct page *page);
 void mem_cgroup_update_page_stat(struct mem_cgroup *memcg,
 				 enum mem_cgroup_stat_index idx, int val);
+void mem_cgroup_end_page_stat(struct mem_cgroup *memcg);
 
 static inline void mem_cgroup_inc_page_stat(struct mem_cgroup *memcg,
 					    enum mem_cgroup_stat_index idx)
@@ -176,6 +195,18 @@ void mem_cgroup_split_huge_fixup(struct page *head);
 #else /* CONFIG_MEMCG */
 struct mem_cgroup;
 
+static inline void mem_cgroup_events(struct mem_cgroup *memcg,
+				     enum mem_cgroup_events_index idx,
+				     unsigned int nr)
+{
+}
+
+static inline bool mem_cgroup_low(struct mem_cgroup *root,
+				  struct mem_cgroup *memcg)
+{
+	return false;
+}
+
 static inline int mem_cgroup_try_charge(struct page *page, struct mm_struct *mm,
 					gfp_t gfp_mask,
 					struct mem_cgroup **memcgp)
@@ -268,6 +299,11 @@ mem_cgroup_inactive_anon_is_low(struct lruvec *lruvec)
 	return 1;
 }
 
+static inline bool mem_cgroup_lruvec_online(struct lruvec *lruvec)
+{
+	return true;
+}
+
 static inline unsigned long
 mem_cgroup_get_lru_size(struct lruvec *lruvec, enum lru_list lru)
 {
@@ -285,14 +321,12 @@ mem_cgroup_print_oom_info(struct mem_cgroup *memcg, struct task_struct *p)
 {
 }
 
-static inline struct mem_cgroup *mem_cgroup_begin_page_stat(struct page *page,
-					bool *locked, unsigned long *flags)
+static inline struct mem_cgroup *mem_cgroup_begin_page_stat(struct page *page)
 {
 	return NULL;
 }
 
-static inline void mem_cgroup_end_page_stat(struct mem_cgroup *memcg,
-					bool *locked, unsigned long *flags)
+static inline void mem_cgroup_end_page_stat(struct mem_cgroup *memcg)
 {
 }
 
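
The new `enum mem_cgroup_events_index` starts the default-hierarchy events (`MEMCG_LOW` onward) at `MEM_CGROUP_EVENTS_NSTATS`. One enum can therefore index a single counter array, and any loop bounded by `MEM_CGROUP_EVENTS_NSTATS` still sees only the legacy events. The user-space sketch below shows that layout under stated assumptions: `struct memcg_sketch`, `memcg_events_add()` and `legacy_events_total()` are hypothetical. The kernel keeps per-CPU counters rather than one plain array.

```c
#include <assert.h>

/* Copied from the header change above. */
enum mem_cgroup_events_index {
	MEM_CGROUP_EVENTS_PGPGIN,	/* # of pages paged in */
	MEM_CGROUP_EVENTS_PGPGOUT,	/* # of pages paged out */
	MEM_CGROUP_EVENTS_PGFAULT,	/* # of page-faults */
	MEM_CGROUP_EVENTS_PGMAJFAULT,	/* # of major page-faults */
	MEM_CGROUP_EVENTS_NSTATS,
	/* default hierarchy events */
	MEMCG_LOW = MEM_CGROUP_EVENTS_NSTATS,
	MEMCG_HIGH,
	MEMCG_MAX,
	MEMCG_OOM,
	MEMCG_NR_EVENTS,
};

/* Hypothetical cgroup with one counter per event, legacy and new alike. */
struct memcg_sketch {
	unsigned long events[MEMCG_NR_EVENTS];
};

/* Hypothetical stand-in for mem_cgroup_events(): bump one counter by nr. */
static void memcg_events_add(struct memcg_sketch *memcg,
			     enum mem_cgroup_events_index idx, unsigned int nr)
{
	assert(idx < MEMCG_NR_EVENTS);
	memcg->events[idx] += nr;
}

/* A legacy-style reader: bounded by NSTATS, it never sees MEMCG_LOW..OOM. */
static unsigned long legacy_events_total(const struct memcg_sketch *memcg)
{
	unsigned long sum = 0;

	for (int i = 0; i < MEM_CGROUP_EVENTS_NSTATS; i++)
		sum += memcg->events[i];
	return sum;
}
```

Because `MEMCG_LOW` aliases the old terminator, appending events costs no extra array or index translation. In exchange, legacy readers must use `MEM_CGROUP_EVENTS_NSTATS` rather than `MEMCG_NR_EVENTS` as their bound.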
@@ -484,7 +484,8 @@ static inline void page_mapcount_reset(struct page *page)
 
 static inline int page_mapcount(struct page *page)
 {
-	return atomic_read(&(page)->_mapcount) + 1;
+	VM_BUG_ON_PAGE(PageSlab(page), page);
+	return atomic_read(&page->_mapcount) + 1;
 }
 
 static inline int page_count(struct page *page)
@@ -627,29 +628,28 @@ int split_free_page(struct page *page);
  * prototype for that function and accessor functions.
  * These are _only_ valid on the head of a PG_compound page.
  */
-typedef void compound_page_dtor(struct page *);
 
 static inline void set_compound_page_dtor(struct page *page,
 						compound_page_dtor *dtor)
 {
-	page[1].lru.next = (void *)dtor;
+	page[1].compound_dtor = dtor;
 }
 
 static inline compound_page_dtor *get_compound_page_dtor(struct page *page)
 {
-	return (compound_page_dtor *)page[1].lru.next;
+	return page[1].compound_dtor;
 }
 
 static inline int compound_order(struct page *page)
 {
 	if (!PageHead(page))
 		return 0;
-	return (unsigned long)page[1].lru.prev;
+	return page[1].compound_order;
 }
 
 static inline void set_compound_order(struct page *page, unsigned long order)
 {
-	page[1].lru.prev = (void *)order;
+	page[1].compound_order = order;
 }
 
 #ifdef CONFIG_MMU
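
The hunk above stops passing a compound page's destructor and order through the first tail page's `lru.next`/`lru.prev` pointers with casts. Instead it uses the named fields `compound_dtor` and `compound_order`, which the matching `struct page` change adds as an anonymous struct inside a union. Below is a minimal user-space sketch of that pattern. `struct fake_page`, its `is_head` flag and `free_count_dtor()` are hypothetical simplifications rather than kernel definitions, and the anonymous struct member requires C11.

```c
#include <assert.h>

struct fake_page;
/* Same shape as the kernel's compound_page_dtor typedef. */
typedef void compound_page_dtor(struct fake_page *);

struct fake_list_head {
	struct fake_list_head *next, *prev;
};

/* Hypothetical, heavily simplified stand-in for struct page. */
struct fake_page {
	int is_head;				/* models PageHead() */
	union {
		struct fake_list_head lru;	/* ordinary use of these words */
		struct {			/* first tail page only */
			compound_page_dtor *compound_dtor;
			unsigned long compound_order;
		};
	};
};

static inline void set_compound_page_dtor(struct fake_page *page,
					  compound_page_dtor *dtor)
{
	page[1].compound_dtor = dtor;		/* no (void *) cast needed */
}

static inline compound_page_dtor *get_compound_page_dtor(struct fake_page *page)
{
	return page[1].compound_dtor;
}

static inline void set_compound_order(struct fake_page *page,
				      unsigned long order)
{
	assert(order < 8 * sizeof(unsigned long));	/* sketch-only check */
	page[1].compound_order = order;
}

static inline int compound_order(struct fake_page *page)
{
	if (!page->is_head)
		return 0;
	return page[1].compound_order;
}

/* Hypothetical destructor used in the usage example. */
static int dtor_calls;
static void free_count_dtor(struct fake_page *page)
{
	(void)page;
	dtor_calls++;
}
```

The union leaves the size of the structure unchanged. The named members let the compiler type-check the destructor pointer, which the old `(void *)` casts prevented.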
@@ -1164,8 +1164,6 @@ void unmap_vmas(struct mmu_gather *tlb, struct vm_area_struct *start_vma,
 
 /**
  * mm_walk - callbacks for walk_page_range
- * @pgd_entry: if set, called for each non-empty PGD (top-level) entry
- * @pud_entry: if set, called for each non-empty PUD (2nd-level) entry
  * @pmd_entry: if set, called for each non-empty PMD (3rd-level) entry
  * this handler is required to be able to handle
  * pmd_trans_huge() pmds. They may simply choose to
@@ -1173,16 +1171,18 @@ void unmap_vmas(struct mmu_gather *tlb, struct vm_area_struct *start_vma,
  * @pte_entry: if set, called for each non-empty PTE (4th-level) entry
  * @pte_hole: if set, called for each hole at all levels
  * @hugetlb_entry: if set, called for each hugetlb entry
- * *Caution*: The caller must hold mmap_sem() if @hugetlb_entry
- * is used.
+ * @test_walk: caller specific callback function to determine whether
+ * we walk over the current vma or not. A positive returned
+ * value means "do page table walk over the current vma,"
+ * and a negative one means "abort current page table walk
+ * right now." 0 means "skip the current vma."
+ * @mm: mm_struct representing the target process of page table walk
+ * @vma: vma currently walked (NULL if walking outside vmas)
+ * @private: private data for callbacks' usage
  *
- * (see walk_page_range for more details)
+ * (see the comment on walk_page_range() for more details)
  */
 struct mm_walk {
-	int (*pgd_entry)(pgd_t *pgd, unsigned long addr,
-			 unsigned long next, struct mm_walk *walk);
-	int (*pud_entry)(pud_t *pud, unsigned long addr,
-			 unsigned long next, struct mm_walk *walk);
 	int (*pmd_entry)(pmd_t *pmd, unsigned long addr,
 			 unsigned long next, struct mm_walk *walk);
 	int (*pte_entry)(pte_t *pte, unsigned long addr,
@@ -1192,12 +1192,16 @@ struct mm_walk {
 	int (*hugetlb_entry)(pte_t *pte, unsigned long hmask,
 			     unsigned long addr, unsigned long next,
 			     struct mm_walk *walk);
+	int (*test_walk)(unsigned long addr, unsigned long next,
+			struct mm_walk *walk);
 	struct mm_struct *mm;
+	struct vm_area_struct *vma;
 	void *private;
 };
 
 int walk_page_range(unsigned long addr, unsigned long end,
 		struct mm_walk *walk);
+int walk_page_vma(struct vm_area_struct *vma, struct mm_walk *walk);
 void free_pgd_range(struct mmu_gather *tlb, unsigned long addr,
 		unsigned long end, unsigned long floor, unsigned long ceiling);
 int copy_page_range(struct mm_struct *dst, struct mm_struct *src,
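
The `@test_walk` hook and the new `walk->vma` field let a caller decide, per VMA, whether the walker descends into it. The return convention is easy to misread. The kernel-doc text added above says a positive value means "walk" and 0 means "skip". The walker in `mm/pagewalk.c` from this series is understood to do the opposite: it treats 0 as "walk this VMA", a positive value as "skip it" (for example, returning 1 for `VM_PFNMAP` VMAs when no `test_walk` is given) and a negative value as "abort". Check the source before relying on either reading.

The sketch below follows the code's convention. It is a simplified user-space model, not the kernel implementation: `struct vma_sketch`, `struct walk_sketch`, `walk_range_sketch()` and the example callbacks are hypothetical. Holes between VMAs are ignored, and the VMA array is assumed to be sorted and non-overlapping.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for vm_area_struct. */
struct vma_sketch {
	unsigned long start, end;	/* [start, end) */
	int pfnmap;			/* models VM_PFNMAP */
};

/* Hypothetical stand-in for struct mm_walk. */
struct walk_sketch {
	/* 0 = walk this vma, > 0 = skip it, < 0 = abort the whole walk */
	int (*test_walk)(unsigned long addr, unsigned long next,
			 struct walk_sketch *walk);
	/* called once per walked [addr, next) range inside a vma */
	int (*range_entry)(unsigned long addr, unsigned long next,
			   struct walk_sketch *walk);
	const struct vma_sketch *vma;	/* vma currently walked */
	void *private;
};

static int walk_test(unsigned long addr, unsigned long next,
		     struct walk_sketch *walk)
{
	if (walk->test_walk)
		return walk->test_walk(addr, next, walk);
	return walk->vma->pfnmap ? 1 : 0;	/* default: skip PFN maps */
}

/* Walk [start, end) over a sorted, non-overlapping vma array. */
static int walk_range_sketch(const struct vma_sketch *vmas, size_t nr,
			     unsigned long start, unsigned long end,
			     struct walk_sketch *walk)
{
	int err = 0;

	assert(start <= end);
	for (size_t i = 0; i < nr; i++) {
		unsigned long lo, hi;

		if (vmas[i].end <= start || vmas[i].start >= end)
			continue;		/* vma outside the range */
		lo = vmas[i].start > start ? vmas[i].start : start;
		hi = vmas[i].end < end ? vmas[i].end : end;
		walk->vma = &vmas[i];

		err = walk_test(lo, hi, walk);
		if (err > 0) {			/* skip; never returned */
			err = 0;
			continue;
		}
		if (err < 0)			/* abort the whole walk */
			break;
		if (walk->range_entry) {
			err = walk->range_entry(lo, hi, walk);
			if (err)
				break;
		}
	}
	walk->vma = NULL;
	return err;
}

/* Example callbacks (hypothetical). */
static int count_bytes(unsigned long addr, unsigned long next,
		       struct walk_sketch *walk)
{
	*(unsigned long *)walk->private += next - addr;
	return 0;
}

static int abort_on_pfnmap(unsigned long addr, unsigned long next,
			   struct walk_sketch *walk)
{
	(void)addr;
	(void)next;
	return walk->vma->pfnmap ? -1 : 0;
}
```

With no `test_walk`, a PFN-mapped VMA is skipped silently. Supplying `abort_on_pfnmap()` turns the same condition into an error that stops the walk.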
@@ -1261,6 +1265,17 @@ long get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
 		    unsigned long start, unsigned long nr_pages,
 		    int write, int force, struct page **pages,
 		    struct vm_area_struct **vmas);
+long get_user_pages_locked(struct task_struct *tsk, struct mm_struct *mm,
+		    unsigned long start, unsigned long nr_pages,
+		    int write, int force, struct page **pages,
+		    int *locked);
+long __get_user_pages_unlocked(struct task_struct *tsk, struct mm_struct *mm,
+			       unsigned long start, unsigned long nr_pages,
+			       int write, int force, struct page **pages,
+			       unsigned int gup_flags);
+long get_user_pages_unlocked(struct task_struct *tsk, struct mm_struct *mm,
+		    unsigned long start, unsigned long nr_pages,
+		    int write, int force, struct page **pages);
 int get_user_pages_fast(unsigned long start, int nr_pages, int write,
 			struct page **pages);
 struct kvec;
@@ -1438,8 +1453,32 @@ static inline int __pmd_alloc(struct mm_struct *mm, pud_t *pud,
 {
 	return 0;
 }
+
+static inline unsigned long mm_nr_pmds(struct mm_struct *mm)
+{
+	return 0;
+}
+
+static inline void mm_inc_nr_pmds(struct mm_struct *mm) {}
+static inline void mm_dec_nr_pmds(struct mm_struct *mm) {}
+
 #else
 int __pmd_alloc(struct mm_struct *mm, pud_t *pud, unsigned long address);
+
+static inline unsigned long mm_nr_pmds(struct mm_struct *mm)
+{
+	return atomic_long_read(&mm->nr_pmds);
+}
+
+static inline void mm_inc_nr_pmds(struct mm_struct *mm)
+{
+	atomic_long_inc(&mm->nr_pmds);
+}
+
+static inline void mm_dec_nr_pmds(struct mm_struct *mm)
+{
+	atomic_long_dec(&mm->nr_pmds);
+}
 #endif
 
 int __pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma,
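
The new `nr_pmds` counter follows the existing `nr_ptes` pattern. Configurations with a folded PMD level get no-op helpers, and `check_mm()` in kernel/fork.c complains if either counter is non-zero when an mm is freed. Below is a user-space sketch of that accounting, using C11 `<stdatomic.h>` in place of the kernel's `atomic_long_t`.

The following names and choices are hypothetical: `struct mm_sketch`, the `PMD_FOLDED_SKETCH` switch, `mm_init_sketch()` and `check_mm_sketch()`. The underflow assertion is a sketch-only addition, and `mm_nr_pmds()` returns `long` here for simplicity.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdio.h>

/* Hypothetical stand-in for mm_struct: only the page-table counters. */
struct mm_sketch {
	atomic_long nr_ptes;		/* PTE page table pages */
	atomic_long nr_pmds;		/* PMD page table pages */
};

#ifdef PMD_FOLDED_SKETCH
/* Folded PMD level: no separate PMD pages exist, so nothing is counted. */
static inline long mm_nr_pmds(struct mm_sketch *mm) { (void)mm; return 0; }
static inline void mm_inc_nr_pmds(struct mm_sketch *mm) { (void)mm; }
static inline void mm_dec_nr_pmds(struct mm_sketch *mm) { (void)mm; }
#else
static inline long mm_nr_pmds(struct mm_sketch *mm)
{
	return atomic_load(&mm->nr_pmds);
}

static inline void mm_inc_nr_pmds(struct mm_sketch *mm)
{
	atomic_fetch_add(&mm->nr_pmds, 1);
}

static inline void mm_dec_nr_pmds(struct mm_sketch *mm)
{
	long old = atomic_fetch_sub(&mm->nr_pmds, 1);

	assert(old > 0);		/* sketch-only underflow check */
	(void)old;
}
#endif

static void mm_init_sketch(struct mm_sketch *mm)
{
	atomic_init(&mm->nr_ptes, 0);
	atomic_init(&mm->nr_pmds, 0);
}

/* Mirrors the check_mm() additions: report leaked page-table pages. */
static int check_mm_sketch(struct mm_sketch *mm)
{
	int bugs = 0;

	if (atomic_load(&mm->nr_ptes)) {
		fprintf(stderr, "BUG: non-zero nr_ptes on freeing mm: %ld\n",
			atomic_load(&mm->nr_ptes));
		bugs++;
	}
	if (mm_nr_pmds(mm)) {
		fprintf(stderr, "BUG: non-zero nr_pmds on freeing mm: %ld\n",
			mm_nr_pmds(mm));
		bugs++;
	}
	return bugs;
}
```

Keeping the no-op helpers in the folded case lets callers stay free of `#ifdef`s. Only `mm_init()` needs the `__PAGETABLE_PMD_FOLDED` guard, as the kernel/fork.c hunk further down shows.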
@@ -28,6 +28,8 @@ struct mem_cgroup;
 		IS_ENABLED(CONFIG_ARCH_ENABLE_SPLIT_PMD_PTLOCK))
 #define ALLOC_SPLIT_PTLOCKS (SPINLOCK_SIZE > BITS_PER_LONG/8)
 
+typedef void compound_page_dtor(struct page *);
+
 /*
  * Each physical page in the system has a struct page associated with
  * it to keep track of whatever it is we are using the page for at the
@@ -142,6 +144,12 @@ struct page {
 		struct rcu_head rcu_head; /* Used by SLAB
 						 * when destroying via RCU
 						 */
+		/* First tail page of compound page */
+		struct {
+			compound_page_dtor *compound_dtor;
+			unsigned long compound_order;
+		};
+
 #if defined(CONFIG_TRANSPARENT_HUGEPAGE) && USE_SPLIT_PMD_PTLOCKS
 		pgtable_t pmd_huge_pte; /* protected by page->ptl */
 #endif
@@ -355,7 +363,8 @@ struct mm_struct {
 	pgd_t * pgd;
 	atomic_t mm_users; /* How many users with user space? */
 	atomic_t mm_count; /* How many references to "struct mm_struct" (users count as 1) */
-	atomic_long_t nr_ptes; /* Page table pages */
+	atomic_long_t nr_ptes; /* PTE page table pages */
+	atomic_long_t nr_pmds; /* PMD page table pages */
 	int map_count; /* number of VMAs */
 
 	spinlock_t page_table_lock; /* Protects page tables and some counters */
@@ -426,7 +426,7 @@ struct zone {
 	const char *name;
 
 	/*
-	 * Number of MIGRATE_RESEVE page block. To maintain for just
+	 * Number of MIGRATE_RESERVE page block. To maintain for just
 	 * optimization. Protected by zone->lock.
 	 */
 	int nr_migrate_reserve_block;
@@ -970,7 +970,6 @@ static inline int zonelist_node_idx(struct zoneref *zoneref)
  * @z - The cursor used as a starting point for the search
  * @highest_zoneidx - The zone index of the highest zone to return
  * @nodes - An optional nodemask to filter the zonelist with
- * @zone - The first suitable zone found is returned via this parameter
  *
  * This function returns the next zone at or below a given zone index that is
  * within the allowed nodemask using a cursor as the starting point for the
@@ -980,8 +979,7 @@ static inline int zonelist_node_idx(struct zoneref *zoneref)
  */
 struct zoneref *next_zones_zonelist(struct zoneref *z,
 					enum zone_type highest_zoneidx,
-					nodemask_t *nodes,
-					struct zone **zone);
+					nodemask_t *nodes);
 
 /**
  * first_zones_zonelist - Returns the first zone at or below highest_zoneidx within the allowed nodemask in a zonelist
@@ -1000,8 +998,10 @@ static inline struct zoneref *first_zones_zonelist(struct zonelist *zonelist,
 					nodemask_t *nodes,
 					struct zone **zone)
 {
-	return next_zones_zonelist(zonelist->_zonerefs, highest_zoneidx, nodes,
-								zone);
+	struct zoneref *z = next_zones_zonelist(zonelist->_zonerefs,
+							highest_zoneidx, nodes);
+	*zone = zonelist_zone(z);
+	return z;
 }
 
 /**
@@ -1018,7 +1018,8 @@ static inline struct zoneref *first_zones_zonelist(struct zonelist *zonelist,
 #define for_each_zone_zonelist_nodemask(zone, z, zlist, highidx, nodemask) \
 	for (z = first_zones_zonelist(zlist, highidx, nodemask, &zone); \
 		zone; \
-		z = next_zones_zonelist(++z, highidx, nodemask, &zone)) \
+		z = next_zones_zonelist(++z, highidx, nodemask), \
+			zone = zonelist_zone(z)) \
 
 /**
  * for_each_zone_zonelist - helper macro to iterate over valid zones in a zonelist at or below a given zone index
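
After this change `next_zones_zonelist()` only advances the cursor. Callers read the zone out of the returned `zoneref` with `zonelist_zone()`, which is why the step expression of `for_each_zone_zonelist_nodemask()` now uses a comma operator. The sketch below mimics that shape in user space. `struct zone_sketch`, `struct zoneref_sketch`, `for_each_zone_sketch()` and the single-word `unsigned long` node mask are hypothetical simplifications, and the list is assumed to end with an entry whose zone is NULL.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for struct zone / struct zoneref. */
struct zone_sketch {
	int nid;
	const char *name;
};

struct zoneref_sketch {
	struct zone_sketch *zone;	/* NULL terminates the list */
	int zone_idx;
};

static inline struct zone_sketch *zonelist_zone(struct zoneref_sketch *z)
{
	return z->zone;
}

/* Cursor-only API: skip unsuitable entries and return the cursor. */
static struct zoneref_sketch *next_zones_zonelist(struct zoneref_sketch *z,
						  int highest_zoneidx,
						  const unsigned long *nodes)
{
	assert(z != NULL);
	while (z->zone &&
	       (z->zone_idx > highest_zoneidx ||
		(nodes && !(*nodes & (1UL << z->zone->nid)))))
		z++;
	return z;
}

/* Still hands back the zone, but derives it from the cursor. */
static inline struct zoneref_sketch *
first_zones_zonelist(struct zoneref_sketch *zonelist, int highest_zoneidx,
		     const unsigned long *nodes, struct zone_sketch **zone)
{
	struct zoneref_sketch *z = next_zones_zonelist(zonelist,
						       highest_zoneidx, nodes);
	*zone = zonelist_zone(z);
	return z;
}

#define for_each_zone_sketch(zone, z, zlist, highidx, nodemask)		\
	for (z = first_zones_zonelist(zlist, highidx, nodemask, &zone);	\
	     zone;							\
	     z = next_zones_zonelist(++z, highidx, nodemask),		\
		zone = zonelist_zone(z))
```

Dropping the out-parameter keeps the hot search loop free of a store per call. The one-line `zonelist_zone()` dereference moves to the callers that actually need the zone.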
@@ -47,6 +47,10 @@ static inline bool oom_task_origin(const struct task_struct *p)
 	return !!(p->signal->oom_flags & OOM_FLAG_ORIGIN);
 }
 
+extern void mark_tsk_oom_victim(struct task_struct *tsk);
+
+extern void unmark_oom_victim(void);
+
 extern unsigned long oom_badness(struct task_struct *p,
 		struct mem_cgroup *memcg, const nodemask_t *nodemask,
 		unsigned long totalpages);
@@ -68,22 +72,14 @@ extern enum oom_scan_t oom_scan_process_thread(struct task_struct *task,
 		unsigned long totalpages, const nodemask_t *nodemask,
 		bool force_kill);
 
-extern void out_of_memory(struct zonelist *zonelist, gfp_t gfp_mask,
+extern bool out_of_memory(struct zonelist *zonelist, gfp_t gfp_mask,
 		int order, nodemask_t *mask, bool force_kill);
 extern int register_oom_notifier(struct notifier_block *nb);
 extern int unregister_oom_notifier(struct notifier_block *nb);
 
 extern bool oom_killer_disabled;
-
-static inline void oom_killer_disable(void)
-{
-	oom_killer_disabled = true;
-}
-
-static inline void oom_killer_enable(void)
-{
-	oom_killer_disabled = false;
-}
+extern bool oom_killer_disable(void);
+extern void oom_killer_enable(void);
 
 extern struct task_struct *find_lock_task_mm(struct task_struct *p);
 
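
With this change `oom_killer_disable()` is no longer a simple flag setter. It returns `bool`, and together with the new `mark_tsk_oom_victim()`/`unmark_oom_victim()` pair it lets the freezer wait for OOM victims instead of re-checking for a racing kill afterwards (compare the `freeze_processes()` hunks further down).

The header does not show the implementation, so the following is only a user-space model of one way such victim accounting can work, using a pthread mutex and condition variable. Every `_sketch` name is hypothetical, and the `assert()` stands in for what would more likely be a warning in kernel code.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t oom_lock_sketch = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t oom_victims_wait_sketch = PTHREAD_COND_INITIALIZER;
static int oom_victims_sketch;		/* victims that have not exited yet */
static bool oom_killer_disabled_sketch;

/* Hypothetical task: only the "selected as OOM victim" bit matters here. */
struct task_sketch {
	bool memdie;			/* models TIF_MEMDIE */
};

static void mark_tsk_oom_victim_sketch(struct task_sketch *tsk)
{
	pthread_mutex_lock(&oom_lock_sketch);
	assert(!oom_killer_disabled_sketch);	/* no new victims once disabled */
	if (!tsk->memdie) {			/* count each victim once */
		tsk->memdie = true;
		oom_victims_sketch++;
	}
	pthread_mutex_unlock(&oom_lock_sketch);
}

static void unmark_oom_victim_sketch(struct task_sketch *tsk)
{
	pthread_mutex_lock(&oom_lock_sketch);
	if (tsk->memdie) {
		tsk->memdie = false;
		/* Wake a waiting disabler once the last victim is gone. */
		if (--oom_victims_sketch == 0 && oom_killer_disabled_sketch)
			pthread_cond_broadcast(&oom_victims_wait_sketch);
	}
	pthread_mutex_unlock(&oom_lock_sketch);
}

/*
 * Returns false if the caller is itself a victim (it would wait for its
 * own exit forever); otherwise blocks until no victims remain.
 */
static bool oom_killer_disable_sketch(const struct task_sketch *current)
{
	pthread_mutex_lock(&oom_lock_sketch);
	if (current->memdie) {
		pthread_mutex_unlock(&oom_lock_sketch);
		return false;
	}
	oom_killer_disabled_sketch = true;
	while (oom_victims_sketch)
		pthread_cond_wait(&oom_victims_wait_sketch, &oom_lock_sketch);
	pthread_mutex_unlock(&oom_lock_sketch);
	return true;
}

static void oom_killer_enable_sketch(void)
{
	pthread_mutex_lock(&oom_lock_sketch);
	oom_killer_disabled_sketch = false;
	pthread_mutex_unlock(&oom_lock_sketch);
}
```

The `bool` return is what lets `freeze_processes()` turn a failed disable into `-EBUSY` and thaw everything again.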
@@ -41,7 +41,8 @@ int page_counter_try_charge(struct page_counter *counter,
 			     struct page_counter **fail);
 void page_counter_uncharge(struct page_counter *counter, unsigned long nr_pages);
 int page_counter_limit(struct page_counter *counter, unsigned long limit);
-int page_counter_memparse(const char *buf, unsigned long *nr_pages);
+int page_counter_memparse(const char *buf, const char *max,
+			  unsigned long *nr_pages);
 
 static inline void page_counter_reset_watermark(struct page_counter *counter)
 {
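
`page_counter_memparse()` now takes, besides the buffer, the keyword that the caller wants treated as "unlimited". The user-space sketch below shows the likely shape of such a parser:

1. Compare the buffer against the keyword first.
2. Otherwise, parse a size with an optional binary suffix.
3. Convert bytes to pages, clamped to a cap.

The 4 KiB page size, the cap, the reduced K/M/G suffix set (the kernel's `memparse()` also accepts T, P and E) and all `_sketch` names are assumptions for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

#define PAGE_SIZE_SKETCH 4096UL			/* assumption: 4 KiB pages */
#define PAGE_COUNTER_MAX_SKETCH (LONG_MAX / PAGE_SIZE_SKETCH)	/* assumed cap */

/* Rough analogue of memparse(): a number with an optional K/M/G suffix. */
static unsigned long long memparse_sketch(const char *s, char **end)
{
	unsigned long long v = strtoull(s, end, 0);

	switch (**end) {
	case 'G': case 'g':
		v <<= 10;
		/* fall through */
	case 'M': case 'm':
		v <<= 10;
		/* fall through */
	case 'K': case 'k':
		v <<= 10;
		(*end)++;
		break;
	default:
		break;
	}
	return v;
}

static int page_counter_memparse_sketch(const char *buf, const char *max,
					unsigned long *nr_pages)
{
	char *end;
	unsigned long long bytes;

	if (!strcmp(buf, max)) {	/* the caller's "no limit" keyword */
		*nr_pages = PAGE_COUNTER_MAX_SKETCH;
		return 0;
	}

	bytes = memparse_sketch(buf, &end);
	if (*end != '\0')		/* trailing garbage */
		return -EINVAL;

	bytes /= PAGE_SIZE_SKETCH;	/* bytes -> whole pages, rounded down */
	*nr_pages = bytes < PAGE_COUNTER_MAX_SKETCH ? bytes
						    : PAGE_COUNTER_MAX_SKETCH;
	return 0;
}
```

Passing the keyword in, rather than hard-coding it, lets different control files spell "unlimited" differently while sharing one parser.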
@@ -40,7 +40,7 @@ struct page_ext {
 #ifdef CONFIG_PAGE_OWNER
 	unsigned int order;
 	gfp_t gfp_mask;
-	struct stack_trace trace;
+	unsigned int nr_entries;
 	unsigned long trace_entries[8];
 #endif
 };
@@ -437,16 +437,6 @@ extern int reuse_swap_page(struct page *);
 extern int try_to_free_swap(struct page *);
 struct backing_dev_info;
 
-#ifdef CONFIG_MEMCG
-extern void
-mem_cgroup_uncharge_swapcache(struct page *page, swp_entry_t ent, bool swapout);
-#else
-static inline void
-mem_cgroup_uncharge_swapcache(struct page *page, swp_entry_t ent, bool swapout)
-{
-}
-#endif
-
 #else /* CONFIG_SWAP */
 
 #define swap_address_space(entry) (NULL)
@@ -547,11 +537,6 @@ static inline swp_entry_t get_swap_page(void)
 	return entry;
 }
 
-static inline void
-mem_cgroup_uncharge_swapcache(struct page *page, swp_entry_t ent)
-{
-}
-
 #endif /* CONFIG_SWAP */
 #endif /* __KERNEL__*/
 #endif /* _LINUX_SWAP_H */
@@ -135,6 +135,8 @@ static inline void make_migration_entry_read(swp_entry_t *entry)
 	*entry = swp_entry(SWP_MIGRATION_READ, swp_offset(*entry));
 }
 
+extern void __migration_entry_wait(struct mm_struct *mm, pte_t *ptep,
+					spinlock_t *ptl);
 extern void migration_entry_wait(struct mm_struct *mm, pmd_t *pmd,
 					unsigned long address);
 extern void migration_entry_wait_huge(struct vm_area_struct *vma,
@@ -148,6 +150,8 @@ static inline int is_migration_entry(swp_entry_t swp)
 }
 #define migration_entry_to_page(swp) NULL
 static inline void make_migration_entry_read(swp_entry_t *entryp) { }
+static inline void __migration_entry_wait(struct mm_struct *mm, pte_t *ptep,
+					spinlock_t *ptl) { }
 static inline void migration_entry_wait(struct mm_struct *mm, pmd_t *pmd,
 					 unsigned long address) { }
 static inline void migration_entry_wait_huge(struct vm_area_struct *vma,
@@ -11,39 +11,55 @@
 
 DECLARE_EVENT_CLASS(mm_compaction_isolate_template,
 
-	TP_PROTO(unsigned long nr_scanned,
+	TP_PROTO(
+		unsigned long start_pfn,
+		unsigned long end_pfn,
+		unsigned long nr_scanned,
 		unsigned long nr_taken),
 
-	TP_ARGS(nr_scanned, nr_taken),
+	TP_ARGS(start_pfn, end_pfn, nr_scanned, nr_taken),
 
 	TP_STRUCT__entry(
+		__field(unsigned long, start_pfn)
+		__field(unsigned long, end_pfn)
 		__field(unsigned long, nr_scanned)
 		__field(unsigned long, nr_taken)
 	),
 
 	TP_fast_assign(
+		__entry->start_pfn = start_pfn;
+		__entry->end_pfn = end_pfn;
 		__entry->nr_scanned = nr_scanned;
 		__entry->nr_taken = nr_taken;
 	),
 
-	TP_printk("nr_scanned=%lu nr_taken=%lu",
+	TP_printk("range=(0x%lx ~ 0x%lx) nr_scanned=%lu nr_taken=%lu",
+		__entry->start_pfn,
+		__entry->end_pfn,
 		__entry->nr_scanned,
 		__entry->nr_taken)
 );
 
 DEFINE_EVENT(mm_compaction_isolate_template, mm_compaction_isolate_migratepages,
 
-	TP_PROTO(unsigned long nr_scanned,
+	TP_PROTO(
+		unsigned long start_pfn,
+		unsigned long end_pfn,
+		unsigned long nr_scanned,
 		unsigned long nr_taken),
 
-	TP_ARGS(nr_scanned, nr_taken)
+	TP_ARGS(start_pfn, end_pfn, nr_scanned, nr_taken)
 );
 
 DEFINE_EVENT(mm_compaction_isolate_template, mm_compaction_isolate_freepages,
-	TP_PROTO(unsigned long nr_scanned,
+
+	TP_PROTO(
+		unsigned long start_pfn,
+		unsigned long end_pfn,
+		unsigned long nr_scanned,
 		unsigned long nr_taken),
 
-	TP_ARGS(nr_scanned, nr_taken)
+	TP_ARGS(start_pfn, end_pfn, nr_scanned, nr_taken)
 );
 
 TRACE_EVENT(mm_compaction_migratepages,
@@ -85,48 +101,199 @@ TRACE_EVENT(mm_compaction_migratepages,
 );
 
 TRACE_EVENT(mm_compaction_begin,
-	TP_PROTO(unsigned long zone_start, unsigned long migrate_start,
-		unsigned long free_start, unsigned long zone_end),
+	TP_PROTO(unsigned long zone_start, unsigned long migrate_pfn,
+		unsigned long free_pfn, unsigned long zone_end, bool sync),
 
-	TP_ARGS(zone_start, migrate_start, free_start, zone_end),
+	TP_ARGS(zone_start, migrate_pfn, free_pfn, zone_end, sync),
 
 	TP_STRUCT__entry(
 		__field(unsigned long, zone_start)
-		__field(unsigned long, migrate_start)
-		__field(unsigned long, free_start)
+		__field(unsigned long, migrate_pfn)
+		__field(unsigned long, free_pfn)
 		__field(unsigned long, zone_end)
+		__field(bool, sync)
 	),
 
 	TP_fast_assign(
 		__entry->zone_start = zone_start;
-		__entry->migrate_start = migrate_start;
-		__entry->free_start = free_start;
+		__entry->migrate_pfn = migrate_pfn;
+		__entry->free_pfn = free_pfn;
 		__entry->zone_end = zone_end;
+		__entry->sync = sync;
 	),
 
-	TP_printk("zone_start=%lu migrate_start=%lu free_start=%lu zone_end=%lu",
+	TP_printk("zone_start=0x%lx migrate_pfn=0x%lx free_pfn=0x%lx zone_end=0x%lx, mode=%s",
 		__entry->zone_start,
-		__entry->migrate_start,
-		__entry->free_start,
-		__entry->zone_end)
+		__entry->migrate_pfn,
+		__entry->free_pfn,
+		__entry->zone_end,
+		__entry->sync ? "sync" : "async")
 );
 
 TRACE_EVENT(mm_compaction_end,
-	TP_PROTO(int status),
+	TP_PROTO(unsigned long zone_start, unsigned long migrate_pfn,
+		unsigned long free_pfn, unsigned long zone_end, bool sync,
+		int status),
 
-	TP_ARGS(status),
+	TP_ARGS(zone_start, migrate_pfn, free_pfn, zone_end, sync, status),
 
 	TP_STRUCT__entry(
+		__field(unsigned long, zone_start)
+		__field(unsigned long, migrate_pfn)
+		__field(unsigned long, free_pfn)
+		__field(unsigned long, zone_end)
+		__field(bool, sync)
 		__field(int, status)
 	),
 
 	TP_fast_assign(
+		__entry->zone_start = zone_start;
+		__entry->migrate_pfn = migrate_pfn;
+		__entry->free_pfn = free_pfn;
+		__entry->zone_end = zone_end;
+		__entry->sync = sync;
 		__entry->status = status;
 	),
 
-	TP_printk("status=%d", __entry->status)
+	TP_printk("zone_start=0x%lx migrate_pfn=0x%lx free_pfn=0x%lx zone_end=0x%lx, mode=%s status=%s",
+		__entry->zone_start,
+		__entry->migrate_pfn,
+		__entry->free_pfn,
+		__entry->zone_end,
+		__entry->sync ? "sync" : "async",
+		compaction_status_string[__entry->status])
 );
 
+TRACE_EVENT(mm_compaction_try_to_compact_pages,
+
+	TP_PROTO(
+		int order,
+		gfp_t gfp_mask,
+		enum migrate_mode mode),
+
+	TP_ARGS(order, gfp_mask, mode),
+
+	TP_STRUCT__entry(
+		__field(int, order)
+		__field(gfp_t, gfp_mask)
+		__field(enum migrate_mode, mode)
+	),
+
+	TP_fast_assign(
+		__entry->order = order;
+		__entry->gfp_mask = gfp_mask;
+		__entry->mode = mode;
+	),
+
+	TP_printk("order=%d gfp_mask=0x%x mode=%d",
+		__entry->order,
+		__entry->gfp_mask,
+		(int)__entry->mode)
+);
+
+DECLARE_EVENT_CLASS(mm_compaction_suitable_template,
+
+	TP_PROTO(struct zone *zone,
+		int order,
+		int ret),
+
+	TP_ARGS(zone, order, ret),
+
+	TP_STRUCT__entry(
+		__field(int, nid)
+		__field(char *, name)
+		__field(int, order)
+		__field(int, ret)
+	),
+
+	TP_fast_assign(
+		__entry->nid = zone_to_nid(zone);
+		__entry->name = (char *)zone->name;
+		__entry->order = order;
+		__entry->ret = ret;
+	),
+
+	TP_printk("node=%d zone=%-8s order=%d ret=%s",
+		__entry->nid,
+		__entry->name,
+		__entry->order,
+		compaction_status_string[__entry->ret])
+);
+
+DEFINE_EVENT(mm_compaction_suitable_template, mm_compaction_finished,
+
+	TP_PROTO(struct zone *zone,
+		int order,
+		int ret),
+
+	TP_ARGS(zone, order, ret)
+);
+
+DEFINE_EVENT(mm_compaction_suitable_template, mm_compaction_suitable,
+
+	TP_PROTO(struct zone *zone,
+		int order,
+		int ret),
+
+	TP_ARGS(zone, order, ret)
+);
+
+#ifdef CONFIG_COMPACTION
+DECLARE_EVENT_CLASS(mm_compaction_defer_template,
+
+	TP_PROTO(struct zone *zone, int order),
+
+	TP_ARGS(zone, order),
+
+	TP_STRUCT__entry(
+		__field(int, nid)
+		__field(char *, name)
+		__field(int, order)
+		__field(unsigned int, considered)
+		__field(unsigned int, defer_shift)
+		__field(int, order_failed)
+	),
+
+	TP_fast_assign(
+		__entry->nid = zone_to_nid(zone);
+		__entry->name = (char *)zone->name;
+		__entry->order = order;
+		__entry->considered = zone->compact_considered;
+		__entry->defer_shift = zone->compact_defer_shift;
+		__entry->order_failed = zone->compact_order_failed;
+	),
+
+	TP_printk("node=%d zone=%-8s order=%d order_failed=%d consider=%u limit=%lu",
+		__entry->nid,
+		__entry->name,
+		__entry->order,
+		__entry->order_failed,
+		__entry->considered,
+		1UL << __entry->defer_shift)
+);
+
+DEFINE_EVENT(mm_compaction_defer_template, mm_compaction_deferred,
+
+	TP_PROTO(struct zone *zone, int order),
+
+	TP_ARGS(zone, order)
+);
+
+DEFINE_EVENT(mm_compaction_defer_template, mm_compaction_defer_compaction,
+
+	TP_PROTO(struct zone *zone, int order),
+
+	TP_ARGS(zone, order)
+);
+
+DEFINE_EVENT(mm_compaction_defer_template, mm_compaction_defer_reset,
+
+	TP_PROTO(struct zone *zone, int order),
+
+	TP_ARGS(zone, order)
+);
+#endif
+
 #endif /* _TRACE_COMPACTION_H */
 
 /* This part must be outside protection */
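
The new `mm_compaction_defer_template` events expose a zone's compaction back-off state. They record `considered`, `defer_shift` (printed as `limit=1UL << defer_shift`) and `order_failed`. The sketch below models how these three fields are understood to interact in `mm/compaction.c` of this period:

- Each failure doubles the number of attempts that are skipped, up to a cap.
- Orders below `order_failed` are never deferred.
- A success resets the back-off.

It is a user-space model written from memory. The struct, the `_sketch` suffixes and the cap of 6 are assumptions to check against the real source.

```c
#include <assert.h>
#include <stdbool.h>

#define COMPACT_MAX_DEFER_SHIFT_SKETCH 6	/* assumed cap: limit of 64 */

/* Hypothetical stand-in for the compaction fields of struct zone. */
struct compact_zone_sketch {
	unsigned int considered;	/* attempts since the last failure */
	unsigned int defer_shift;	/* skip up to 1 << defer_shift attempts */
	int order_failed;		/* lowest order that failed */
};

static unsigned long defer_limit(const struct compact_zone_sketch *zone)
{
	return 1UL << zone->defer_shift;	/* the tracepoint's "limit=" */
}

/* Compaction failed for @order: back off twice as long as before. */
static void defer_compaction_sketch(struct compact_zone_sketch *zone, int order)
{
	zone->considered = 0;
	if (zone->defer_shift < COMPACT_MAX_DEFER_SHIFT_SKETCH)
		zone->defer_shift++;
	if (order < zone->order_failed)
		zone->order_failed = order;
}

/* Returns true if compaction for @order should be skipped this time. */
static bool compaction_deferred_sketch(struct compact_zone_sketch *zone,
				       int order)
{
	unsigned long limit = defer_limit(zone);

	if (order < zone->order_failed)
		return false;		/* smaller orders are never deferred */
	if (++zone->considered > limit)
		zone->considered = limit;	/* avoid overflow */
	return zone->considered < limit;
}

/* Compaction for @order succeeded; reset the back-off if it was used. */
static void compaction_defer_reset_sketch(struct compact_zone_sketch *zone,
					  int order, bool alloc_success)
{
	if (alloc_success) {
		zone->considered = 0;
		zone->defer_shift = 0;
	}
	if (order >= zone->order_failed)
		zone->order_failed = order + 1;
	assert(zone->defer_shift <= COMPACT_MAX_DEFER_SHIFT_SKETCH);
}
```

Reading a trace with this model in mind: `consider` counting up toward `limit` means attempts are being skipped. A `mm_compaction_defer_reset` event with `limit=1` means the zone has been taken out of back-off.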
@@ -268,11 +268,11 @@ TRACE_EVENT(mm_page_alloc_extfrag,
 
 	TP_PROTO(struct page *page,
 		int alloc_order, int fallback_order,
-		int alloc_migratetype, int fallback_migratetype, int new_migratetype),
+		int alloc_migratetype, int fallback_migratetype),
 
 	TP_ARGS(page,
 		alloc_order, fallback_order,
-		alloc_migratetype, fallback_migratetype, new_migratetype),
+		alloc_migratetype, fallback_migratetype),
 
 	TP_STRUCT__entry(
 		__field( struct page *, page )
@@ -289,7 +289,8 @@ TRACE_EVENT(mm_page_alloc_extfrag,
 		__entry->fallback_order = fallback_order;
 		__entry->alloc_migratetype = alloc_migratetype;
 		__entry->fallback_migratetype = fallback_migratetype;
-		__entry->change_ownership = (new_migratetype == alloc_migratetype);
+		__entry->change_ownership = (alloc_migratetype ==
+					get_pageblock_migratetype(page));
 	),
 
 	TP_printk("page=%p pfn=%lu alloc_order=%d fallback_order=%d pageblock_order=%d alloc_migratetype=%d fallback_migratetype=%d fragmenting=%d change_ownership=%d",
@@ -32,6 +32,7 @@
 #define KPF_KSM 21
 #define KPF_THP 22
 #define KPF_BALLOON 23
+#define KPF_ZERO_PAGE 24
 
 
 #endif /* _UAPILINUX_KERNEL_PAGE_FLAGS_H */
@@ -435,7 +435,8 @@ static void exit_mm(struct task_struct *tsk)
 	task_unlock(tsk);
 	mm_update_next_owner(mm);
 	mmput(mm);
-	clear_thread_flag(TIF_MEMDIE);
+	if (test_thread_flag(TIF_MEMDIE))
+		unmark_oom_victim();
 }
 
 static struct task_struct *find_alive_thread(struct task_struct *p)
@@ -555,6 +555,9 @@ static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p)
 	INIT_LIST_HEAD(&mm->mmlist);
 	mm->core_state = NULL;
 	atomic_long_set(&mm->nr_ptes, 0);
+#ifndef __PAGETABLE_PMD_FOLDED
+	atomic_long_set(&mm->nr_pmds, 0);
+#endif
 	mm->map_count = 0;
 	mm->locked_vm = 0;
 	mm->pinned_vm = 0;
@@ -603,6 +606,14 @@ static void check_mm(struct mm_struct *mm)
 			printk(KERN_ALERT "BUG: Bad rss-counter state "
 					  "mm:%p idx:%d val:%ld\n", mm, i, x);
 	}
+
+	if (atomic_long_read(&mm->nr_ptes))
+		pr_alert("BUG: non-zero nr_ptes on freeing mm: %ld\n",
+				atomic_long_read(&mm->nr_ptes));
+	if (mm_nr_pmds(mm))
+		pr_alert("BUG: non-zero nr_pmds on freeing mm: %ld\n",
+				mm_nr_pmds(mm));
+
 #if defined(CONFIG_TRANSPARENT_HUGEPAGE) && !USE_SPLIT_PMD_PTLOCKS
 	VM_BUG_ON_MM(mm->pmd_huge_pte, mm);
 #endif
@@ -84,8 +84,8 @@ static int try_to_freeze_tasks(bool user_only)
 	elapsed_msecs = elapsed_msecs64;
 
 	if (todo) {
-		printk("\n");
-		printk(KERN_ERR "Freezing of tasks %s after %d.%03d seconds "
+		pr_cont("\n");
+		pr_err("Freezing of tasks %s after %d.%03d seconds "
 		       "(%d tasks refusing to freeze, wq_busy=%d):\n",
 		       wakeup ? "aborted" : "failed",
 		       elapsed_msecs / 1000, elapsed_msecs % 1000,
@@ -101,37 +101,13 @@ static int try_to_freeze_tasks(bool user_only)
 			read_unlock(&tasklist_lock);
 		}
 	} else {
-		printk("(elapsed %d.%03d seconds) ", elapsed_msecs / 1000,
+		pr_cont("(elapsed %d.%03d seconds) ", elapsed_msecs / 1000,
 			elapsed_msecs % 1000);
 	}
 
 	return todo ? -EBUSY : 0;
 }
 
-static bool __check_frozen_processes(void)
-{
-	struct task_struct *g, *p;
-
-	for_each_process_thread(g, p)
-		if (p != current && !freezer_should_skip(p) && !frozen(p))
-			return false;
-
-	return true;
-}
-
-/*
- * Returns true if all freezable tasks (except for current) are frozen already
- */
-static bool check_frozen_processes(void)
-{
-	bool ret;
-
-	read_lock(&tasklist_lock);
-	ret = __check_frozen_processes();
-	read_unlock(&tasklist_lock);
-	return ret;
-}
-
 /**
  * freeze_processes - Signal user space processes to enter the refrigerator.
  * The current thread will not be frozen. The same process that calls
@@ -142,7 +118,6 @@ static bool check_frozen_processes(void)
 int freeze_processes(void)
 {
 	int error;
-	int oom_kills_saved;
 
 	error = __usermodehelper_disable(UMH_FREEZING);
 	if (error)
@@ -155,31 +130,24 @@ int freeze_processes(void)
 	atomic_inc(&system_freezing_cnt);
 
 	pm_wakeup_clear();
-	printk("Freezing user space processes ... ");
+	pr_info("Freezing user space processes ... ");
 	pm_freezing = true;
-	oom_kills_saved = oom_kills_count();
 	error = try_to_freeze_tasks(true);
 	if (!error) {
 		__usermodehelper_set_disable_depth(UMH_DISABLED);
-		oom_killer_disable();
-
-		/*
-		 * There might have been an OOM kill while we were
-		 * freezing tasks and the killed task might be still
-		 * on the way out so we have to double check for race.
-		 */
-		if (oom_kills_count() != oom_kills_saved &&
-		    !check_frozen_processes()) {
-			__usermodehelper_set_disable_depth(UMH_ENABLED);
-			printk("OOM in progress.");
-			error = -EBUSY;
-		} else {
-			printk("done.");
-		}
+		pr_cont("done.");
 	}
-	printk("\n");
+	pr_cont("\n");
 	BUG_ON(in_atomic());
 
+	/*
+	 * Now that the whole userspace is frozen we need to disbale
+	 * the OOM killer to disallow any further interference with
+	 * killable tasks.
+	 */
+	if (!error && !oom_killer_disable())
+		error = -EBUSY;
+
 	if (error)
 		thaw_processes();
 	return error;
@@ -197,13 +165,14 @@ int freeze_kernel_threads(void)
 {
 	int error;
 
-	printk("Freezing remaining freezable tasks ... ");
+	pr_info("Freezing remaining freezable tasks ... ");
+
 	pm_nosig_freezing = true;
 	error = try_to_freeze_tasks(false);
 	if (!error)
-		printk("done.");
+		pr_cont("done.");
 
-	printk("\n");
+	pr_cont("\n");
 	BUG_ON(in_atomic());
 
 	if (error)
@@ -224,7 +193,7 @@ void thaw_processes(void)
 
 	oom_killer_enable();
 
-	printk("Restarting tasks ... ");
+	pr_info("Restarting tasks ... ");
 
 	__usermodehelper_set_disable_depth(UMH_FREEZING);
 	thaw_workqueues();
@@ -243,7 +212,7 @@ void thaw_processes(void)
 	usermodehelper_enable();
 
 	schedule();
-	printk("done.\n");
+	pr_cont("done.\n");
 	trace_suspend_resume(TPS("thaw_processes"), 0, false);
 }
 
@@ -252,7 +221,7 @@ void thaw_kernel_threads(void)
 	struct task_struct *g, *p;
 
 	pm_nosig_freezing = false;
-	printk("Restarting kernel threads ... ");
+	pr_info("Restarting kernel threads ... ");
 
 	thaw_workqueues();
 
@@ -264,5 +233,5 @@ void thaw_kernel_threads(void)
 	read_unlock(&tasklist_lock);
 
 	schedule();
-	printk("done.\n");
+	pr_cont("done.\n");
 }

mm/cma.c | 2
@@ -199,6 +199,7 @@ int __init cma_init_reserved_mem(phys_addr_t base, phys_addr_t size,
 	cma->order_per_bit = order_per_bit;
 	*res_cma = cma;
 	cma_area_count++;
+	totalcma_pages += (size / PAGE_SIZE);
 
 	return 0;
 }
@@ -337,7 +338,6 @@ int __init cma_declare_contiguous(phys_addr_t base,
 	if (ret)
 		goto err;
 
-	totalcma_pages += (size / PAGE_SIZE);
 	pr_info("Reserved %ld MiB at %pa\n", (unsigned long)size / SZ_1M,
 		&base);
 	return 0;

mm/compaction.c | 156
@@ -34,6 +34,17 @@ static inline void count_compact_events(enum vm_event_item item, long delta)
 #endif
 
 #if defined CONFIG_COMPACTION || defined CONFIG_CMA
+#ifdef CONFIG_TRACEPOINTS
+static const char *const compaction_status_string[] = {
+	"deferred",
+	"skipped",
+	"continue",
+	"partial",
+	"complete",
+	"no_suitable_page",
+	"not_suitable_zone",
+};
+#endif
 
 #define CREATE_TRACE_POINTS
 #include <trace/events/compaction.h>
@@ -113,6 +124,77 @@ static struct page *pageblock_pfn_to_page(unsigned long start_pfn,
 }
 
 #ifdef CONFIG_COMPACTION
+
+/* Do not skip compaction more than 64 times */
+#define COMPACT_MAX_DEFER_SHIFT 6
+
+/*
+ * Compaction is deferred when compaction fails to result in a page
+ * allocation success. 1 << compact_defer_limit compactions are skipped up
+ * to a limit of 1 << COMPACT_MAX_DEFER_SHIFT
+ */
+void defer_compaction(struct zone *zone, int order)
+{
+	zone->compact_considered = 0;
+	zone->compact_defer_shift++;
+
+	if (order < zone->compact_order_failed)
+		zone->compact_order_failed = order;
+
+	if (zone->compact_defer_shift > COMPACT_MAX_DEFER_SHIFT)
+		zone->compact_defer_shift = COMPACT_MAX_DEFER_SHIFT;
+
+	trace_mm_compaction_defer_compaction(zone, order);
+}
+
+/* Returns true if compaction should be skipped this time */
+bool compaction_deferred(struct zone *zone, int order)
+{
+	unsigned long defer_limit = 1UL << zone->compact_defer_shift;
+
+	if (order < zone->compact_order_failed)
+		return false;
+
+	/* Avoid possible overflow */
+	if (++zone->compact_considered > defer_limit)
+		zone->compact_considered = defer_limit;
+
+	if (zone->compact_considered >= defer_limit)
+		return false;
+
+	trace_mm_compaction_deferred(zone, order);
+
+	return true;
+}
+
+/*
+ * Update defer tracking counters after successful compaction of given order,
+ * which means an allocation either succeeded (alloc_success == true) or is
+ * expected to succeed.
+ */
+void compaction_defer_reset(struct zone *zone, int order,
+		bool alloc_success)
+{
+	if (alloc_success) {
+		zone->compact_considered = 0;
+		zone->compact_defer_shift = 0;
+	}
+	if (order >= zone->compact_order_failed)
+		zone->compact_order_failed = order + 1;
+
+	trace_mm_compaction_defer_reset(zone, order);
+}
+
+/* Returns true if restarting compaction after many failures */
+bool compaction_restarting(struct zone *zone, int order)
+{
+	if (order < zone->compact_order_failed)
+		return false;
+
+	return zone->compact_defer_shift == COMPACT_MAX_DEFER_SHIFT &&
+		zone->compact_considered >= 1UL << zone->compact_defer_shift;
+}
+
 /* Returns true if the pageblock should be scanned for pages to isolate. */
 static inline bool isolation_suitable(struct compact_control *cc,
 					struct page *page)
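
The deferral counters added above implement an exponential back-off: every failure doubles the window of attempts that are skipped, up to 1 << COMPACT_MAX_DEFER_SHIFT (64). The userspace sketch below copies that arithmetic onto a hypothetical `struct zone_sketch` that holds only the three fields the functions touch (zero-initialised here), with the tracepoints left out, so the skip pattern can be checked directly. It is an illustration under those assumptions, not kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* Same cap as the patch: never skip more than 1 << 6 = 64 attempts. */
#define COMPACT_MAX_DEFER_SHIFT 6

/* Hypothetical stand-in holding only the struct zone fields used here. */
struct zone_sketch {
	unsigned int compact_considered;	/* attempts seen since last failure */
	unsigned int compact_defer_shift;	/* log2 of the current skip window */
	int compact_order_failed;		/* orders below this are never deferred */
};

/* Compaction failed to yield a page: reset the count, widen the window. */
static void sketch_defer_compaction(struct zone_sketch *zone, int order)
{
	zone->compact_considered = 0;
	zone->compact_defer_shift++;
	if (order < zone->compact_order_failed)
		zone->compact_order_failed = order;
	if (zone->compact_defer_shift > COMPACT_MAX_DEFER_SHIFT)
		zone->compact_defer_shift = COMPACT_MAX_DEFER_SHIFT;
}

/* Returns true if this attempt should be skipped. */
static bool sketch_compaction_deferred(struct zone_sketch *zone, int order)
{
	unsigned long defer_limit = 1UL << zone->compact_defer_shift;

	if (order < zone->compact_order_failed)
		return false;
	/* Saturate rather than overflow the counter. */
	if (++zone->compact_considered > defer_limit)
		zone->compact_considered = defer_limit;
	return zone->compact_considered < defer_limit;
}

/* Compaction succeeded (or is expected to) at this order. */
static void sketch_compaction_defer_reset(struct zone_sketch *zone, int order,
					  bool alloc_success)
{
	if (alloc_success) {
		zone->compact_considered = 0;
		zone->compact_defer_shift = 0;
	}
	if (order >= zone->compact_order_failed)
		zone->compact_order_failed = order + 1;
}

/* True once the maximum window has been used up. */
static bool sketch_compaction_restarting(const struct zone_sketch *zone, int order)
{
	if (order < zone->compact_order_failed)
		return false;
	return zone->compact_defer_shift == COMPACT_MAX_DEFER_SHIFT &&
	       zone->compact_considered >= 1UL << zone->compact_defer_shift;
}
```

In this model, after k consecutive failures (k at most 6) the next 2^k - 1 attempts are skipped before one is let through, and `compaction_restarting()` reports true once the capped window of 64 has been exhausted.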
@@ -421,11 +503,12 @@ static unsigned long isolate_freepages_block(struct compact_control *cc,
 
 	}
 
+	trace_mm_compaction_isolate_freepages(*start_pfn, blockpfn,
+					nr_scanned, total_isolated);
+
 	/* Record how far we have got within the block */
 	*start_pfn = blockpfn;
 
-	trace_mm_compaction_isolate_freepages(nr_scanned, total_isolated);
-
 	/*
 	 * If strict isolation is requested by CMA then check that all the
 	 * pages requested were isolated. If there were any failures, 0 is
@@ -581,6 +664,7 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 	unsigned long flags = 0;
 	bool locked = false;
 	struct page *page = NULL, *valid_page = NULL;
+	unsigned long start_pfn = low_pfn;
 
 	/*
 	 * Ensure that there are not too many pages isolated from the LRU
@@ -741,7 +825,8 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 	if (low_pfn == end_pfn)
 		update_pageblock_skip(cc, valid_page, nr_isolated, true);
 
-	trace_mm_compaction_isolate_migratepages(nr_scanned, nr_isolated);
+	trace_mm_compaction_isolate_migratepages(start_pfn, low_pfn,
+						nr_scanned, nr_isolated);
 
 	count_compact_events(COMPACTMIGRATE_SCANNED, nr_scanned);
 	if (nr_isolated)
@@ -1037,7 +1122,7 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
 	return cc->nr_migratepages ? ISOLATE_SUCCESS : ISOLATE_NONE;
 }
 
-static int compact_finished(struct zone *zone, struct compact_control *cc,
+static int __compact_finished(struct zone *zone, struct compact_control *cc,
 			    const int migratetype)
 {
 	unsigned int order;
@@ -1092,7 +1177,20 @@ static int compact_finished(struct zone *zone, struct compact_control *cc,
 			return COMPACT_PARTIAL;
 	}
 
-	return COMPACT_CONTINUE;
+	return COMPACT_NO_SUITABLE_PAGE;
+}
+
+static int compact_finished(struct zone *zone, struct compact_control *cc,
+			    const int migratetype)
+{
+	int ret;
+
+	ret = __compact_finished(zone, cc, migratetype);
+	trace_mm_compaction_finished(zone, cc->order, ret);
+	if (ret == COMPACT_NO_SUITABLE_PAGE)
+		ret = COMPACT_CONTINUE;
+
+	return ret;
 }
 
 /*
@@ -1102,7 +1200,7 @@ static int compact_finished(struct zone *zone, struct compact_control *cc,
  * COMPACT_PARTIAL - If the allocation would succeed without compaction
  * COMPACT_CONTINUE - If compaction should run now
  */
-unsigned long compaction_suitable(struct zone *zone, int order,
+static unsigned long __compaction_suitable(struct zone *zone, int order,
 					int alloc_flags, int classzone_idx)
 {
 	int fragindex;
@@ -1146,11 +1244,24 @@ unsigned long compaction_suitable(struct zone *zone, int order,
 	 */
 	fragindex = fragmentation_index(zone, order);
 	if (fragindex >= 0 && fragindex <= sysctl_extfrag_threshold)
-		return COMPACT_SKIPPED;
+		return COMPACT_NOT_SUITABLE_ZONE;
 
 	return COMPACT_CONTINUE;
 }
 
+unsigned long compaction_suitable(struct zone *zone, int order,
+					int alloc_flags, int classzone_idx)
+{
+	unsigned long ret;
+
+	ret = __compaction_suitable(zone, order, alloc_flags, classzone_idx);
+	trace_mm_compaction_suitable(zone, order, ret);
+	if (ret == COMPACT_NOT_SUITABLE_ZONE)
+		ret = COMPACT_SKIPPED;
+
+	return ret;
+}
+
 static int compact_zone(struct zone *zone, struct compact_control *cc)
 {
 	int ret;
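
Both `compact_finished()` and `compaction_suitable()` now follow one pattern: an inner `__` function returns a more detailed status (`COMPACT_NO_SUITABLE_PAGE`, `COMPACT_NOT_SUITABLE_ZONE`) that only the tracepoint sees, and the wrapper folds it back into the value existing callers expect. The userspace sketch below reproduces that pattern. It assumes the `COMPACT_*` codes are numbered in the same order as `compaction_status_string[]` (the table is indexed by status); the trace hook, the `have_page` input and the `sketch_*` names are hypothetical stand-ins.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Assumed to match the order of compaction_status_string[] in the patch. */
enum sketch_compact_status {
	COMPACT_DEFERRED, COMPACT_SKIPPED, COMPACT_CONTINUE, COMPACT_PARTIAL,
	COMPACT_COMPLETE, COMPACT_NO_SUITABLE_PAGE, COMPACT_NOT_SUITABLE_ZONE,
};

static const char *const compaction_status_string[] = {
	"deferred", "skipped", "continue", "partial",
	"complete", "no_suitable_page", "not_suitable_zone",
};

/* Hypothetical tracepoint stand-in: remembers the last status name. */
static char last_trace[32];
static void sketch_trace(int status)
{
	snprintf(last_trace, sizeof(last_trace), "%s",
		 compaction_status_string[status]);
}

/* Inner check: report the detailed reason compaction must go on. */
static int sketch_compact_finished_inner(int have_page)
{
	return have_page ? COMPACT_PARTIAL : COMPACT_NO_SUITABLE_PAGE;
}

/* Wrapper: trace the detailed status, then fold it back for callers. */
static int sketch_compact_finished(int have_page)
{
	int ret = sketch_compact_finished_inner(have_page);

	sketch_trace(ret);
	if (ret == COMPACT_NO_SUITABLE_PAGE)
		ret = COMPACT_CONTINUE;
	return ret;
}

/* Inner check modelled on the fragindex test in __compaction_suitable(). */
static int sketch_compaction_suitable_inner(int fragindex, int threshold)
{
	if (fragindex >= 0 && fragindex <= threshold)
		return COMPACT_NOT_SUITABLE_ZONE;
	return COMPACT_CONTINUE;
}

static int sketch_compaction_suitable(int fragindex, int threshold)
{
	int ret = sketch_compaction_suitable_inner(fragindex, threshold);

	sketch_trace(ret);
	if (ret == COMPACT_NOT_SUITABLE_ZONE)
		ret = COMPACT_SKIPPED;
	return ret;
}
```

The design keeps the externally visible return values unchanged, so no caller of the two functions needs to learn about the two new trace-only codes.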
@@ -1197,7 +1308,8 @@ static int compact_zone(struct zone *zone, struct compact_control *cc)
 		zone->compact_cached_migrate_pfn[1] = cc->migrate_pfn;
 	}
 
-	trace_mm_compaction_begin(start_pfn, cc->migrate_pfn, cc->free_pfn, end_pfn);
+	trace_mm_compaction_begin(start_pfn, cc->migrate_pfn,
+				cc->free_pfn, end_pfn, sync);
 
 	migrate_prep_local();
 
@@ -1299,7 +1411,8 @@ static int compact_zone(struct zone *zone, struct compact_control *cc)
 		zone->compact_cached_free_pfn = free_pfn;
 	}
 
-	trace_mm_compaction_end(ret);
+	trace_mm_compaction_end(start_pfn, cc->migrate_pfn,
+				cc->free_pfn, end_pfn, sync, ret);
 
 	return ret;
 }
@@ -1335,22 +1448,20 @@ int sysctl_extfrag_threshold = 500;
 
 /**
  * try_to_compact_pages - Direct compact to satisfy a high-order allocation
- * @zonelist: The zonelist used for the current allocation
- * @order: The order of the current allocation
  * @gfp_mask: The GFP mask of the current allocation
- * @nodemask: The allowed nodes to allocate from
+ * @order: The order of the current allocation
+ * @alloc_flags: The allocation flags of the current allocation
+ * @ac: The context of current allocation
  * @mode: The migration mode for async, sync light, or sync migration
  * @contended: Return value that determines if compaction was aborted due to
  *	       need_resched() or lock contention
  *
  * This is the main entry point for direct page compaction.
  */
-unsigned long try_to_compact_pages(struct zonelist *zonelist,
-			int order, gfp_t gfp_mask, nodemask_t *nodemask,
-			enum migrate_mode mode, int *contended,
-			int alloc_flags, int classzone_idx)
+unsigned long try_to_compact_pages(gfp_t gfp_mask, unsigned int order,
+			int alloc_flags, const struct alloc_context *ac,
+			enum migrate_mode mode, int *contended)
 {
-	enum zone_type high_zoneidx = gfp_zone(gfp_mask);
 	int may_enter_fs = gfp_mask & __GFP_FS;
 	int may_perform_io = gfp_mask & __GFP_IO;
 	struct zoneref *z;
@@ -1364,9 +1475,11 @@ unsigned long try_to_compact_pages(struct zonelist *zonelist,
 	if (!order || !may_enter_fs || !may_perform_io)
 		return COMPACT_SKIPPED;
 
+	trace_mm_compaction_try_to_compact_pages(order, gfp_mask, mode);
+
 	/* Compact each zone in the list */
-	for_each_zone_zonelist_nodemask(zone, z, zonelist, high_zoneidx,
-								nodemask) {
+	for_each_zone_zonelist_nodemask(zone, z, ac->zonelist, ac->high_zoneidx,
+								ac->nodemask) {
 		int status;
 		int zone_contended;
 
@@ -1374,7 +1487,8 @@ unsigned long try_to_compact_pages(struct zonelist *zonelist,
 			continue;
 
 		status = compact_zone_order(zone, order, gfp_mask, mode,
-				&zone_contended, alloc_flags, classzone_idx);
+				&zone_contended, alloc_flags,
+				ac->classzone_idx);
 		rc = max(status, rc);
 		/*
 		 * It takes at least one zone that wasn't lock contended
@@ -1384,7 +1498,7 @@ unsigned long try_to_compact_pages(struct zonelist *zonelist,
 
 		/* If a normal allocation would succeed, stop compacting */
 		if (zone_watermark_ok(zone, order, low_wmark_pages(zone),
-					classzone_idx, alloc_flags)) {
+					ac->classzone_idx, alloc_flags)) {
 			/*
 			 * We think the allocation will succeed in this zone,
 			 * but it is not certain, hence the false. The caller

@@ -173,7 +173,7 @@ void dump_mm(const struct mm_struct *mm)
 		"get_unmapped_area %p\n"
 #endif
 		"mmap_base %lu mmap_legacy_base %lu highest_vm_end %lu\n"
-		"pgd %p mm_users %d mm_count %d nr_ptes %lu map_count %d\n"
+		"pgd %p mm_users %d mm_count %d nr_ptes %lu nr_pmds %lu map_count %d\n"
 		"hiwater_rss %lx hiwater_vm %lx total_vm %lx locked_vm %lx\n"
 		"pinned_vm %lx shared_vm %lx exec_vm %lx stack_vm %lx\n"
 		"start_code %lx end_code %lx start_data %lx end_data %lx\n"
@@ -206,6 +206,7 @@ void dump_mm(const struct mm_struct *mm)
 		mm->pgd, atomic_read(&mm->mm_users),
 		atomic_read(&mm->mm_count),
 		atomic_long_read((atomic_long_t *)&mm->nr_ptes),
+		mm_nr_pmds((struct mm_struct *)mm),
 		mm->map_count,
 		mm->hiwater_rss, mm->hiwater_vm, mm->total_vm, mm->locked_vm,
 		mm->pinned_vm, mm->shared_vm, mm->exec_vm, mm->stack_vm,

mm/gup.c | 228
@@ -167,10 +167,10 @@ struct page *follow_page_mask(struct vm_area_struct *vma,
 	if (pud_none(*pud))
 		return no_page_table(vma, flags);
 	if (pud_huge(*pud) && vma->vm_flags & VM_HUGETLB) {
-		if (flags & FOLL_GET)
-			return NULL;
-		page = follow_huge_pud(mm, address, pud, flags & FOLL_WRITE);
-		return page;
+		page = follow_huge_pud(mm, address, pud, flags);
+		if (page)
+			return page;
+		return no_page_table(vma, flags);
 	}
 	if (unlikely(pud_bad(*pud)))
 		return no_page_table(vma, flags);
@@ -179,19 +179,10 @@ struct page *follow_page_mask(struct vm_area_struct *vma,
 	if (pmd_none(*pmd))
 		return no_page_table(vma, flags);
 	if (pmd_huge(*pmd) && vma->vm_flags & VM_HUGETLB) {
-		page = follow_huge_pmd(mm, address, pmd, flags & FOLL_WRITE);
-		if (flags & FOLL_GET) {
-			/*
-			 * Refcount on tail pages are not well-defined and
-			 * shouldn't be taken. The caller should handle a NULL
-			 * return when trying to follow tail pages.
-			 */
-			if (PageHead(page))
-				get_page(page);
-			else
-				page = NULL;
-		}
-		return page;
+		page = follow_huge_pmd(mm, address, pmd, flags);
+		if (page)
+			return page;
+		return no_page_table(vma, flags);
 	}
 	if ((flags & FOLL_NUMA) && pmd_numa(*pmd))
 		return no_page_table(vma, flags);
@@ -584,6 +575,185 @@ int fixup_user_fault(struct task_struct *tsk, struct mm_struct *mm,
 	return 0;
 }
 
+static __always_inline long __get_user_pages_locked(struct task_struct *tsk,
+						struct mm_struct *mm,
+						unsigned long start,
+						unsigned long nr_pages,
+						int write, int force,
+						struct page **pages,
+						struct vm_area_struct **vmas,
+						int *locked, bool notify_drop,
+						unsigned int flags)
+{
+	long ret, pages_done;
+	bool lock_dropped;
+
+	if (locked) {
+		/* if VM_FAULT_RETRY can be returned, vmas become invalid */
+		BUG_ON(vmas);
+		/* check caller initialized locked */
+		BUG_ON(*locked != 1);
+	}
+
+	if (pages)
+		flags |= FOLL_GET;
+	if (write)
+		flags |= FOLL_WRITE;
+	if (force)
+		flags |= FOLL_FORCE;
+
+	pages_done = 0;
+	lock_dropped = false;
+	for (;;) {
+		ret = __get_user_pages(tsk, mm, start, nr_pages, flags, pages,
+				       vmas, locked);
+		if (!locked)
+			/* VM_FAULT_RETRY couldn't trigger, bypass */
+			return ret;
+
+		/* VM_FAULT_RETRY cannot return errors */
+		if (!*locked) {
+			BUG_ON(ret < 0);
+			BUG_ON(ret >= nr_pages);
+		}
+
+		if (!pages)
+			/* If it's a prefault don't insist harder */
+			return ret;
+
+		if (ret > 0) {
+			nr_pages -= ret;
+			pages_done += ret;
+			if (!nr_pages)
+				break;
+		}
+		if (*locked) {
+			/* VM_FAULT_RETRY didn't trigger */
+			if (!pages_done)
+				pages_done = ret;
+			break;
+		}
+		/* VM_FAULT_RETRY triggered, so seek to the faulting offset */
+		pages += ret;
+		start += ret << PAGE_SHIFT;
+
+		/*
+		 * Repeat on the address that fired VM_FAULT_RETRY
+		 * without FAULT_FLAG_ALLOW_RETRY but with
+		 * FAULT_FLAG_TRIED.
+		 */
+		*locked = 1;
+		lock_dropped = true;
+		down_read(&mm->mmap_sem);
+		ret = __get_user_pages(tsk, mm, start, 1, flags | FOLL_TRIED,
+				       pages, NULL, NULL);
+		if (ret != 1) {
+			BUG_ON(ret > 1);
+			if (!pages_done)
+				pages_done = ret;
+			break;
+		}
+		nr_pages--;
+		pages_done++;
+		if (!nr_pages)
+			break;
+		pages++;
+		start += PAGE_SIZE;
+	}
+	if (notify_drop && lock_dropped && *locked) {
+		/*
+		 * We must let the caller know we temporarily dropped the lock
+		 * and so the critical section protected by it was lost.
+		 */
+		up_read(&mm->mmap_sem);
+		*locked = 0;
+	}
+	return pages_done;
+}
+
+/*
+ * We can leverage the VM_FAULT_RETRY functionality in the page fault
+ * paths better by using either get_user_pages_locked() or
+ * get_user_pages_unlocked().
+ *
+ * get_user_pages_locked() is suitable to replace the form:
+ *
+ *      down_read(&mm->mmap_sem);
+ *      do_something()
+ *      get_user_pages(tsk, mm, ..., pages, NULL);
+ *      up_read(&mm->mmap_sem);
+ *
+ *  to:
+ *
+ *      int locked = 1;
+ *      down_read(&mm->mmap_sem);
+ *      do_something()
+ *      get_user_pages_locked(tsk, mm, ..., pages, &locked);
+ *      if (locked)
+ *          up_read(&mm->mmap_sem);
+ */
+long get_user_pages_locked(struct task_struct *tsk, struct mm_struct *mm,
+			   unsigned long start, unsigned long nr_pages,
+			   int write, int force, struct page **pages,
+			   int *locked)
+{
+	return __get_user_pages_locked(tsk, mm, start, nr_pages, write, force,
+				       pages, NULL, locked, true, FOLL_TOUCH);
+}
+EXPORT_SYMBOL(get_user_pages_locked);
+
+/*
+ * Same as get_user_pages_unlocked(...., FOLL_TOUCH) but it allows to
+ * pass additional gup_flags as last parameter (like FOLL_HWPOISON).
+ *
+ * NOTE: here FOLL_TOUCH is not set implicitly and must be set by the
+ * caller if required (just like with __get_user_pages). "FOLL_GET",
+ * "FOLL_WRITE" and "FOLL_FORCE" are set implicitly as needed
+ * according to the parameters "pages", "write", "force"
+ * respectively.
+ */
+__always_inline long __get_user_pages_unlocked(struct task_struct *tsk, struct mm_struct *mm,
+					       unsigned long start, unsigned long nr_pages,
+					       int write, int force, struct page **pages,
+					       unsigned int gup_flags)
+{
+	long ret;
+	int locked = 1;
+	down_read(&mm->mmap_sem);
+	ret = __get_user_pages_locked(tsk, mm, start, nr_pages, write, force,
+				      pages, NULL, &locked, false, gup_flags);
+	if (locked)
+		up_read(&mm->mmap_sem);
+	return ret;
+}
+EXPORT_SYMBOL(__get_user_pages_unlocked);
+
+/*
+ * get_user_pages_unlocked() is suitable to replace the form:
+ *
+ *      down_read(&mm->mmap_sem);
+ *      get_user_pages(tsk, mm, ..., pages, NULL);
+ *      up_read(&mm->mmap_sem);
+ *
+ *  with:
+ *
+ *      get_user_pages_unlocked(tsk, mm, ..., pages);
+ *
+ * It is functionally equivalent to get_user_pages_fast so
+ * get_user_pages_fast should be used instead, if the two parameters
+ * "tsk" and "mm" are respectively equal to current and current->mm,
+ * or if "force" shall be set to 1 (get_user_pages_fast misses the
+ * "force" parameter).
+ */
+long get_user_pages_unlocked(struct task_struct *tsk, struct mm_struct *mm,
+			     unsigned long start, unsigned long nr_pages,
+			     int write, int force, struct page **pages)
+{
+	return __get_user_pages_unlocked(tsk, mm, start, nr_pages, write,
+					 force, pages, FOLL_TOUCH);
+}
+EXPORT_SYMBOL(get_user_pages_unlocked);
+
 /*
  * get_user_pages() - pin user pages in memory
  * @tsk:	the task_struct to use for page fault accounting, or
@@ -633,22 +803,18 @@ int fixup_user_fault(struct task_struct *tsk, struct mm_struct *mm,
  * use the correct cache flushing APIs.
  *
  * See also get_user_pages_fast, for performance critical applications.
+ *
+ * get_user_pages should be phased out in favor of
+ * get_user_pages_locked|unlocked or get_user_pages_fast. Nothing
+ * should use get_user_pages because it cannot pass
+ * FAULT_FLAG_ALLOW_RETRY to handle_mm_fault.
  */
 long get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
 		unsigned long start, unsigned long nr_pages, int write,
 		int force, struct page **pages, struct vm_area_struct **vmas)
 {
-	int flags = FOLL_TOUCH;
-
-	if (pages)
-		flags |= FOLL_GET;
-	if (write)
-		flags |= FOLL_WRITE;
-	if (force)
-		flags |= FOLL_FORCE;
-
-	return __get_user_pages(tsk, mm, start, nr_pages, flags, pages, vmas,
-				NULL);
+	return __get_user_pages_locked(tsk, mm, start, nr_pages, write, force,
+				       pages, vmas, NULL, false, FOLL_TOUCH);
 }
 EXPORT_SYMBOL(get_user_pages);
 
@@ -1077,10 +1243,8 @@ int get_user_pages_fast(unsigned long start, int nr_pages, int write,
 		start += nr << PAGE_SHIFT;
 		pages += nr;
 
-		down_read(&mm->mmap_sem);
-		ret = get_user_pages(current, mm, start,
-				     nr_pages - nr, write, 0, pages, NULL);
-		up_read(&mm->mmap_sem);
+		ret = get_user_pages_unlocked(current, mm, start,
+					      nr_pages - nr, write, 0, pages);
 
 		/* Have to be a bit careful with return values */
 		if (nr > 0) {

mm/huge_memory.c | 106
@@ -171,12 +171,7 @@ static int start_khugepaged(void)
 }
 
 static atomic_t huge_zero_refcount;
-static struct page *huge_zero_page __read_mostly;
-
-static inline bool is_huge_zero_page(struct page *page)
-{
-	return ACCESS_ONCE(huge_zero_page) == page;
-}
+struct page *huge_zero_page __read_mostly;
 
 static inline bool is_huge_zero_pmd(pmd_t pmd)
 {
@@ -766,15 +761,6 @@ static inline gfp_t alloc_hugepage_gfpmask(int defrag, gfp_t extra_gfp)
 	return (GFP_TRANSHUGE & ~(defrag ? 0 : __GFP_WAIT)) | extra_gfp;
 }
 
-static inline struct page *alloc_hugepage_vma(int defrag,
-					      struct vm_area_struct *vma,
-					      unsigned long haddr, int nd,
-					      gfp_t extra_gfp)
-{
-	return alloc_pages_vma(alloc_hugepage_gfpmask(defrag, extra_gfp),
-			       HPAGE_PMD_ORDER, vma, haddr, nd);
-}
-
 /* Caller must hold page table lock. */
 static bool set_huge_zero_page(pgtable_t pgtable, struct mm_struct *mm,
 		struct vm_area_struct *vma, unsigned long haddr, pmd_t *pmd,
@@ -795,6 +781,7 @@ int do_huge_pmd_anonymous_page(struct mm_struct *mm, struct vm_area_struct *vma,
 			       unsigned long address, pmd_t *pmd,
 			       unsigned int flags)
 {
+	gfp_t gfp;
 	struct page *page;
 	unsigned long haddr = address & HPAGE_PMD_MASK;
 
@@ -829,8 +816,8 @@ int do_huge_pmd_anonymous_page(struct mm_struct *mm, struct vm_area_struct *vma,
 		}
 		return 0;
 	}
-	page = alloc_hugepage_vma(transparent_hugepage_defrag(vma),
-			vma, haddr, numa_node_id(), 0);
+	gfp = alloc_hugepage_gfpmask(transparent_hugepage_defrag(vma), 0);
+	page = alloc_hugepage_vma(gfp, vma, haddr, HPAGE_PMD_ORDER);
 	if (unlikely(!page)) {
 		count_vm_event(THP_FAULT_FALLBACK);
 		return VM_FAULT_FALLBACK;
@@ -1118,10 +1105,12 @@ int do_huge_pmd_wp_page(struct mm_struct *mm, struct vm_area_struct *vma,
 	spin_unlock(ptl);
 alloc:
 	if (transparent_hugepage_enabled(vma) &&
-	    !transparent_hugepage_debug_cow())
-		new_page = alloc_hugepage_vma(transparent_hugepage_defrag(vma),
-					      vma, haddr, numa_node_id(), 0);
-	else
+	    !transparent_hugepage_debug_cow()) {
+		gfp_t gfp;
+
+		gfp = alloc_hugepage_gfpmask(transparent_hugepage_defrag(vma), 0);
+		new_page = alloc_hugepage_vma(gfp, vma, haddr, HPAGE_PMD_ORDER);
+	} else
 		new_page = NULL;
 
 	if (unlikely(!new_page)) {
@@ -1423,26 +1412,6 @@ int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
 	return ret;
 }
 
-int mincore_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
-		unsigned long addr, unsigned long end,
-		unsigned char *vec)
-{
-	spinlock_t *ptl;
-	int ret = 0;
-
-	if (__pmd_trans_huge_lock(pmd, vma, &ptl) == 1) {
-		/*
-		 * All logical pages in the range are present
-		 * if backed by a huge page.
-		 */
-		spin_unlock(ptl);
-		memset(vec, 1, (end - addr) >> PAGE_SHIFT);
-		ret = 1;
-	}
-
-	return ret;
-}
-
 int move_huge_pmd(struct vm_area_struct *vma, struct vm_area_struct *new_vma,
 		  unsigned long old_addr,
 		  unsigned long new_addr, unsigned long old_end,
@@ -2148,7 +2117,8 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
 {
 	struct page *page;
 	pte_t *_pte;
-	int referenced = 0, none = 0;
+	int none = 0;
+	bool referenced = false, writable = false;
 	for (_pte = pte; _pte < pte+HPAGE_PMD_NR;
 	     _pte++, address += PAGE_SIZE) {
 		pte_t pteval = *_pte;
@@ -2158,7 +2128,7 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
 			else
 				goto out;
 		}
-		if (!pte_present(pteval) || !pte_write(pteval))
+		if (!pte_present(pteval))
 			goto out;
 		page = vm_normal_page(vma, address, pteval);
 		if (unlikely(!page))
@@ -2168,9 +2138,6 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
 		VM_BUG_ON_PAGE(!PageAnon(page), page);
 		VM_BUG_ON_PAGE(!PageSwapBacked(page), page);
 
-		/* cannot use mapcount: can't collapse if there's a gup pin */
-		if (page_count(page) != 1)
-			goto out;
 		/*
 		 * We can do it before isolate_lru_page because the
 		 * page can't be freed from under us. NOTE: PG_lock
@@ -2179,6 +2146,29 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
 		 */
 		if (!trylock_page(page))
 			goto out;
+
+		/*
+		 * cannot use mapcount: can't collapse if there's a gup pin.
+		 * The page must only be referenced by the scanned process
+		 * and page swap cache.
+		 */
+		if (page_count(page) != 1 + !!PageSwapCache(page)) {
+			unlock_page(page);
+			goto out;
+		}
+		if (pte_write(pteval)) {
+			writable = true;
+		} else {
+			if (PageSwapCache(page) && !reuse_swap_page(page)) {
+				unlock_page(page);
+				goto out;
+			}
+			/*
+			 * Page is not in the swap cache. It can be collapsed
+			 * into a THP.
+			 */
+		}
+
 		/*
 		 * Isolate the page to avoid collapsing an hugepage
 		 * currently in use by the VM.
@@ -2195,9 +2185,9 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
 		/* If there is no mapped pte young don't collapse the page */
 		if (pte_young(pteval) || PageReferenced(page) ||
 		    mmu_notifier_test_young(vma->vm_mm, address))
-			referenced = 1;
+			referenced = true;
 	}
-	if (likely(referenced))
+	if (likely(referenced && writable))
 		return 1;
 out:
 	release_pte_pages(pte, _pte);
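
Taken together, these hunks relax the old rule that every PTE in the range must be writable: read-only pages may now be collapsed as long as at least one PTE is writable and at least one page is recently referenced, and a read-only page sitting in the swap cache is accepted only if `reuse_swap_page()` agrees. The expected reference count also allows one extra reference from the swap cache. The predicate below is a simplified userspace sketch of those per-PTE checks; `struct pte_sketch` and its fields are hypothetical summaries of PTE and page state, and empty or zero-page PTEs, page locking and LRU isolation are left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-PTE summary of the state the kernel code inspects. */
struct pte_sketch {
	bool present;        /* pte_present() */
	bool write;          /* pte_write() */
	bool young;          /* pte_young() || PageReferenced() || notifier young */
	bool in_swap_cache;  /* PageSwapCache() */
	bool swap_reusable;  /* result reuse_swap_page() would give */
	int page_count;      /* page_count() */
};

/* Returns true if the range would qualify for collapse under these rules. */
static bool sketch_can_collapse(const struct pte_sketch *ptes, int n)
{
	bool referenced = false, writable = false;

	for (int i = 0; i < n; i++) {
		const struct pte_sketch *p = &ptes[i];

		if (!p->present)
			return false;
		/* One ref from this mapping, plus one if in the swap cache. */
		if (p->page_count != 1 + !!p->in_swap_cache)
			return false;
		if (p->write)
			writable = true;
		else if (p->in_swap_cache && !p->swap_reusable)
			return false;
		if (p->young)
			referenced = true;
	}
	return referenced && writable;
}
```

A range made only of read-only pages is still rejected, which matches the new `referenced && writable` condition in both `__collapse_huge_page_isolate()` and `khugepaged_scan_pmd()`.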
@@ -2550,11 +2540,12 @@ static int khugepaged_scan_pmd(struct mm_struct *mm,
 {
 	pmd_t *pmd;
 	pte_t *pte, *_pte;
-	int ret = 0, referenced = 0, none = 0;
+	int ret = 0, none = 0;
 	struct page *page;
 	unsigned long _address;
 	spinlock_t *ptl;
 	int node = NUMA_NO_NODE;
+	bool writable = false, referenced = false;
 
 	VM_BUG_ON(address & ~HPAGE_PMD_MASK);
 
@@ -2573,8 +2564,11 @@ static int khugepaged_scan_pmd(struct mm_struct *mm,
 			else
 				goto out_unmap;
 		}
-		if (!pte_present(pteval) || !pte_write(pteval))
+		if (!pte_present(pteval))
 			goto out_unmap;
+		if (pte_write(pteval))
+			writable = true;
+
 		page = vm_normal_page(vma, _address, pteval);
 		if (unlikely(!page))
 			goto out_unmap;
@@ -2591,14 +2585,18 @@ static int khugepaged_scan_pmd(struct mm_struct *mm,
 		VM_BUG_ON_PAGE(PageCompound(page), page);
 		if (!PageLRU(page) || PageLocked(page) || !PageAnon(page))
 			goto out_unmap;
-		/* cannot use mapcount: can't collapse if there's a gup pin */
-		if (page_count(page) != 1)
+		/*
+		 * cannot use mapcount: can't collapse if there's a gup pin.
+		 * The page must only be referenced by the scanned process
+		 * and page swap cache.
+		 */
+		if (page_count(page) != 1 + !!PageSwapCache(page))
 			goto out_unmap;
 		if (pte_young(pteval) || PageReferenced(page) ||
 		    mmu_notifier_test_young(vma->vm_mm, address))
-			referenced = 1;
+			referenced = true;
 	}
-	if (referenced)
+	if (referenced && writable)
 		ret = 1;
 out_unmap:
 	pte_unmap_unlock(pte, ptl);

mm/hugetlb.c | 168
@@ -2657,9 +2657,10 @@ void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct *vma,
 			goto unlock;
 
 		/*
-		 * HWPoisoned hugepage is already unmapped and dropped reference
+		 * Migrating hugepage or HWPoisoned hugepage is already
+		 * unmapped and its refcount is dropped, so just clear pte here.
 		 */
-		if (unlikely(is_hugetlb_entry_hwpoisoned(pte))) {
+		if (unlikely(!pte_present(pte))) {
 			huge_pte_clear(mm, address, ptep);
 			goto unlock;
 		}
@@ -3134,6 +3135,7 @@ int hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 	struct page *pagecache_page = NULL;
 	struct hstate *h = hstate_vma(vma);
 	struct address_space *mapping;
+	int need_wait_lock = 0;
 
 	address &= huge_page_mask(h);
 
@@ -3171,6 +3173,16 @@ int hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 
 	ret = 0;
 
+	/*
+	 * entry could be a migration/hwpoison entry at this point, so this
+	 * check prevents the kernel from going below assuming that we have
+	 * a active hugepage in pagecache. This goto expects the 2nd page fault,
+	 * and is_hugetlb_entry_(migration|hwpoisoned) check will properly
+	 * handle it.
+	 */
+	if (!pte_present(entry))
+		goto out_mutex;
+
 	/*
 	 * If we are going to COW the mapping later, we examine the pending
 	 * reservations for this page now. This will ensure that any
@@ -3190,30 +3202,31 @@ int hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 								vma, address);
 	}
 
-	/*
-	 * hugetlb_cow() requires page locks of pte_page(entry) and
-	 * pagecache_page, so here we need take the former one
-	 * when page != pagecache_page or !pagecache_page.
-	 * Note that locking order is always pagecache_page -> page,
-	 * so no worry about deadlock.
-	 */
-	page = pte_page(entry);
-	get_page(page);
-	if (page != pagecache_page)
-		lock_page(page);
+	ptl = huge_pte_lock(h, mm, ptep);
 
-	ptl = huge_pte_lockptr(h, mm, ptep);
-	spin_lock(ptl);
 	/* Check for a racing update before calling hugetlb_cow */
 	if (unlikely(!pte_same(entry, huge_ptep_get(ptep))))
 		goto out_ptl;
 
+	/*
+	 * hugetlb_cow() requires page locks of pte_page(entry) and
+	 * pagecache_page, so here we need take the former one
+	 * when page != pagecache_page or !pagecache_page.
+	 */
+	page = pte_page(entry);
+	if (page != pagecache_page)
+		if (!trylock_page(page)) {
+			need_wait_lock = 1;
+			goto out_ptl;
+		}
 
+	get_page(page);
+
 	if (flags & FAULT_FLAG_WRITE) {
 		if (!huge_pte_write(entry)) {
 			ret = hugetlb_cow(mm, vma, address, ptep, entry,
 					pagecache_page, ptl);
-			goto out_ptl;
+			goto out_put_page;
 		}
 		entry = huge_pte_mkdirty(entry);
 	}
@@ -3221,7 +3234,10 @@ int hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 	if (huge_ptep_set_access_flags(vma, address, ptep, entry,
 						flags & FAULT_FLAG_WRITE))
 		update_mmu_cache(vma, address, ptep);
-
+out_put_page:
+	if (page != pagecache_page)
+		unlock_page(page);
+	put_page(page);
 out_ptl:
 	spin_unlock(ptl);
 
@@ -3229,12 +3245,17 @@ int hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 		unlock_page(pagecache_page);
 		put_page(pagecache_page);
 	}
-	if (page != pagecache_page)
-		unlock_page(page);
-	put_page(page);
-
 out_mutex:
 	mutex_unlock(&htlb_fault_mutex_table[hash]);
+	/*
+	 * Generally it's safe to hold refcount during waiting page lock. But
+	 * here we just wait to defer the next page fault to avoid busy loop and
+	 * the page is not used after unlocked before returning from the current
+	 * page fault. So we are safe from accessing freed page, even if we wait
+	 * here without taking refcount.
+	 */
+	if (need_wait_lock)
+		wait_on_page_locked(page);
 	return ret;
 }
 
@@ -3364,7 +3385,26 @@ unsigned long hugetlb_change_protection(struct vm_area_struct *vma,
 			spin_unlock(ptl);
 			continue;
 		}
-		if (!huge_pte_none(huge_ptep_get(ptep))) {
+		pte = huge_ptep_get(ptep);
+		if (unlikely(is_hugetlb_entry_hwpoisoned(pte))) {
+			spin_unlock(ptl);
+			continue;
+		}
+		if (unlikely(is_hugetlb_entry_migration(pte))) {
+			swp_entry_t entry = pte_to_swp_entry(pte);
+
+			if (is_write_migration_entry(entry)) {
+				pte_t newpte;
+
+				make_migration_entry_read(&entry);
+				newpte = swp_entry_to_pte(entry);
+				set_huge_pte_at(mm, address, ptep, newpte);
+				pages++;
+			}
+			spin_unlock(ptl);
+			continue;
+		}
+		if (!huge_pte_none(pte)) {
 			pte = huge_ptep_get_and_clear(mm, address, ptep);
 			pte = pte_mkhuge(huge_pte_modify(pte, newprot));
 			pte = arch_make_huge_pte(pte, vma, NULL, 0);
@@ -3558,6 +3598,7 @@ pte_t *huge_pmd_share(struct mm_struct *mm, unsigned long addr, pud_t *pud)
 		if (saddr) {
 			spte = huge_pte_offset(svma->vm_mm, saddr);
 			if (spte) {
+				mm_inc_nr_pmds(mm);
 				get_page(virt_to_page(spte));
 				break;
 			}
@@ -3569,11 +3610,13 @@ pte_t *huge_pmd_share(struct mm_struct *mm, unsigned long addr, pud_t *pud)
 
 	ptl = huge_pte_lockptr(hstate_vma(vma), mm, spte);
 	spin_lock(ptl);
-	if (pud_none(*pud))
+	if (pud_none(*pud)) {
 		pud_populate(mm, pud,
 				(pmd_t *)((unsigned long)spte & PAGE_MASK));
-	else
+	} else {
 		put_page(virt_to_page(spte));
+		mm_inc_nr_pmds(mm);
+	}
 	spin_unlock(ptl);
 out:
 	pte = (pte_t *)pmd_alloc(mm, pud, addr);
@@ -3604,6 +3647,7 @@ int huge_pmd_unshare(struct mm_struct *mm, unsigned long *addr, pte_t *ptep)
 
 	pud_clear(pud);
 	put_page(virt_to_page(ptep));
+	mm_dec_nr_pmds(mm);
 	*addr = ALIGN(*addr, HPAGE_SIZE * PTRS_PER_PTE) - HPAGE_SIZE;
 	return 1;
 }
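The `ALIGN()` step in `huge_pmd_unshare()` is easier to follow with concrete numbers. The sketch below assumes x86-64 values: a 2 MiB `HPAGE_SIZE` and 512 `PTRS_PER_PTE`, so one shared PMD page covers 1 GiB. A local `SKETCH_ALIGN` macro stands in for the kernel's `ALIGN()`:

```c
#include <assert.h>

/* Assumed x86-64 values: 2 MiB huge pages, 512 entries per page table. */
#define SKETCH_HPAGE_SIZE   (2UL << 20)
#define SKETCH_PTRS_PER_PTE 512UL
#define SKETCH_REGION       (SKETCH_HPAGE_SIZE * SKETCH_PTRS_PER_PTE)
/* Round x up to a multiple of a; like ALIGN(), a must be a power of two. */
#define SKETCH_ALIGN(x, a)  (((x) + (a) - 1) & ~((a) - 1))

static_assert((SKETCH_REGION & (SKETCH_REGION - 1)) == 0,
	      "the alignment boundary must be a power of two");

/*
 * After a shared PMD page is detached, move addr to the last huge page of
 * the 1 GiB region that PMD page mapped, so the caller's next
 * "addr += HPAGE_SIZE" step starts at the following region.
 */
static unsigned long addr_after_unshare(unsigned long addr)
{
	return SKETCH_ALIGN(addr, SKETCH_REGION) - SKETCH_HPAGE_SIZE;
}
```

Take an address 2 MiB into the second gigabyte (0x40200000). The result is 0x7FE00000, the last 2 MiB slot of that gigabyte, so the caller's next increment lands on the 2 GiB boundary.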
@@ -3660,42 +3704,64 @@ pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr)
 	return (pte_t *) pmd;
 }
 
-struct page *
+#endif /* CONFIG_ARCH_WANT_GENERAL_HUGETLB */
+
+/*
+ * These functions are overwritable if your architecture needs its own
+ * behavior.
+ */
+struct page * __weak
+follow_huge_addr(struct mm_struct *mm, unsigned long address,
+			      int write)
+{
+	return ERR_PTR(-EINVAL);
+}
+
+struct page * __weak
 follow_huge_pmd(struct mm_struct *mm, unsigned long address,
-		pmd_t *pmd, int write)
+		pmd_t *pmd, int flags)
 {
-	struct page *page;
-
-	page = pte_page(*(pte_t *)pmd);
-	if (page)
-		page += ((address & ~PMD_MASK) >> PAGE_SHIFT);
+	struct page *page = NULL;
+	spinlock_t *ptl;
+retry:
+	ptl = pmd_lockptr(mm, pmd);
+	spin_lock(ptl);
+	/*
+	 * make sure that the address range covered by this pmd is not
+	 * unmapped from other threads.
+	 */
+	if (!pmd_huge(*pmd))
+		goto out;
+	if (pmd_present(*pmd)) {
+		page = pte_page(*(pte_t *)pmd) +
+			((address & ~PMD_MASK) >> PAGE_SHIFT);
+		if (flags & FOLL_GET)
+			get_page(page);
+	} else {
+		if (is_hugetlb_entry_migration(huge_ptep_get((pte_t *)pmd))) {
+			spin_unlock(ptl);
+			__migration_entry_wait(mm, (pte_t *)pmd, ptl);
+			goto retry;
+		}
+		/*
+		 * hwpoisoned entry is treated as no_page_table in
+		 * follow_page_mask().
+		 */
+	}
+out:
+	spin_unlock(ptl);
 	return page;
 }
 
-struct page *
-follow_huge_pud(struct mm_struct *mm, unsigned long address,
-		pud_t *pud, int write)
-{
-	struct page *page;
-
-	page = pte_page(*(pte_t *)pud);
-	if (page)
-		page += ((address & ~PUD_MASK) >> PAGE_SHIFT);
-	return page;
-}
-
-#else /* !CONFIG_ARCH_WANT_GENERAL_HUGETLB */
-
-/* Can be overriden by architectures */
 struct page * __weak
 follow_huge_pud(struct mm_struct *mm, unsigned long address,
-	       pud_t *pud, int write)
+		pud_t *pud, int flags)
 {
-	BUG();
-	return NULL;
-}
+	if (flags & FOLL_GET)
+		return NULL;
 
-#endif /* CONFIG_ARCH_WANT_GENERAL_HUGETLB */
+	return pte_page(*(pte_t *)pud) + ((address & ~PUD_MASK) >> PAGE_SHIFT);
+}
 
 #ifdef CONFIG_MEMORY_FAILURE
 
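Both `follow_huge_pmd()` and `follow_huge_pud()` return the head page plus an offset computed from the low address bits. Below is a sketch of the PMD case. It assumes x86-64 values (4 KiB base pages, 2 MiB huge pages), and the `SKETCH_*` macros are illustrative stand-ins for `PAGE_SHIFT` and `PMD_MASK`:

```c
#include <assert.h>

/* Assumed x86-64 values: 4 KiB base pages, 2 MiB PMD-mapped huge pages. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PMD_SIZE   (1UL << 21)
#define SKETCH_PMD_MASK   (~(SKETCH_PMD_SIZE - 1))

/*
 * Index of the base page containing `address` within its PMD-sized huge
 * page: the offset below the PMD boundary, counted in base pages. This is
 * the value follow_huge_pmd() adds to the head page's struct page pointer.
 */
static unsigned long subpage_index(unsigned long address)
{
	return (address & ~SKETCH_PMD_MASK) >> SKETCH_PAGE_SHIFT;
}
```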
@@ -279,7 +279,7 @@ static ssize_t hugetlb_cgroup_write(struct kernfs_open_file *of,
 		return -EINVAL;
 
 	buf = strstrip(buf);
-	ret = page_counter_memparse(buf, &nr_pages);
+	ret = page_counter_memparse(buf, "-1", &nr_pages);
 	if (ret)
 		return ret;
 
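The extra `"-1"` argument names the token that means "no limit" when written to the hugetlb cgroup limit file. Below is a user-space sketch of that parsing contract, assuming 4 KiB pages. `sketch_memparse()` is hypothetical: it uses `strtoull()` in place of the kernel's `memparse()` and handles only K/M/G suffixes:

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_PAGE_SIZE   4096UL   /* assumed */
#define SKETCH_COUNTER_MAX ((unsigned long)LONG_MAX / SKETCH_PAGE_SIZE)

/*
 * Hypothetical analogue of page_counter_memparse(buf, max, nr_pages): the
 * exact token `max` means "no limit"; anything else is a byte count with an
 * optional K/M/G suffix, converted to pages and clamped.
 */
static int sketch_memparse(const char *buf, const char *max,
			   unsigned long *nr_pages)
{
	unsigned long long bytes;
	char *end;

	assert(buf && max && nr_pages);
	if (strcmp(buf, max) == 0) {
		*nr_pages = SKETCH_COUNTER_MAX;
		return 0;
	}

	errno = 0;
	bytes = strtoull(buf, &end, 0);
	if (end == buf || errno != 0)
		return -EINVAL;
	switch (*end) {
	case 'G': case 'g':
		bytes <<= 10;
		/* fall through */
	case 'M': case 'm':
		bytes <<= 10;
		/* fall through */
	case 'K': case 'k':
		bytes <<= 10;
		end++;
		break;
	default:
		break;
	}
	if (*end != '\0')
		return -EINVAL;

	bytes /= SKETCH_PAGE_SIZE;
	*nr_pages = bytes < SKETCH_COUNTER_MAX ? (unsigned long)bytes
					       : SKETCH_COUNTER_MAX;
	return 0;
}
```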
@@ -109,6 +109,28 @@ extern pmd_t *mm_find_pmd(struct mm_struct *mm, unsigned long address);
  * in mm/page_alloc.c
  */
 
+/*
+ * Structure for holding the mostly immutable allocation parameters passed
+ * between functions involved in allocations, including the alloc_pages*
+ * family of functions.
+ *
+ * nodemask, migratetype and high_zoneidx are initialized only once in
+ * __alloc_pages_nodemask() and then never change.
+ *
+ * zonelist, preferred_zone and classzone_idx are set first in
+ * __alloc_pages_nodemask() for the fast path, and might be later changed
+ * in __alloc_pages_slowpath(). All other functions pass the whole strucure
+ * by a const pointer.
+ */
+struct alloc_context {
+	struct zonelist *zonelist;
+	nodemask_t *nodemask;
+	struct zone *preferred_zone;
+	int classzone_idx;
+	int migratetype;
+	enum zone_type high_zoneidx;
+};
+
 /*
  * Locate the struct page for both the matching buddy in our
  * pair (buddy1) and the combined O(n+1) page they form (page).
716
mm/memcontrol.c
File diff suppressed because it is too large
15
mm/memory.c
@@ -428,6 +428,7 @@ static inline void free_pmd_range(struct mmu_gather *tlb, pud_t *pud,
 	pmd = pmd_offset(pud, start);
 	pud_clear(pud);
 	pmd_free_tlb(tlb, pmd, start);
+	mm_dec_nr_pmds(tlb->mm);
 }
 
 static inline void free_pud_range(struct mmu_gather *tlb, pgd_t *pgd,
@@ -3322,15 +3323,17 @@ int __pmd_alloc(struct mm_struct *mm, pud_t *pud, unsigned long address)
 
 	spin_lock(&mm->page_table_lock);
 #ifndef __ARCH_HAS_4LEVEL_HACK
-	if (pud_present(*pud))		/* Another has populated it */
-		pmd_free(mm, new);
-	else
+	if (!pud_present(*pud)) {
+		mm_inc_nr_pmds(mm);
 		pud_populate(mm, pud, new);
-#else
-	if (pgd_present(*pud))		/* Another has populated it */
+	} else	/* Another has populated it */
 		pmd_free(mm, new);
-	else
+#else
+	if (!pgd_present(*pud)) {
+		mm_inc_nr_pmds(mm);
 		pgd_populate(mm, pud, new);
+	} else /* Another has populated it */
+		pmd_free(mm, new);
 #endif /* __ARCH_HAS_4LEVEL_HACK */
 	spin_unlock(&mm->page_table_lock);
 	return 0;
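`__pmd_alloc()` allocates the new table before taking `page_table_lock` and re-checks the slot under the lock. It now bumps the PMD counter only on the path that actually installs the table. Below is a pthread-based sketch of that pattern. `struct slot_table` and `populate_slot()` are hypothetical, a mutex stands in for the spinlock, and the kernel's `smp_wmb()` is not modelled:

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-ins for a page-table slot and the per-mm counter. */
struct slot_table {
	pthread_mutex_t lock;   /* plays the role of mm->page_table_lock */
	void *entry;            /* plays the role of the pud entry */
	long nr_tables;         /* plays the role of the nr_pmds counter */
};

/*
 * Allocate outside the lock (in the kernel the allocation may sleep), then
 * recheck under the lock: install and account the new table only if nobody
 * beat us to it, otherwise free our copy.
 */
static int populate_slot(struct slot_table *t)
{
	void *new = calloc(1, 64);

	assert(t != NULL);
	if (!new)
		return -1;

	pthread_mutex_lock(&t->lock);
	if (t->entry == NULL) {
		t->nr_tables++;      /* count only what we actually install */
		t->entry = new;
	} else {
		free(new);           /* another thread populated it first */
	}
	pthread_mutex_unlock(&t->lock);
	return 0;
}
```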
273
mm/mempolicy.c
@@ -471,24 +471,34 @@ static const struct mempolicy_operations mpol_ops[MPOL_MAX] = {
 static void migrate_page_add(struct page *page, struct list_head *pagelist,
 				unsigned long flags);
 
+struct queue_pages {
+	struct list_head *pagelist;
+	unsigned long flags;
+	nodemask_t *nmask;
+	struct vm_area_struct *prev;
+};
+
 /*
  * Scan through pages checking if pages follow certain conditions,
  * and move them to the pagelist if they do.
  */
-static int queue_pages_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
-		unsigned long addr, unsigned long end,
-		const nodemask_t *nodes, unsigned long flags,
-		void *private)
+static int queue_pages_pte_range(pmd_t *pmd, unsigned long addr,
+			unsigned long end, struct mm_walk *walk)
 {
-	pte_t *orig_pte;
+	struct vm_area_struct *vma = walk->vma;
+	struct page *page;
+	struct queue_pages *qp = walk->private;
+	unsigned long flags = qp->flags;
+	int nid;
 	pte_t *pte;
 	spinlock_t *ptl;
 
-	orig_pte = pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
-	do {
-		struct page *page;
-		int nid;
+	split_huge_page_pmd(vma, addr, pmd);
+	if (pmd_trans_unstable(pmd))
+		return 0;
 
+	pte = pte_offset_map_lock(walk->mm, pmd, addr, &ptl);
+	for (; addr != end; pte++, addr += PAGE_SIZE) {
 		if (!pte_present(*pte))
 			continue;
 		page = vm_normal_page(vma, addr, *pte);
@@ -501,114 +511,46 @@ static int queue_pages_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
 		if (PageReserved(page))
 			continue;
 		nid = page_to_nid(page);
-		if (node_isset(nid, *nodes) == !!(flags & MPOL_MF_INVERT))
+		if (node_isset(nid, *qp->nmask) == !!(flags & MPOL_MF_INVERT))
 			continue;
 
 		if (flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL))
-			migrate_page_add(page, private, flags);
-		else
-			break;
-	} while (pte++, addr += PAGE_SIZE, addr != end);
-	pte_unmap_unlock(orig_pte, ptl);
-	return addr != end;
+			migrate_page_add(page, qp->pagelist, flags);
+	}
+	pte_unmap_unlock(pte - 1, ptl);
+	cond_resched();
+	return 0;
 }
 
-static void queue_pages_hugetlb_pmd_range(struct vm_area_struct *vma,
-		pmd_t *pmd, const nodemask_t *nodes, unsigned long flags,
-				    void *private)
+static int queue_pages_hugetlb(pte_t *pte, unsigned long hmask,
+			       unsigned long addr, unsigned long end,
+			       struct mm_walk *walk)
 {
 #ifdef CONFIG_HUGETLB_PAGE
+	struct queue_pages *qp = walk->private;
+	unsigned long flags = qp->flags;
 	int nid;
 	struct page *page;
 	spinlock_t *ptl;
 	pte_t entry;
 
-	ptl = huge_pte_lock(hstate_vma(vma), vma->vm_mm, (pte_t *)pmd);
-	entry = huge_ptep_get((pte_t *)pmd);
+	ptl = huge_pte_lock(hstate_vma(walk->vma), walk->mm, pte);
+	entry = huge_ptep_get(pte);
 	if (!pte_present(entry))
 		goto unlock;
 	page = pte_page(entry);
 	nid = page_to_nid(page);
-	if (node_isset(nid, *nodes) == !!(flags & MPOL_MF_INVERT))
+	if (node_isset(nid, *qp->nmask) == !!(flags & MPOL_MF_INVERT))
 		goto unlock;
 	/* With MPOL_MF_MOVE, we migrate only unshared hugepage. */
 	if (flags & (MPOL_MF_MOVE_ALL) ||
 	    (flags & MPOL_MF_MOVE && page_mapcount(page) == 1))
-		isolate_huge_page(page, private);
+		isolate_huge_page(page, qp->pagelist);
 unlock:
 	spin_unlock(ptl);
 #else
 	BUG();
 #endif
-}
-
-static inline int queue_pages_pmd_range(struct vm_area_struct *vma, pud_t *pud,
-		unsigned long addr, unsigned long end,
-		const nodemask_t *nodes, unsigned long flags,
-		void *private)
-{
-	pmd_t *pmd;
-	unsigned long next;
-
-	pmd = pmd_offset(pud, addr);
-	do {
-		next = pmd_addr_end(addr, end);
-		if (!pmd_present(*pmd))
-			continue;
-		if (pmd_huge(*pmd) && is_vm_hugetlb_page(vma)) {
-			queue_pages_hugetlb_pmd_range(vma, pmd, nodes,
-						flags, private);
-			continue;
-		}
-		split_huge_page_pmd(vma, addr, pmd);
-		if (pmd_none_or_trans_huge_or_clear_bad(pmd))
-			continue;
-		if (queue_pages_pte_range(vma, pmd, addr, next, nodes,
-				    flags, private))
-			return -EIO;
-	} while (pmd++, addr = next, addr != end);
-	return 0;
-}
-
-static inline int queue_pages_pud_range(struct vm_area_struct *vma, pgd_t *pgd,
-		unsigned long addr, unsigned long end,
-		const nodemask_t *nodes, unsigned long flags,
-		void *private)
-{
-	pud_t *pud;
-	unsigned long next;
-
-	pud = pud_offset(pgd, addr);
-	do {
-		next = pud_addr_end(addr, end);
-		if (pud_huge(*pud) && is_vm_hugetlb_page(vma))
-			continue;
-		if (pud_none_or_clear_bad(pud))
-			continue;
-		if (queue_pages_pmd_range(vma, pud, addr, next, nodes,
-				    flags, private))
-			return -EIO;
-	} while (pud++, addr = next, addr != end);
-	return 0;
-}
-
-static inline int queue_pages_pgd_range(struct vm_area_struct *vma,
-		unsigned long addr, unsigned long end,
-		const nodemask_t *nodes, unsigned long flags,
-		void *private)
-{
-	pgd_t *pgd;
-	unsigned long next;
-
-	pgd = pgd_offset(vma->vm_mm, addr);
-	do {
-		next = pgd_addr_end(addr, end);
-		if (pgd_none_or_clear_bad(pgd))
-			continue;
-		if (queue_pages_pud_range(vma, pgd, addr, next, nodes,
-				    flags, private))
-			return -EIO;
-	} while (pgd++, addr = next, addr != end);
 	return 0;
 }
 
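Both `queue_pages_pte_range()` and `queue_pages_hugetlb()` skip a page when `node_isset(nid, *qp->nmask) == !!(flags & MPOL_MF_INVERT)`. Below is a sketch of that predicate. It assumes a single-word node mask and uses a hypothetical `SKETCH_MF_INVERT` flag in place of `MPOL_MF_INVERT`:

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_MF_INVERT 0x1UL  /* hypothetical stand-in for MPOL_MF_INVERT */

/* True if a page on node `nid` should be skipped rather than queued. */
static bool skip_page(unsigned long nodemask, int nid, unsigned long flags)
{
	bool in_mask;

	assert(nid >= 0 && nid < (int)(8 * sizeof(nodemask)));
	in_mask = (nodemask >> nid) & 1UL;
	return in_mask == !!(flags & SKETCH_MF_INVERT);
}
```

Without the invert flag, only pages on nodes in the mask are queued. With it, pages already on a node in the mask are left alone and all others are queued.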
@@ -641,6 +583,49 @@ static unsigned long change_prot_numa(struct vm_area_struct *vma,
 }
 #endif /* CONFIG_NUMA_BALANCING */
 
+static int queue_pages_test_walk(unsigned long start, unsigned long end,
+				struct mm_walk *walk)
+{
+	struct vm_area_struct *vma = walk->vma;
+	struct queue_pages *qp = walk->private;
+	unsigned long endvma = vma->vm_end;
+	unsigned long flags = qp->flags;
+
+	if (vma->vm_flags & VM_PFNMAP)
+		return 1;
+
+	if (endvma > end)
+		endvma = end;
+	if (vma->vm_start > start)
+		start = vma->vm_start;
+
+	if (!(flags & MPOL_MF_DISCONTIG_OK)) {
+		if (!vma->vm_next && vma->vm_end < end)
+			return -EFAULT;
+		if (qp->prev && qp->prev->vm_end < vma->vm_start)
+			return -EFAULT;
+	}
+
+	qp->prev = vma;
+
+	if (vma->vm_flags & VM_PFNMAP)
+		return 1;
+
+	if (flags & MPOL_MF_LAZY) {
+		/* Similar to task_numa_work, skip inaccessible VMAs */
+		if (vma->vm_flags & (VM_READ | VM_EXEC | VM_WRITE))
+			change_prot_numa(vma, start, endvma);
+		return 1;
+	}
+
+	if ((flags & MPOL_MF_STRICT) ||
+	    ((flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL)) &&
+	     vma_migratable(vma)))
+		/* queue pages from current vma */
+		return 0;
+	return 1;
+}
+
 /*
  * Walk through page tables and collect pages to be migrated.
  *
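When `MPOL_MF_DISCONTIG_OK` is clear, `queue_pages_test_walk()` rejects a range that contains holes, remembering the previous VMA in `qp->prev` between callbacks. Below is a sketch of that check over a hypothetical singly linked `struct vma_sketch` list, with -1 standing in for `-EFAULT`:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified VMA: [start, end) plus a link to the next one. */
struct vma_sketch {
	unsigned long start, end;
	const struct vma_sketch *next;
};

/*
 * Mirrors the two -EFAULT checks: the range must not run past the last VMA,
 * and consecutive VMAs inside the range must not leave a gap.
 */
static int range_is_contiguous(const struct vma_sketch *vma,
			       unsigned long start, unsigned long end)
{
	const struct vma_sketch *prev = NULL;   /* like qp->prev */

	assert(start <= end);
	for (; vma && vma->start < end; vma = vma->next) {
		if (!vma->next && vma->end < end)
			return -1;   /* range extends past the last VMA */
		if (prev && prev->end < vma->start)
			return -1;   /* hole between two VMAs */
		prev = vma;
	}
	return 0;
}
```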
@@ -650,50 +635,24 @@ static unsigned long change_prot_numa(struct vm_area_struct *vma,
  */
 static int
 queue_pages_range(struct mm_struct *mm, unsigned long start, unsigned long end,
-		const nodemask_t *nodes, unsigned long flags, void *private)
+		nodemask_t *nodes, unsigned long flags,
+		struct list_head *pagelist)
 {
-	int err = 0;
-	struct vm_area_struct *vma, *prev;
+	struct queue_pages qp = {
+		.pagelist = pagelist,
+		.flags = flags,
+		.nmask = nodes,
+		.prev = NULL,
+	};
+	struct mm_walk queue_pages_walk = {
+		.hugetlb_entry = queue_pages_hugetlb,
+		.pmd_entry = queue_pages_pte_range,
+		.test_walk = queue_pages_test_walk,
+		.mm = mm,
+		.private = &qp,
+	};
 
-	vma = find_vma(mm, start);
-	if (!vma)
-		return -EFAULT;
-	prev = NULL;
-	for (; vma && vma->vm_start < end; vma = vma->vm_next) {
-		unsigned long endvma = vma->vm_end;
-
-		if (endvma > end)
-			endvma = end;
-		if (vma->vm_start > start)
-			start = vma->vm_start;
-
-		if (!(flags & MPOL_MF_DISCONTIG_OK)) {
-			if (!vma->vm_next && vma->vm_end < end)
-				return -EFAULT;
-			if (prev && prev->vm_end < vma->vm_start)
-				return -EFAULT;
-		}
-
-		if (flags & MPOL_MF_LAZY) {
-			/* Similar to task_numa_work, skip inaccessible VMAs */
-			if (vma->vm_flags & (VM_READ | VM_EXEC | VM_WRITE))
-				change_prot_numa(vma, start, endvma);
-			goto next;
-		}
-
-		if ((flags & MPOL_MF_STRICT) ||
-		     ((flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL)) &&
-		      vma_migratable(vma))) {
-
-			err = queue_pages_pgd_range(vma, start, endvma, nodes,
-						flags, private);
-			if (err)
-				break;
-		}
-next:
-		prev = vma;
-	}
-	return err;
+	return walk_page_range(start, end, &queue_pages_walk);
 }
 
 /*
@@ -1988,43 +1947,63 @@ static struct page *alloc_page_interleave(gfp_t gfp, unsigned order,
  *	@order:Order of the GFP allocation.
  * 	@vma:  Pointer to VMA or NULL if not available.
  *	@addr: Virtual Address of the allocation. Must be inside the VMA.
  *	@node: Which node to prefer for allocation (modulo policy).
+ *	@hugepage: for hugepages try only the preferred node if possible
  *
  * 	This function allocates a page from the kernel page pool and applies
  *	a NUMA policy associated with the VMA or the current process.
  *	When VMA is not NULL caller must hold down_read on the mmap_sem of the
  *	mm_struct of the VMA to prevent it from going away. Should be used for
- *	all allocations for pages that will be mapped into
- * 	user space. Returns NULL when no page can be allocated.
- *
- *	Should be called with the mm_sem of the vma hold.
+ *	all allocations for pages that will be mapped into user space. Returns
+ *	NULL when no page can be allocated.
  */
 struct page *
 alloc_pages_vma(gfp_t gfp, int order, struct vm_area_struct *vma,
-		unsigned long addr, int node)
+		unsigned long addr, int node, bool hugepage)
 {
 	struct mempolicy *pol;
 	struct page *page;
 	unsigned int cpuset_mems_cookie;
+	struct zonelist *zl;
+	nodemask_t *nmask;
 
 retry_cpuset:
 	pol = get_vma_policy(vma, addr);
 	cpuset_mems_cookie = read_mems_allowed_begin();
 
-	if (unlikely(pol->mode == MPOL_INTERLEAVE)) {
+	if (unlikely(IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) && hugepage &&
+					pol->mode != MPOL_INTERLEAVE)) {
+		/*
+		 * For hugepage allocation and non-interleave policy which
+		 * allows the current node, we only try to allocate from the
+		 * current node and don't fall back to other nodes, as the
+		 * cost of remote accesses would likely offset THP benefits.
+		 *
+		 * If the policy is interleave, or does not allow the current
+		 * node in its nodemask, we allocate the standard way.
+		 */
+		nmask = policy_nodemask(gfp, pol);
+		if (!nmask || node_isset(node, *nmask)) {
+			mpol_cond_put(pol);
+			page = alloc_pages_exact_node(node, gfp, order);
+			goto out;
+		}
+	}
+
+	if (pol->mode == MPOL_INTERLEAVE) {
 		unsigned nid;
 
 		nid = interleave_nid(pol, vma, addr, PAGE_SHIFT + order);
 		mpol_cond_put(pol);
 		page = alloc_page_interleave(gfp, order, nid);
-		if (unlikely(!page && read_mems_allowed_retry(cpuset_mems_cookie)))
-			goto retry_cpuset;
-
-		return page;
+		goto out;
 	}
-	page = __alloc_pages_nodemask(gfp, order,
-				      policy_zonelist(gfp, pol, node),
-				      policy_nodemask(gfp, pol));
+
+	nmask = policy_nodemask(gfp, pol);
+	zl = policy_zonelist(gfp, pol, node);
 	mpol_cond_put(pol);
+	page = __alloc_pages_nodemask(gfp, order, zl, nmask);
+out:
 	if (unlikely(!page && read_mems_allowed_retry(cpuset_mems_cookie)))
 		goto retry_cpuset;
 	return page;
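The new `hugepage` argument changes which allocation path `alloc_pages_vma()` takes. Below is a sketch of just that decision. It assumes a hypothetical `struct policy_sketch` summary of the mempolicy and treats `CONFIG_TRANSPARENT_HUGEPAGE` as enabled:

```c
#include <assert.h>
#include <stdbool.h>

enum alloc_path {
	ALLOC_LOCAL_NODE_ONLY,  /* alloc_pages_exact_node(node, gfp, order) */
	ALLOC_INTERLEAVE,       /* alloc_page_interleave(gfp, order, nid) */
	ALLOC_POLICY_ZONELIST,  /* __alloc_pages_nodemask(gfp, order, zl, nmask) */
};

/* Hypothetical summary of the mempolicy fields the decision depends on. */
struct policy_sketch {
	bool interleave;         /* pol->mode == MPOL_INTERLEAVE */
	bool has_nodemask;       /* policy_nodemask() returned non-NULL */
	unsigned long nodemask;  /* bit n set: node n allowed */
};

static enum alloc_path choose_path(const struct policy_sketch *pol,
				   bool hugepage, int node)
{
	assert(node >= 0 && node < (int)(8 * sizeof(unsigned long)));
	/* THP with a non-interleave policy that allows `node`: no fallback. */
	if (hugepage && !pol->interleave &&
	    (!pol->has_nodemask || ((pol->nodemask >> node) & 1UL)))
		return ALLOC_LOCAL_NODE_ONLY;
	if (pol->interleave)
		return ALLOC_INTERLEAVE;
	return ALLOC_POLICY_ZONELIST;
}
```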
@@ -197,7 +197,7 @@ static void remove_migration_ptes(struct page *old, struct page *new)
  * get to the page and wait until migration is finished.
  * When we return from this function the fault will be retried.
  */
-static void __migration_entry_wait(struct mm_struct *mm, pte_t *ptep,
+void __migration_entry_wait(struct mm_struct *mm, pte_t *ptep,
 				spinlock_t *ptl)
 {
 	pte_t pte;
@@ -1236,7 +1236,8 @@ static int do_move_page_to_node_array(struct mm_struct *mm,
 			goto put_and_set;
 
 		if (PageHuge(page)) {
-			isolate_huge_page(page, &pagelist);
+			if (PageHead(page))
+				isolate_huge_page(page, &pagelist);
 			goto put_and_set;
 		}
 
170
mm/mincore.c
@@ -19,38 +19,25 @@
 #include <asm/uaccess.h>
 #include <asm/pgtable.h>
 
-static void mincore_hugetlb_page_range(struct vm_area_struct *vma,
-				unsigned long addr, unsigned long end,
-				unsigned char *vec)
+static int mincore_hugetlb(pte_t *pte, unsigned long hmask, unsigned long addr,
+			unsigned long end, struct mm_walk *walk)
 {
 #ifdef CONFIG_HUGETLB_PAGE
-	struct hstate *h;
+	unsigned char present;
+	unsigned char *vec = walk->private;
 
-	h = hstate_vma(vma);
-	while (1) {
-		unsigned char present;
-		pte_t *ptep;
-		/*
-		 * Huge pages are always in RAM for now, but
-		 * theoretically it needs to be checked.
-		 */
-		ptep = huge_pte_offset(current->mm,
-				       addr & huge_page_mask(h));
-		present = ptep && !huge_pte_none(huge_ptep_get(ptep));
-		while (1) {
-			*vec = present;
-			vec++;
-			addr += PAGE_SIZE;
-			if (addr == end)
-				return;
-			/* check hugepage border */
-			if (!(addr & ~huge_page_mask(h)))
-				break;
-		}
-	}
+	/*
+	 * Hugepages under user process are always in RAM and never
+	 * swapped out, but theoretically it needs to be checked.
+	 */
+	present = pte && !huge_pte_none(huge_ptep_get(pte));
+	for (; addr != end; vec++, addr += PAGE_SIZE)
+		*vec = present;
+	walk->private = vec;
 #else
 	BUG();
 #endif
+	return 0;
 }
 
 /*
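`mincore_hugetlb()` writes the same presence byte for every base page in the range and stores the advanced pointer back in `walk->private`, so the next callback continues where this one stopped. Below is a sketch of that cursor pattern, assuming 4 KiB base pages. `fill_range()` is hypothetical:

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096UL   /* assumed base page size */

/*
 * Write `present` once per base page in [addr, end) and return the advanced
 * cursor, the way mincore_hugetlb() updates walk->private.
 */
static unsigned char *fill_range(unsigned char *vec, unsigned long addr,
				 unsigned long end, unsigned char present)
{
	assert(addr <= end && (end - addr) % SKETCH_PAGE_SIZE == 0);
	for (; addr != end; vec++, addr += SKETCH_PAGE_SIZE)
		*vec = present;
	return vec;
}
```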
@@ -94,9 +81,8 @@ static unsigned char mincore_page(struct address_space *mapping, pgoff_t pgoff)
 	return present;
 }
 
-static void mincore_unmapped_range(struct vm_area_struct *vma,
-				unsigned long addr, unsigned long end,
-				unsigned char *vec)
+static int __mincore_unmapped_range(unsigned long addr, unsigned long end,
+				struct vm_area_struct *vma, unsigned char *vec)
 {
 	unsigned long nr = (end - addr) >> PAGE_SHIFT;
 	int i;
@@ -111,23 +97,44 @@ static void mincore_unmapped_range(struct vm_area_struct *vma,
 		for (i = 0; i < nr; i++)
 			vec[i] = 0;
 	}
+	return nr;
 }
 
-static void mincore_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
-			unsigned long addr, unsigned long end,
-			unsigned char *vec)
+static int mincore_unmapped_range(unsigned long addr, unsigned long end,
+				   struct mm_walk *walk)
 {
-	unsigned long next;
-	spinlock_t *ptl;
-	pte_t *ptep;
+	walk->private += __mincore_unmapped_range(addr, end,
+						  walk->vma, walk->private);
+	return 0;
+}
 
-	ptep = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
-	do {
+static int mincore_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
+			struct mm_walk *walk)
+{
+	spinlock_t *ptl;
+	struct vm_area_struct *vma = walk->vma;
+	pte_t *ptep;
+	unsigned char *vec = walk->private;
+	int nr = (end - addr) >> PAGE_SHIFT;
+
+	if (pmd_trans_huge_lock(pmd, vma, &ptl) == 1) {
+		memset(vec, 1, nr);
+		spin_unlock(ptl);
+		goto out;
+	}
+
+	if (pmd_trans_unstable(pmd)) {
+		__mincore_unmapped_range(addr, end, vma, vec);
+		goto out;
+	}
+
+	ptep = pte_offset_map_lock(walk->mm, pmd, addr, &ptl);
+	for (; addr != end; ptep++, addr += PAGE_SIZE) {
 		pte_t pte = *ptep;
 
-		next = addr + PAGE_SIZE;
 		if (pte_none(pte))
-			mincore_unmapped_range(vma, addr, next, vec);
+			__mincore_unmapped_range(addr, addr + PAGE_SIZE,
+						 vma, vec);
 		else if (pte_present(pte))
 			*vec = 1;
 		else { /* pte is a swap entry */
@@ -150,69 +157,12 @@ static void mincore_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
 			}
 		}
 		vec++;
-	} while (ptep++, addr = next, addr != end);
+	}
 	pte_unmap_unlock(ptep - 1, ptl);
-}
-
-static void mincore_pmd_range(struct vm_area_struct *vma, pud_t *pud,
-			unsigned long addr, unsigned long end,
-			unsigned char *vec)
-{
-	unsigned long next;
-	pmd_t *pmd;
-
-	pmd = pmd_offset(pud, addr);
-	do {
-		next = pmd_addr_end(addr, end);
-		if (pmd_trans_huge(*pmd)) {
-			if (mincore_huge_pmd(vma, pmd, addr, next, vec)) {
-				vec += (next - addr) >> PAGE_SHIFT;
-				continue;
-			}
-			/* fall through */
-		}
-		if (pmd_none_or_trans_huge_or_clear_bad(pmd))
-			mincore_unmapped_range(vma, addr, next, vec);
-		else
-			mincore_pte_range(vma, pmd, addr, next, vec);
-		vec += (next - addr) >> PAGE_SHIFT;
-	} while (pmd++, addr = next, addr != end);
-}
-
-static void mincore_pud_range(struct vm_area_struct *vma, pgd_t *pgd,
-			unsigned long addr, unsigned long end,
-			unsigned char *vec)
-{
-	unsigned long next;
-	pud_t *pud;
-
-	pud = pud_offset(pgd, addr);
-	do {
-		next = pud_addr_end(addr, end);
-		if (pud_none_or_clear_bad(pud))
-			mincore_unmapped_range(vma, addr, next, vec);
-		else
-			mincore_pmd_range(vma, pud, addr, next, vec);
-		vec += (next - addr) >> PAGE_SHIFT;
-	} while (pud++, addr = next, addr != end);
-}
-
-static void mincore_page_range(struct vm_area_struct *vma,
-			unsigned long addr, unsigned long end,
-			unsigned char *vec)
-{
-	unsigned long next;
-	pgd_t *pgd;
-
-	pgd = pgd_offset(vma->vm_mm, addr);
-	do {
-		next = pgd_addr_end(addr, end);
-		if (pgd_none_or_clear_bad(pgd))
-			mincore_unmapped_range(vma, addr, next, vec);
-		else
-			mincore_pud_range(vma, pgd, addr, next, vec);
-		vec += (next - addr) >> PAGE_SHIFT;
-	} while (pgd++, addr = next, addr != end);
+out:
+	walk->private += nr;
+	cond_resched();
+	return 0;
 }
 
 /*
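The mincore conversion relies on the page-walk callbacks: `pmd_entry` for mapped ranges, `pte_hole` for unmapped ones, and a shared cursor in `private`. Below is a much simplified user-space analogue. It assumes a hypothetical `struct walk_sketch` and `walk_range_sketch()` that only split the range at 2 MiB boundaries and consult a stub `is_mapped()` instead of real page tables:

```c
#include <assert.h>
#include <stddef.h>

#define SK_PAGE_SHIFT 12
#define SK_PMD_SIZE   (1UL << 21)

/* Hypothetical, heavily simplified analogue of struct mm_walk. */
struct walk_sketch {
	int (*pmd_entry)(unsigned long addr, unsigned long end,
			 struct walk_sketch *w);
	int (*pte_hole)(unsigned long addr, unsigned long end,
			struct walk_sketch *w);
	int (*is_mapped)(unsigned long addr);  /* stands in for page tables */
	void *private;                         /* caller's cursor */
};

/* Split [start, end) at PMD boundaries and dispatch each chunk. */
static int walk_range_sketch(unsigned long start, unsigned long end,
			     struct walk_sketch *w)
{
	assert(start <= end);
	while (start < end) {
		unsigned long next = (start + SK_PMD_SIZE) & ~(SK_PMD_SIZE - 1);
		int err;

		if (next > end)
			next = end;
		if (w->is_mapped(start))
			err = w->pmd_entry(start, next, w);
		else
			err = w->pte_hole(start, next, w);
		if (err)
			return err;
		start = next;
	}
	return 0;
}

/* mincore-style helper: one byte per page, then advance the cursor. */
static int fill(struct walk_sketch *w, unsigned long addr, unsigned long end,
		unsigned char value)
{
	unsigned char *vec = w->private;
	size_t nr = (end - addr) >> SK_PAGE_SHIFT;

	for (size_t i = 0; i < nr; i++)
		vec[i] = value;
	w->private = vec + nr;
	return 0;
}

static int mark_present(unsigned long addr, unsigned long end,
			struct walk_sketch *w)
{
	return fill(w, addr, end, 1);
}

static int mark_hole(unsigned long addr, unsigned long end,
		     struct walk_sketch *w)
{
	return fill(w, addr, end, 0);
}

/* Example stub: only the first 2 MiB of the address space is mapped. */
static int first_pmd_only(unsigned long addr)
{
	return addr < SK_PMD_SIZE;
}
```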
@@ -224,18 +174,22 @@ static long do_mincore(unsigned long addr, unsigned long pages, unsigned char *v
 {
 	struct vm_area_struct *vma;
 	unsigned long end;
+	int err;
+	struct mm_walk mincore_walk = {
+		.pmd_entry = mincore_pte_range,
+		.pte_hole = mincore_unmapped_range,
+		.hugetlb_entry = mincore_hugetlb,
+		.private = vec,
+	};
 
 	vma = find_vma(current->mm, addr);
 	if (!vma || addr < vma->vm_start)
 		return -ENOMEM;
-
+	mincore_walk.mm = vma->vm_mm;
 	end = min(vma->vm_end, addr + (pages << PAGE_SHIFT));
-
-	if (is_vm_hugetlb_page(vma))
-		mincore_hugetlb_page_range(vma, addr, end, vec);
-	else
-		mincore_page_range(vma, addr, end, vec);
-
+	err = walk_page_range(addr, end, &mincore_walk);
+	if (err < 0)
+		return err;
 	return (end - addr) >> PAGE_SHIFT;
 }
 
@@ -152,7 +152,7 @@ EXPORT_SYMBOL_GPL(vm_memory_committed);
  */
 int __vm_enough_memory(struct mm_struct *mm, long pages, int cap_sys_admin)
 {
-	unsigned long free, allowed, reserve;
+	long free, allowed, reserve;
 
 	VM_WARN_ONCE(percpu_counter_read(&vm_committed_as) <
 			-(s64)vm_committed_as_batch * num_online_cpus(),
@@ -220,7 +220,7 @@ int __vm_enough_memory(struct mm_struct *mm, long pages, int cap_sys_admin)
 	 */
 	if (mm) {
 		reserve = sysctl_user_reserve_kbytes >> (PAGE_SHIFT - 10);
-		allowed -= min(mm->total_vm / 32, reserve);
+		allowed -= min_t(long, mm->total_vm / 32, reserve);
 	}
 
 	if (percpu_counter_read_positive(&vm_committed_as) < allowed)
@@ -2851,9 +2851,6 @@ void exit_mmap(struct mm_struct *mm)
 		vma = remove_vma(vma);
 	}
 	vm_unacct_memory(nr_accounted);
-
-	WARN_ON(atomic_long_read(&mm->nr_ptes) >
-			(FIRST_USER_ADDRESS+PMD_SIZE-1)>>PMD_SHIFT);
 }
 
 /* Insert vm structure into process list sorted by address
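The __vm_enough_memory() hunks above change free, allowed and reserve from unsigned long to long. min() becomes min_t(long, ...) because the kernel's min() macro warns when its operands have different types, and mm->total_vm remains unsigned long. The overflow being fixed happens when subtracting the reserves drives allowed below zero: in unsigned arithmetic it wraps to a huge value, so the overcommit check passes when it should fail. The following hypothetical userspace model uses made-up page counts. The functions are illustrative only and are not kernel code:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model of the final check in __vm_enough_memory():
 * subtract a reserve from the commit limit, then compare the pages
 * already committed system-wide against what is left.
 */

/* Pre-fix arithmetic: everything is unsigned long. */
static bool enough_memory_unsigned(unsigned long allowed, unsigned long reserve,
				   unsigned long committed)
{
	allowed -= reserve;	/* wraps to a huge value if reserve > allowed */
	return committed < allowed;
}

/* Post-fix arithmetic: signed long, so a deficit stays negative. */
static bool enough_memory_signed(long allowed, long reserve, long committed)
{
	assert(reserve >= 0);	/* a reserve is a page count */
	allowed -= reserve;	/* may go negative, which correctly fails the check */
	return committed < allowed;
}
```

With 1000 pages committed, only 10 allowed and a 20-page reserve, enough_memory_unsigned(10, 20, 1000) wrongly returns true, while enough_memory_signed(10, 20, 1000) returns false.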
@@ -54,8 +54,7 @@ static inline int zref_in_nodemask(struct zoneref *zref, nodemask_t *nodes)
 /* Returns the next zone at or below highest_zoneidx in a zonelist */
 struct zoneref *next_zones_zonelist(struct zoneref *z,
 					enum zone_type highest_zoneidx,
-					nodemask_t *nodes,
-					struct zone **zone)
+					nodemask_t *nodes)
 {
 	/*
 	 * Find the next suitable zone to use for the allocation.
@@ -69,7 +68,6 @@ struct zoneref *next_zones_zonelist(struct zoneref *z,
 				(z->zone && !zref_in_nodemask(z, nodes)))
 			z++;
 
-	*zone = zonelist_zone(z);
 	return z;
 }
 
37
mm/nommu.c
@@ -214,6 +214,39 @@ long get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
 }
 EXPORT_SYMBOL(get_user_pages);
 
+long get_user_pages_locked(struct task_struct *tsk, struct mm_struct *mm,
+			   unsigned long start, unsigned long nr_pages,
+			   int write, int force, struct page **pages,
+			   int *locked)
+{
+	return get_user_pages(tsk, mm, start, nr_pages, write, force,
+			      pages, NULL);
+}
+EXPORT_SYMBOL(get_user_pages_locked);
+
+long __get_user_pages_unlocked(struct task_struct *tsk, struct mm_struct *mm,
+			       unsigned long start, unsigned long nr_pages,
+			       int write, int force, struct page **pages,
+			       unsigned int gup_flags)
+{
+	long ret;
+	down_read(&mm->mmap_sem);
+	ret = get_user_pages(tsk, mm, start, nr_pages, write, force,
+			     pages, NULL);
+	up_read(&mm->mmap_sem);
+	return ret;
+}
+EXPORT_SYMBOL(__get_user_pages_unlocked);
+
+long get_user_pages_unlocked(struct task_struct *tsk, struct mm_struct *mm,
+			     unsigned long start, unsigned long nr_pages,
+			     int write, int force, struct page **pages)
+{
+	return __get_user_pages_unlocked(tsk, mm, start, nr_pages, write,
+					 force, pages, 0);
+}
+EXPORT_SYMBOL(get_user_pages_unlocked);
+
 /**
  * follow_pfn - look up PFN at a user virtual address
  * @vma: memory mapping
@@ -1895,7 +1928,7 @@ EXPORT_SYMBOL(unmap_mapping_range);
  */
 int __vm_enough_memory(struct mm_struct *mm, long pages, int cap_sys_admin)
 {
-	unsigned long free, allowed, reserve;
+	long free, allowed, reserve;
 
 	vm_acct_memory(pages);
 
@@ -1959,7 +1992,7 @@ int __vm_enough_memory(struct mm_struct *mm, long pages, int cap_sys_admin)
 	 */
 	if (mm) {
 		reserve = sysctl_user_reserve_kbytes >> (PAGE_SHIFT - 10);
-		allowed -= min(mm->total_vm / 32, reserve);
+		allowed -= min_t(long, mm->total_vm / 32, reserve);
 	}
 
 	if (percpu_counter_read_positive(&vm_committed_as) < allowed)
169
mm/oom_kill.c
@@ -169,8 +169,8 @@ unsigned long oom_badness(struct task_struct *p, struct mem_cgroup *memcg,
 	 * The baseline for the badness score is the proportion of RAM that each
 	 * task's rss, pagetable and swap space use.
 	 */
-	points = get_mm_rss(p->mm) + atomic_long_read(&p->mm->nr_ptes) +
-		 get_mm_counter(p->mm, MM_SWAPENTS);
+	points = get_mm_rss(p->mm) + get_mm_counter(p->mm, MM_SWAPENTS) +
+		atomic_long_read(&p->mm->nr_ptes) + mm_nr_pmds(p->mm);
 	task_unlock(p);
 
 	/*
@@ -266,8 +266,6 @@ enum oom_scan_t oom_scan_process_thread(struct task_struct *task,
 	 * Don't allow any other task to have access to the reserves.
 	 */
 	if (test_tsk_thread_flag(task, TIF_MEMDIE)) {
-		if (unlikely(frozen(task)))
-			__thaw_task(task);
 		if (!force_kill)
 			return OOM_SCAN_ABORT;
 	}
@@ -353,7 +351,7 @@ static void dump_tasks(struct mem_cgroup *memcg, const nodemask_t *nodemask)
 	struct task_struct *p;
 	struct task_struct *task;
 
-	pr_info("[ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name\n");
+	pr_info("[ pid ] uid tgid total_vm rss nr_ptes nr_pmds swapents oom_score_adj name\n");
 	rcu_read_lock();
 	for_each_process(p) {
 		if (oom_unkillable_task(p, memcg, nodemask))
@@ -369,10 +367,11 @@ static void dump_tasks(struct mem_cgroup *memcg, const nodemask_t *nodemask)
 			continue;
 		}
 
-		pr_info("[%5d] %5d %5d %8lu %8lu %7ld %8lu %5hd %s\n",
+		pr_info("[%5d] %5d %5d %8lu %8lu %7ld %7ld %8lu %5hd %s\n",
 			task->pid, from_kuid(&init_user_ns, task_uid(task)),
 			task->tgid, task->mm->total_vm, get_mm_rss(task->mm),
 			atomic_long_read(&task->mm->nr_ptes),
+			mm_nr_pmds(task->mm),
 			get_mm_counter(task->mm, MM_SWAPENTS),
 			task->signal->oom_score_adj, task->comm);
 		task_unlock(task);
@@ -400,20 +399,98 @@ static void dump_header(struct task_struct *p, gfp_t gfp_mask, int order,
 }
 
 /*
- * Number of OOM killer invocations (including memcg OOM killer).
- * Primarily used by PM freezer to check for potential races with
- * OOM killed frozen task.
+ * Number of OOM victims in flight
  */
-static atomic_t oom_kills = ATOMIC_INIT(0);
+static atomic_t oom_victims = ATOMIC_INIT(0);
+static DECLARE_WAIT_QUEUE_HEAD(oom_victims_wait);
 
-int oom_kills_count(void)
+bool oom_killer_disabled __read_mostly;
+static DECLARE_RWSEM(oom_sem);
+
+/**
+ * mark_tsk_oom_victim - marks the given task as OOM victim.
+ * @tsk: task to mark
+ *
+ * Has to be called with oom_sem taken for read and never after
+ * oom has been disabled already.
+ */
+void mark_tsk_oom_victim(struct task_struct *tsk)
 {
-	return atomic_read(&oom_kills);
+	WARN_ON(oom_killer_disabled);
+	/* OOM killer might race with memcg OOM */
+	if (test_and_set_tsk_thread_flag(tsk, TIF_MEMDIE))
+		return;
+	/*
+	 * Make sure that the task is woken up from uninterruptible sleep
+	 * if it is frozen because OOM killer wouldn't be able to free
+	 * any memory and livelock. freezing_slow_path will tell the freezer
+	 * that TIF_MEMDIE tasks should be ignored.
+	 */
+	__thaw_task(tsk);
+	atomic_inc(&oom_victims);
 }
 
-void note_oom_kill(void)
+/**
+ * unmark_oom_victim - unmarks the current task as OOM victim.
+ *
+ * Wakes up all waiters in oom_killer_disable()
+ */
+void unmark_oom_victim(void)
 {
-	atomic_inc(&oom_kills);
+	if (!test_and_clear_thread_flag(TIF_MEMDIE))
+		return;
+
+	down_read(&oom_sem);
+	/*
+	 * There is no need to signal the last oom_victim if there
+	 * is nobody who cares.
+	 */
+	if (!atomic_dec_return(&oom_victims) && oom_killer_disabled)
+		wake_up_all(&oom_victims_wait);
+	up_read(&oom_sem);
+}
+
+/**
+ * oom_killer_disable - disable OOM killer
+ *
+ * Forces all page allocations to fail rather than trigger OOM killer.
+ * Will block and wait until all OOM victims are killed.
+ *
+ * The function cannot be called when there are runnable user tasks because
+ * the userspace would see unexpected allocation failures as a result. Any
+ * new usage of this function should be consulted with MM people.
+ *
+ * Returns true if successful and false if the OOM killer cannot be
+ * disabled.
+ */
+bool oom_killer_disable(void)
+{
+	/*
+	 * Make sure to not race with an ongoing OOM killer
+	 * and that the current is not the victim.
+	 */
+	down_write(&oom_sem);
+	if (test_thread_flag(TIF_MEMDIE)) {
+		up_write(&oom_sem);
+		return false;
+	}
+
+	oom_killer_disabled = true;
+	up_write(&oom_sem);
+
+	wait_event(oom_victims_wait, !atomic_read(&oom_victims));
+
+	return true;
+}
+
+/**
+ * oom_killer_enable - enable OOM killer
+ */
+void oom_killer_enable(void)
+{
+	down_write(&oom_sem);
+	oom_killer_disabled = false;
+	up_write(&oom_sem);
 }
 
 #define K(x) ((x) << (PAGE_SHIFT-10))
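The new oom_victims counter, the oom_victims_wait queue and the oom_killer_disabled flag form a small protocol. mark_tsk_oom_victim() counts each victim. A victim that clears TIF_MEMDIE decrements the count in unmark_oom_victim(). oom_killer_disable() sets the flag and then sleeps until the count reaches zero. The following hypothetical userspace model of that protocol assumes POSIX threads; the struct and function names are illustrative. A single mutex stands in for the kernel's atomic counter plus oom_sem. In the kernel, callers check oom_killer_disabled under oom_sem and mark_tsk_oom_victim() only warns; tracker_mark() in this model simply refuses once the tracker is disabled.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical model of the oom_victims bookkeeping. */
struct victim_tracker {
	pthread_mutex_t lock;
	pthread_cond_t all_gone;	/* plays the role of oom_victims_wait */
	int victims;			/* plays the role of oom_victims */
	bool disabled;			/* plays the role of oom_killer_disabled */
};

#define VICTIM_TRACKER_INIT \
	{ PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0, false }

/* Models mark_tsk_oom_victim(): no new victims once disabled. */
static bool tracker_mark(struct victim_tracker *t)
{
	bool marked;

	pthread_mutex_lock(&t->lock);
	marked = !t->disabled;
	if (marked)
		t->victims++;
	pthread_mutex_unlock(&t->lock);
	return marked;
}

/* Models unmark_oom_victim(): the last victim out wakes the waiter. */
static void tracker_unmark(struct victim_tracker *t)
{
	pthread_mutex_lock(&t->lock);
	assert(t->victims > 0);		/* catches an unbalanced unmark */
	if (--t->victims == 0 && t->disabled)
		pthread_cond_broadcast(&t->all_gone);
	pthread_mutex_unlock(&t->lock);
}

/* Models oom_killer_disable(): stop new victims, wait for in-flight ones. */
static void tracker_disable(struct victim_tracker *t)
{
	pthread_mutex_lock(&t->lock);
	t->disabled = true;
	while (t->victims > 0)
		pthread_cond_wait(&t->all_gone, &t->lock);
	pthread_mutex_unlock(&t->lock);
}

/* Models oom_killer_enable(). */
static void tracker_enable(struct victim_tracker *t)
{
	pthread_mutex_lock(&t->lock);
	t->disabled = false;
	pthread_mutex_unlock(&t->lock);
}
```

The while loop around pthread_cond_wait() has the same job as the condition passed to wait_event(): it rechecks the count after every wakeup, so spurious wakeups are harmless.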
@@ -438,11 +515,14 @@ void oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order,
 	 * If the task is already exiting, don't alarm the sysadmin or kill
 	 * its children or threads, just set TIF_MEMDIE so it can die quickly
 	 */
-	if (task_will_free_mem(p)) {
-		set_tsk_thread_flag(p, TIF_MEMDIE);
+	task_lock(p);
+	if (p->mm && task_will_free_mem(p)) {
+		mark_tsk_oom_victim(p);
+		task_unlock(p);
 		put_task_struct(p);
 		return;
 	}
+	task_unlock(p);
 
 	if (__ratelimit(&oom_rs))
 		dump_header(p, gfp_mask, order, memcg, nodemask);
@@ -492,6 +572,7 @@ void oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order,
 
 	/* mm cannot safely be dereferenced after task_unlock(victim) */
 	mm = victim->mm;
+	mark_tsk_oom_victim(victim);
 	pr_err("Killed process %d (%s) total-vm:%lukB, anon-rss:%lukB, file-rss:%lukB\n",
 		task_pid_nr(victim), victim->comm, K(victim->mm->total_vm),
 		K(get_mm_counter(victim->mm, MM_ANONPAGES)),
@@ -522,7 +603,6 @@ void oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order,
 	}
 	rcu_read_unlock();
 
-	set_tsk_thread_flag(victim, TIF_MEMDIE);
 	do_send_sig_info(SIGKILL, SEND_SIG_FORCED, victim, true);
 	put_task_struct(victim);
 }
@@ -611,7 +691,7 @@ void oom_zonelist_unlock(struct zonelist *zonelist, gfp_t gfp_mask)
 }
 
 /**
- * out_of_memory - kill the "best" process when we run out of memory
+ * __out_of_memory - kill the "best" process when we run out of memory
  * @zonelist: zonelist pointer
  * @gfp_mask: memory allocation flags
  * @order: amount of memory being requested as a power of 2
@@ -623,7 +703,7 @@ void oom_zonelist_unlock(struct zonelist *zonelist, gfp_t gfp_mask)
  * OR try to be smart about which process to kill. Note that we
  * don't have to be perfect here, we just have to be good.
  */
-void out_of_memory(struct zonelist *zonelist, gfp_t gfp_mask,
+static void __out_of_memory(struct zonelist *zonelist, gfp_t gfp_mask,
 		int order, nodemask_t *nodemask, bool force_kill)
 {
 	const nodemask_t *mpol_mask;
@@ -643,9 +723,13 @@ void out_of_memory(struct zonelist *zonelist, gfp_t gfp_mask,
 	 * If current has a pending SIGKILL or is exiting, then automatically
 	 * select it. The goal is to allow it to allocate so that it may
 	 * quickly exit and free its memory.
+	 *
+	 * But don't select if current has already released its mm and cleared
+	 * TIF_MEMDIE flag at exit_mm(), otherwise an OOM livelock may occur.
 	 */
-	if (fatal_signal_pending(current) || task_will_free_mem(current)) {
-		set_thread_flag(TIF_MEMDIE);
+	if (current->mm &&
+	    (fatal_signal_pending(current) || task_will_free_mem(current))) {
+		mark_tsk_oom_victim(current);
 		return;
 	}
 
@@ -688,6 +772,32 @@ void out_of_memory(struct zonelist *zonelist, gfp_t gfp_mask,
 		schedule_timeout_killable(1);
 }
 
+/**
+ * out_of_memory - tries to invoke OOM killer.
+ * @zonelist: zonelist pointer
+ * @gfp_mask: memory allocation flags
+ * @order: amount of memory being requested as a power of 2
+ * @nodemask: nodemask passed to page allocator
+ * @force_kill: true if a task must be killed, even if others are exiting
+ *
+ * Invokes __out_of_memory() and returns true if the OOM killer has not been
+ * disabled by oom_killer_disable(). Otherwise returns false.
+ */
+bool out_of_memory(struct zonelist *zonelist, gfp_t gfp_mask,
+		int order, nodemask_t *nodemask, bool force_kill)
+{
+	bool ret = false;
+
+	down_read(&oom_sem);
+	if (!oom_killer_disabled) {
+		__out_of_memory(zonelist, gfp_mask, order, nodemask, force_kill);
+		ret = true;
+	}
+	up_read(&oom_sem);
+
+	return ret;
+}
+
 /*
  * The pagefault handler calls here because it is out of memory, so kill a
  * memory-hogging task. If any populated zone has ZONE_OOM_LOCKED set, a
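out_of_memory() now takes oom_sem for read, calls __out_of_memory() only while oom_killer_disabled is false, and uses its new bool return to report whether it did. oom_killer_disable() flips the flag under the write lock. The following hypothetical userspace model of that gate assumes POSIX threads. gate_invoke(), gate_set_disabled() and count_call() are illustrative names, and the waiting for in-flight victims is left out:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical model: readers run the action, a writer flips the switch. */
static pthread_rwlock_t gate_sem = PTHREAD_RWLOCK_INITIALIZER;	/* like oom_sem */
static bool gate_disabled;					/* like oom_killer_disabled */

/* Models out_of_memory(): returns true only if the action actually ran. */
static bool gate_invoke(void (*action)(void *), void *arg)
{
	bool ran = false;

	assert(action != NULL);
	pthread_rwlock_rdlock(&gate_sem);
	if (!gate_disabled) {
		action(arg);		/* stands in for __out_of_memory() */
		ran = true;
	}
	pthread_rwlock_unlock(&gate_sem);
	return ran;
}

/* Models the flag flip in oom_killer_disable() and oom_killer_enable(). */
static void gate_set_disabled(bool disabled)
{
	pthread_rwlock_wrlock(&gate_sem);
	gate_disabled = disabled;
	pthread_rwlock_unlock(&gate_sem);
}

/* Example action: count how many times the gate let a caller through. */
static void count_call(void *arg)
{
	++*(int *)arg;
}
```

Because readers share the lock, concurrent invocations do not serialize against each other in this model. Only the flag flip excludes them, which matches the "not race with an ongoing OOM killer" requirement stated in oom_killer_disable().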
@@ -697,12 +807,25 @@ void pagefault_out_of_memory(void)
 {
 	struct zonelist *zonelist;
 
+	down_read(&oom_sem);
 	if (mem_cgroup_oom_synchronize(true))
-		return;
+		goto unlock;
 
 	zonelist = node_zonelist(first_memory_node, GFP_KERNEL);
 	if (oom_zonelist_trylock(zonelist, GFP_KERNEL)) {
-		out_of_memory(NULL, 0, 0, NULL, false);
+		if (!oom_killer_disabled)
+			__out_of_memory(NULL, 0, 0, NULL, false);
+		else
+			/*
+			 * There shouldn't be any user tasks runnable while the
+			 * OOM killer is disabled so the current task has to
+			 * be a racing OOM victim for which oom_killer_disable()
+			 * is waiting.
+			 */
+			WARN_ON(test_thread_flag(TIF_MEMDIE));
+
 		oom_zonelist_unlock(zonelist, GFP_KERNEL);
 	}
+unlock:
+	up_read(&oom_sem);
 }
Some files were not shown because too many files have changed in this diff.