Commit Graph

15302 Commits

Author SHA1 Message Date
Miaohe Lin 7a52d4d88a mm: memcontrol: reword obsolete comment of mem_cgroup_unmark_under_oom()
Since commit 79dfdaccd1 ("memcg: make oom_lock 0 and 1 based rather than
counter"), the mem_cgroup_unmark_under_oom() is added and the comment of
the mem_cgroup_oom_unlock() is moved here.  But this comment make no sense
here because mem_cgroup_oom_lock() does not operate on under_oom field.
So we reword the comment as this would be helpful.  [Thanks Michal Hocko
for rewording this comment.]

Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Link: https://lkml.kernel.org/r/20200930095336.21323-1-linmiaohe@huawei.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-13 18:38:30 -07:00
Miaohe Lin d437024e69 mm/page_counter: correct the obsolete func name in the comment of page_counter_try_charge()
Since commit bbec2e1517 ("mm: rename page_counter's count/limit into
usage/max"), page_counter_limit() is renamed to page_counter_set_max().
So replace page_counter_limit with page_counter_set_max in comment.

Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Roman Gushchin <guro@fb.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Link: https://lkml.kernel.org/r/20200917113629.14382-1-linmiaohe@huawei.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-13 18:38:30 -07:00
Muchun Song 5f9a4f4a70 mm: memcontrol: add the missing numa_stat interface for cgroup v2
In the cgroup v1, we have a numa_stat interface.  This is useful for
providing visibility into the numa locality information within an memcg
since the pages are allowed to be allocated from any physical node.  One
of the use cases is evaluating application performance by combining this
information with the application's CPU allocation.  But the cgroup v2 does
not.  So this patch adds the missing information.

Suggested-by: Shakeel Butt <shakeelb@google.com>
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Cc: Zefan Li <lizefan@huawei.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Link: https://lkml.kernel.org/r/20200916100030.71698-2-songmuchun@bytedance.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-13 18:38:30 -07:00
Waiman Long bd0b230fe1 mm/memcg: unify swap and memsw page counters
The swap page counter is v2 only while memsw is v1 only.  As v1 and v2
controllers cannot be active at the same time, there is no point to keep
both swap and memsw page counters in mem_cgroup.  The previous patch has
made sure that memsw page counter is updated and accessed only when in v1
code paths.  So it is now safe to alias the v1 memsw page counter to v2
swap page counter.  This saves 14 long's in the size of mem_cgroup.  This
is a saving of 112 bytes for 64-bit archs.

While at it, also document which page counters are used in v1 and/or v2.

Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Chris Down <chris@chrisdown.name>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Roman Gushchin <guro@fb.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Yafang Shao <laoar.shao@gmail.com>
Link: https://lkml.kernel.org/r/20200914024452.19167-4-longman@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-13 18:38:30 -07:00
Waiman Long 8d387a5f17 mm/memcg: simplify mem_cgroup_get_max()
mem_cgroup_get_max() used to get memory+swap max from both the v1 memsw
and v2 memory+swap page counters & return the maximum of these 2 values.
This is redundant and it is more efficient to just get either the v1 or
the v2 values depending on which one is currently in use.

[longman@redhat.com: v4]
  Link: https://lkml.kernel.org/r/20200914150928.7841-1-longman@redhat.com

Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Chris Down <chris@chrisdown.name>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Roman Gushchin <guro@fb.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Yafang Shao <laoar.shao@gmail.com>
Link: https://lkml.kernel.org/r/20200914024452.19167-3-longman@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-13 18:38:30 -07:00
Waiman Long f9f84ec56f mm/memcg: clean up obsolete enum charge_type
Patch series "mm/memcg: Miscellaneous cleanups and streamlining", v2.

This patch (of 3):

Since commit 0a31bc97c8 ("mm: memcontrol: rewrite uncharge API") and
commit 00501b531c ("mm: memcontrol: rewrite charge API") in v3.17, the
enum charge_type was no longer used anywhere.  However, the enum itself
was not removed at that time.  Remove the obsolete enum charge_type now.

Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Chris Down <chris@chrisdown.name>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Roman Gushchin <guro@fb.com>
Cc: Yafang Shao <laoar.shao@gmail.com>
Link: https://lkml.kernel.org/r/20200914024452.19167-1-longman@redhat.com
Link: https://lkml.kernel.org/r/20200914024452.19167-2-longman@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-13 18:38:30 -07:00
Miaohe Lin 05bdc520b3 mm: memcontrol: correct the comment of mem_cgroup_iter()
Since commit bbec2e1517 ("mm: rename page_counter's count/limit into
usage/max"), the arg @reclaim has no priority field anymore.

Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Link: https://lkml.kernel.org/r/20200913094129.44558-1-linmiaohe@huawei.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-13 18:38:30 -07:00
Roman Gushchin 19b629c979 mm: memcg/slab: fix racy access to page->mem_cgroup in mem_cgroup_from_obj()
mem_cgroup_from_obj() checks the lowest bit of the page->mem_cgroup
pointer to determine if the page has an attached obj_cgroup vector instead
of a regular memcg pointer.  If it's not set, it simple returns the
page->mem_cgroup value as a struct mem_cgroup pointer.

The commit 10befea91b ("mm: memcg/slab: use a single set of kmem_caches
for all allocations") changed the moment when this bit is set: if
previously it was set on the allocation of the slab page, now it can be
set well after, when the first accounted object is allocated on this page.

It opened a race: if page->mem_cgroup is set concurrently after the first
page_has_obj_cgroups(page) check, a pointer to the obj_cgroups array can
be returned as a memory cgroup pointer.

A simple check for page->mem_cgroup pointer for NULL before the
page_has_obj_cgroups() check fixes the race.  Indeed, if the pointer is
not NULL, it's either a simple mem_cgroup pointer or a pointer to
obj_cgroup vector.  The pointer can be asynchronously changed from NULL to
(obj_cgroup_vec | 0x1UL), but can't be changed from a valid memcg pointer
to objcg vector or back.

If the object passed to mem_cgroup_from_obj() is a slab object and
page->mem_cgroup is NULL, it means that the object is not accounted, so
the function must return NULL.

I've discovered the race looking at the code, so far I haven't seen it in
the wild.

Fixes: 10befea91b ("mm: memcg/slab: use a single set of kmem_caches for all allocations")
Signed-off-by: Roman Gushchin <guro@fb.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Link: https://lkml.kernel.org/r/20200910022435.2773735-1-guro@fb.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-13 18:38:30 -07:00
Gustavo A. R. Silva 61e604e636 mm: memcontrol: use the preferred form for passing the size of a structure type
Use the preferred form for passing the size of a structure type.  The
alternative form where the structure type is spelled out hurts readability
and introduces an opportunity for a bug when the object type is changed
but the corresponding object identifier to which the sizeof operator is
applied is not.

Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Link: https://lkml.kernel.org/r/773e013ff2f07fe2a0b47153f14dea054c0c04f1.1596214831.git.gustavoars@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-13 18:38:30 -07:00
Gustavo A. R. Silva e90342e6d2 mm: memcontrol: use flex_array_size() helper in memcpy()
Make use of the flex_array_size() helper to calculate the size of a
flexible array member within an enclosing structure.

This helper offers defense-in-depth against potential integer overflows,
while at the same time makes it explicitly clear that we are dealing with
a flexible array member.

Also, remove unnecessary braces.

Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Link: https://lkml.kernel.org/r/ddd60dae2d9aea1ccdd2be66634815c93696125e.1596214831.git.gustavoars@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-13 18:38:30 -07:00
Ira Weiny 433e7d3177 mm/memremap.c: convert devmap static branch to {inc,dec}
While reviewing Protection Key Supervisor support it was pointed out that
using a counter to track static branch enable was an anti-pattern which
was better solved using the provided static_branch_{inc,dec} functions.[1]

Fix up devmap_managed_key to work the same way.  Also this should be safer
because there is a very small (very unlikely) race when multiple callers
try to enable at the same time.

[1] https://lore.kernel.org/lkml/20200714194031.GI5523@worktop.programming.kicks-ass.net/

Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: William Kucharski <william.kucharski@oracle.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Link: https://lkml.kernel.org/r/20200810235319.2796597-1-ira.weiny@intel.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-13 18:38:30 -07:00
Miaohe Lin 822bca52ee mm/swapfile.c: fix potential memory leak in sys_swapon
If we failed to drain inode, we would forget to free the swap address
space allocated by init_swap_address_space() above.

Fixes: dc617f29db ("vfs: don't allow writes to swap files")
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Link: https://lkml.kernel.org/r/20200930101803.53884-1-linmiaohe@huawei.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-13 18:38:30 -07:00
Miaohe Lin 7a3d52e45e mm/swapfile.c: remove unnecessary goto out in _swap_info_get()
It's unnecessary to goto the out label while out label is just below.

Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Link: https://lkml.kernel.org/r/20200930102549.1885-1-linmiaohe@huawei.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-13 18:38:30 -07:00
Miaohe Lin 12eab4289d mm/swap.c: fix incomplete comment in lru_cache_add_inactive_or_unevictable()
Since commit 9c4e6b1a70 ("mm, mlock, vmscan: no more skipping
pagevecs"), unevictable pages do not goes directly back onto zone's
unevictable list.

Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Shakeel Butt <shakeelb@google.com>
Link: https://lkml.kernel.org/r/20200927122209.59328-1-linmiaohe@huawei.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-13 18:38:30 -07:00
Miaohe Lin 548d9782bd mm/page_io.c: remove useless out label in __swap_writepage()
The out label is only used in one place and return ret directly without
something like resource cleanup or lock release and so on.  So we should
remove this jump label and do some cleanup.

Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Link: https://lkml.kernel.org/r/20200927124032.22521-1-linmiaohe@huawei.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-13 18:38:30 -07:00
Miaohe Lin f3bc52cb04 mm/swap_slots.c: remove always zero and unused return value of enable_swap_slots_cache()
enable_swap_slots_cache() always return zero and its return value is just
ignored by the caller.  So make enable_swap_slots_cache() void.

Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Link: https://lkml.kernel.org/r/20200924113554.50614-1-linmiaohe@huawei.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-13 18:38:30 -07:00
Miaohe Lin a3e7bea060 mm/swap.c: fix confusing comment in release_pages()
Since commit 07d8026995 ("mm: devmap: refactor 1-based refcounting for
ZONE_DEVICE pages"), we have renamed the func put_devmap_managed_page() to
page_is_devmap_managed().

Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Link: https://lkml.kernel.org/r/20200905084453.19353-1-linmiaohe@huawei.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-13 18:38:30 -07:00
Yu Zhao 6f4dd8de48 mm: remove superfluous __ClearPageActive()
To activate a page, mark_page_accessed() always holds a reference on it.
It either gets a new reference when adding a page to
lru_pvecs.activate_page or reuses an existing one it previously got when
it added a page to lru_pvecs.lru_add.  So it doesn't call SetPageActive()
on a page that doesn't have any reference left.  Therefore, the race is
impossible these days (I didn't brother to dig into its history).

For other paths, namely reclaim and migration, a reference count is always
held while calling SetPageActive() on a page.

SetPageSlabPfmemalloc() also uses SetPageActive(), but it's irrelevant to
LRU pages.

Signed-off-by: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Yang Shi <shy828301@gmail.com>
Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Qian Cai <cai@lca.pw>
Link: http://lkml.kernel.org/r/20200818184704.3625199-2-yuzhao@google.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-13 18:38:30 -07:00
Yu Zhao cc2828b21c mm: remove activate_page() from unuse_pte()
We don't initially add anon pages to active lruvec after commit
b518154e59 ("mm/vmscan: protect the workingset on anonymous LRU").
Remove activate_page() from unuse_pte(), which seems to be missed by the
commit.  And make the function static while we are at it.

Before the commit, we called lru_cache_add_active_or_unevictable() to add
new ksm pages to active lruvec.  Therefore, activate_page() wasn't
necessary for them in the first place.

Signed-off-by: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Yang Shi <shy828301@gmail.com>
Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Qian Cai <cai@lca.pw>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Link: http://lkml.kernel.org/r/20200818184704.3625199-1-yuzhao@google.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-13 18:38:30 -07:00
Gao Xiang 3264631548 swap: rename SWP_FS to SWAP_FS_OPS to avoid ambiguity
SWP_FS is used to make swap_{read,write}page() go through the filesystem,
and it's only used for swap files over NFS for now.  Otherwise it will
directly submit IO to blockdev according to swapfile extents reported by
filesystems in advance.

As Matthew pointed out [1], SWP_FS naming is somewhat confusing, so let's
rename to SWP_FS_OPS.

[1] https://lore.kernel.org/r/20200820113448.GM17456@casper.infradead.org

Suggested-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Link: https://lkml.kernel.org/r/20200822113019.11319-1-hsiangkao@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-13 18:38:29 -07:00
John Hubbard 146608bb75 mm/gup: protect unpin_user_pages() against npages==-ERRNO
As suggested by Dan Carpenter, fortify unpin_user_pages() just a bit,
against a typical caller mistake: check if the npages arg is really a
-ERRNO value, which would blow up the unpinning loop: WARN and return.

If this new WARN_ON() fires, then the system *might* be leaking pages (by
leaving them pinned), but probably not.  More likely, gup/pup returned a
hard -ERRNO error to the caller, who erroneously passed it here.

Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Souptick Joarder <jrdr.linux@gmail.com>
Link: https://lkml.kernel.org/r/20200917065706.409079-1-jhubbard@nvidia.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-13 18:38:29 -07:00
Barry Song 447f3e45c1 mm/gup: don't permit users to call get_user_pages with FOLL_LONGTERM
gup prohibits users from calling get_user_pages() with FOLL_PIN.  But it
allows users to call get_user_pages() with FOLL_LONGTERM only.  It seems
insensible.

Since FOLL_LONGTERM is a stricter case of FOLL_PIN, we should prohibit
users from calling get_user_pages() with FOLL_LONGTERM while not with
FOLL_PIN.

mm/gup_benchmark.c used to be the only user who did this improperly.
But it has been fixed by moving to use pin_user_pages().

[akpm@linux-foundation.org: fix CONFIG_MMU=n build]
  Link: https://lkml.kernel.org/r/CA+G9fYuNS3k0DVT62twfV746pfNhCSrk5sVMcOcQ1PGGnEseyw@mail.gmail.com

Signed-off-by: Barry Song <song.bao.hua@hisilicon.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Naresh Kamboju <naresh.kamboju@linaro.org>
Link: http://lkml.kernel.org/r/20200819110100.23504-1-song.bao.hua@hisilicon.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-13 18:38:29 -07:00
Barry Song 657d4f7996 mm/gup_benchmark: use pin_user_pages for FOLL_LONGTERM flag
According to Documentation/core-api/pin_user_pages.rst, FOLL_PIN is a
prerequisite to FOLL_LONGTERM.  Another way of saying that is,
FOLL_LONGTERM is a specific case, more restrictive case of FOLL_PIN.

Almost all kernel modules are using pin_user_pages() with FOLL_LONGTERM,
mm/gup_benchmark.c seems to the only exception in which FOLL_PIN is not a
prerequisite to FOLL_LONGTERM.

Signed-off-by: Barry Song <song.bao.hua@hisilicon.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Link: http://lkml.kernel.org/r/20200815122056.29508-1-song.bao.hua@hisilicon.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-13 18:38:29 -07:00
Barry Song 4c6cd03ed8 mm/gup_benchmark: update the documentation in Kconfig
In the beginning, mm/gup_benchmark.c supported get_user_pages_fast() only,
but right now, it supports the benchmarking of a couple of
get_user_pages() related calls like:

* get_user_pages_fast()
* get_user_pages()
* pin_user_pages_fast()
* pin_user_pages()

The documentation is confusing and needs update.

Signed-off-by: Barry Song <song.bao.hua@hisilicon.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Link: https://lkml.kernel.org/r/20200821032546.19992-1-song.bao.hua@hisilicon.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-13 18:38:29 -07:00
Yafang Shao eb1d7a65f0 mm, fadvise: improve the expensive remote LRU cache draining after FADV_DONTNEED
Our users reported that there're some random latency spikes when their RT
process is running.  Finally we found that latency spike is caused by
FADV_DONTNEED.  Which may call lru_add_drain_all() to drain LRU cache on
remote CPUs, and then waits the per-cpu work to complete.  The wait time
is uncertain, which may be tens millisecond.

That behavior is unreasonable, because this process is bound to a specific
CPU and the file is only accessed by itself, IOW, there should be no
pagecache pages on a per-cpu pagevec of a remote CPU.  That unreasonable
behavior is partially caused by the wrong comparation of the number of
invalidated pages and the number of the target.  For example,

        if (count < (end_index - start_index + 1))

The count above is how many pages were invalidated in the local CPU, and
(end_index - start_index + 1) is how many pages should be invalidated.
The usage of (end_index - start_index + 1) is incorrect, because they are
virtual addresses, which may not mapped to pages.  Besides that, there may
be holes between start and end.  So we'd better check whether there are
still pages on per-cpu pagevec after drain the local cpu, and then decide
whether or not to call lru_add_drain_all().

After I applied it with a hotfix to our production environment, most of
the lru_add_drain_all() can be avoided.

Suggested-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Mel Gorman <mgorman@suse.de>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Link: https://lkml.kernel.org/r/20200923133318.14373-1-laoar.shao@gmail.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-13 18:38:29 -07:00
Matthew Wilcox (Oracle) 27a83a609b mm/filemap: fix filemap_map_pages for THP
We dereference page->mapping and page->index directly after calling
find_subpage() and these fields are not valid for tail pages.  While
commit 4101196b19 ("mm: page cache: store only head pages in i_pages")
introduced the call to find_subpage(), the problem existed prior to this;
I'm going to suggest all the way back to when THPs first existed.

The user-visible effects of this are almost negligible.  To hit it, you
have to mmap a tmpfs file at an unaligned address and then it's only a
disabled optimisation causing page faults to happen more frequently than
they otherwise would.

Fix this by keeping both head and page pointers and checking the
appropriate one.  We could use page_mapping() and page_to_index(), but
that's higher overhead.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: William Kucharski <william.kucharski@oracle.com>
Link: https://lkml.kernel.org/r/20200911012532.24761-1-willy@infradead.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-13 18:38:29 -07:00
Matthew Wilcox (Oracle) a8cf7f272b mm: add find_lock_head
Add a new FGP_HEAD flag which avoids calling find_subpage() and add a
convenience wrapper for it.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: William Kucharski <william.kucharski@oracle.com>
Link: https://lkml.kernel.org/r/20200910183318.20139-9-willy@infradead.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-13 18:38:29 -07:00
Matthew Wilcox (Oracle) 63ec1973dd mm/shmem: return head page from find_lock_entry
Convert shmem_getpage_gfp() (the only remaining caller of
find_lock_entry()) to cope with a head page being returned instead of
the subpage for the index.

[willy@infradead.org: fix BUG()s]
  Link https://lore.kernel.org/linux-mm/20200912032042.GA6583@casper.infradead.org/

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: William Kucharski <william.kucharski@oracle.com>
Link: https://lkml.kernel.org/r/20200910183318.20139-8-willy@infradead.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-13 18:38:29 -07:00
Matthew Wilcox (Oracle) a6de4b4873 mm: convert find_get_entry to return the head page
There are only four callers remaining of find_get_entry().
get_shadow_from_swap_cache() only wants to see shadow entries and doesn't
care about which page is returned.  Push the find_subpage() call into
find_lock_entry(), find_get_incore_page() and pagecache_get_page().

[willy@infradead.org: fix oops]
  Link: https://lkml.kernel.org/r/20200914112738.GM6583@casper.infradead.org

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: William Kucharski <william.kucharski@oracle.com>
Link: https://lkml.kernel.org/r/20200910183318.20139-7-willy@infradead.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-13 18:38:29 -07:00
Matthew Wilcox (Oracle) 9dfc8ff34b i915: use find_lock_page instead of find_lock_entry
i915 does not want to see value entries.  Switch it to use
find_lock_page() instead, and remove the export of find_lock_entry().
Move find_lock_entry() and find_get_entry() to mm/internal.h to discourage
any future use.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: William Kucharski <william.kucharski@oracle.com>
Link: https://lkml.kernel.org/r/20200910183318.20139-6-willy@infradead.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-13 18:38:29 -07:00
Matthew Wilcox (Oracle) e6e88712e4 mm: optimise madvise WILLNEED
Instead of calling find_get_entry() for every page index, use an XArray
iterator to skip over NULL entries, and avoid calling get_page(),
because we only want the swap entries.

[willy@infradead.org: fix LTP soft lockups]
  Link: https://lkml.kernel.org/r/20200914165032.GS6583@casper.infradead.org

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: William Kucharski <william.kucharski@oracle.com>
Cc: Qian Cai <cai@redhat.com>
Link: https://lkml.kernel.org/r/20200910183318.20139-4-willy@infradead.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-13 18:38:29 -07:00
Matthew Wilcox (Oracle) f5df8635c5 mm: use find_get_incore_page in memcontrol
The current code does not protect against swapoff of the underlying
swap device, so this is a bug fix as well as a worthwhile reduction in
code complexity.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: William Kucharski <william.kucharski@oracle.com>
Link: https://lkml.kernel.org/r/20200910183318.20139-3-willy@infradead.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-13 18:38:29 -07:00
Matthew Wilcox (Oracle) 61ef186557 mm: factor find_get_incore_page out of mincore_page
Patch series "Return head pages from find_*_entry", v2.

This patch series started out as part of the THP patch set, but it has
some nice effects along the way and it seems worth splitting it out and
submitting separately.

Currently find_get_entry() and find_lock_entry() return the page
corresponding to the requested index, but the first thing most callers do
is find the head page, which we just threw away.  As part of auditing all
the callers, I found some misuses of the APIs and some plain
inefficiencies that I've fixed.

The diffstat is unflattering, but I added more kernel-doc and a new wrapper.

This patch (of 8);

Provide this functionality from the swap cache.  It's useful for
more than just mincore().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: William Kucharski <william.kucharski@oracle.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Huang Ying <ying.huang@intel.com>
Link: https://lkml.kernel.org/r/20200910183318.20139-1-willy@infradead.org
Link: https://lkml.kernel.org/r/20200910183318.20139-2-willy@infradead.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-13 18:38:29 -07:00
John Hubbard bac3cf4d01 mm, dump_page: rename head_mapcount() --> head_compound_mapcount()
Rename head_pincount() --> head_compound_pincount().  These names are more
accurate (or less misleading) than the original ones.

Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Qian Cai <cai@lca.pw>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: William Kucharski <william.kucharski@oracle.com>
Link: https://lkml.kernel.org/r/20200807183358.105097-1-jhubbard@nvidia.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-13 18:38:29 -07:00
Matthew Wilcox (Oracle) 853322a671 mm/debug.c: do not dereference i_ino blindly
__dump_page() checks i_dentry is fetchable and i_ino is earlier in the
struct than i_ino, so it ought to work fine, but it's possible that struct
randomisation has reordered i_ino after i_dentry and the pointer is just
wild enough that i_dentry is fetchable and i_ino isn't.

Also print the inode number if the dentry is invalid.

Reported-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
Link: https://lkml.kernel.org/r/20200819185710.28180-1-willy@infradead.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-13 18:38:29 -07:00
Dan Williams b7b3c01b19 mm/memremap_pages: support multiple ranges per invocation
In support of device-dax growing the ability to front physically
dis-contiguous ranges of memory, update devm_memremap_pages() to track
multiple ranges with a single reference counter and devm instance.

Convert all [devm_]memremap_pages() users to specify the number of ranges
they are mapping in their 'struct dev_pagemap' instance.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Paul Mackerras <paulus@ozlabs.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Ben Skeggs <bskeggs@redhat.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: "Jérôme Glisse" <jglisse@redhat.co
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brice Goglin <Brice.Goglin@inria.fr>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Hulk Robot <hulkci@huawei.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jason Gunthorpe <jgg@mellanox.com>
Cc: Jason Yan <yanaijie@huawei.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: "Jérôme Glisse" <jglisse@redhat.com>
Cc: Jia He <justin.he@arm.com>
Cc: Joao Martins <joao.m.martins@oracle.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: kernel test robot <lkp@intel.com>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lkml.kernel.org/r/159643103789.4062302.18426128170217903785.stgit@dwillia2-desk3.amr.corp.intel.com
Link: https://lkml.kernel.org/r/160106116293.30709.13350662794915396198.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-13 18:38:28 -07:00
Dan Williams a4574f63ed mm/memremap_pages: convert to 'struct range'
The 'struct resource' in 'struct dev_pagemap' is only used for holding
resource span information.  The other fields, 'name', 'flags', 'desc',
'parent', 'sibling', and 'child' are all unused wasted space.

This is in preparation for introducing a multi-range extension of
devm_memremap_pages().

The bulk of this change is unwinding all the places internal to libnvdimm
that used 'struct resource' unnecessarily, and replacing instances of
'struct dev_pagemap'.res with 'struct dev_pagemap'.range.

P2PDMA had a minor usage of the resource flags field, but only to report
failures with "%pR".  That is replaced with an open coded print of the
range.

[dan.carpenter@oracle.com: mm/hmm/test: use after free in dmirror_allocate_chunk()]
  Link: https://lkml.kernel.org/r/20200926121402.GA7467@kadam

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>	[xen]
Cc: Paul Mackerras <paulus@ozlabs.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Ben Skeggs <bskeggs@redhat.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: "Jérôme Glisse" <jglisse@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brice Goglin <Brice.Goglin@inria.fr>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Hulk Robot <hulkci@huawei.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jason Gunthorpe <jgg@mellanox.com>
Cc: Jason Yan <yanaijie@huawei.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Jia He <justin.he@arm.com>
Cc: Joao Martins <joao.m.martins@oracle.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: kernel test robot <lkp@intel.com>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lkml.kernel.org/r/159643103173.4062302.768998885691711532.stgit@dwillia2-desk3.amr.corp.intel.com
Link: https://lkml.kernel.org/r/160106115761.30709.13539840236873663620.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-13 18:38:28 -07:00
Dan Williams a035b6bf86 mm/memory_hotplug: introduce default phys_to_target_node() implementation
In preparation to set a fallback value for dev_dax->target_node, introduce
generic fallback helpers for phys_to_target_node()

A generic implementation based on node-data or memblock was proposed, but
as noted by Mike:

    "Here again, I would prefer to add a weak default for
     phys_to_target_node() because the "generic" implementation is not really
     generic.

     The fallback to reserved ranges is x86 specfic because on x86 most of
     the reserved areas is not in memblock.memory. AFAIK, no other
     architecture does this."

The info message in the generic memory_add_physaddr_to_nid()
implementation is fixed up to properly reflect that
memory_add_physaddr_to_nid() communicates "online" node info and
phys_to_target_node() indicates "target / to-be-onlined" node info.

[akpm@linux-foundation.org: fix CONFIG_MEMORY_HOTPLUG=n build]
  Link: https://lkml.kernel.org/r/202008252130.7YrHIyMI%25lkp@intel.com

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Jia He <justin.he@arm.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Ben Skeggs <bskeggs@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brice Goglin <Brice.Goglin@inria.fr>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Jason Gunthorpe <jgg@mellanox.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Joao Martins <joao.m.martins@oracle.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@ozlabs.org>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
Cc: Will Deacon <will@kernel.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Hulk Robot <hulkci@huawei.com>
Cc: Jason Yan <yanaijie@huawei.com>
Cc: "Jérôme Glisse" <jglisse@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: kernel test robot <lkp@intel.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Vivek Goyal <vgoyal@redhat.com>
Link: https://lkml.kernel.org/r/159643097768.4062302.3135192588966888630.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-13 18:38:27 -07:00
Hui Su 1abbef4f51 mm,kmemleak-test.c: move kmemleak-test.c to samples dir
kmemleak-test.c is just a kmemleak test module, which also can not be used
as a built-in kernel module.  Thus, i think it may should not be in mm
dir, and move the kmemleak-test.c to samples/kmemleak/kmemleak-test.c.
Fix the spelling of built-in by the way.

Signed-off-by: Hui Su <sh_def@163.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Rob Herring <robh@kernel.org>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
Cc: Divya Indi <divya.indi@oracle.com>
Cc: Tomas Winkler <tomas.winkler@intel.com>
Cc: David Howells <dhowells@redhat.com>
Link: https://lkml.kernel.org/r/20200925183729.GA172837@rlk
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-13 18:38:27 -07:00
Davidlohr Bueso c4b28963fd mm/kmemleak: rely on rcu for task stack scanning
kmemleak_scan() currently relies on the big tasklist_lock hammer to
stabilize iterating through the tasklist.  Instead, this patch proposes
simply using rcu along with the rcu-safe for_each_process_thread flavor
(without changing scan semantics), which doesn't make use of
next_thread/p->thread_group and thus cannot race with exit.  Furthermore,
any races with fork() and not seeing the new child should be benign as
it's not running yet and can also be detected by the next scan.

Avoiding the tasklist_lock could prove beneficial for performance
considering the scan operation is done periodically.  I have seen
improvements of 30%-ish when doing similar replacements on very
pathological microbenchmarks (ie stressing get/setpriority(2)).

However my main motivation is that it's one less user of the global
lock, something that Linus has long time wanted to see gone eventually
(if ever) even if the traditional fairness issues has been dealt with
now with qrwlocks.  Of course this is a very long ways ahead.  This
patch also kills another user of the deprecated tsk->thread_group.

Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Qian Cai <cai@lca.pw>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Link: https://lkml.kernel.org/r/20200820203902.11308-1-dave@stgolabs.net
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-13 18:38:27 -07:00
Abel Wu 9cf7a11183 mm/slub: make add_full() condition more explicit
The commit below is incomplete, as it didn't handle the add_full() part.
commit a4d3f8916c ("slub: remove useless kmem_cache_debug() before
remove_full()")

This patch checks for SLAB_STORE_USER instead of kmem_cache_debug(), since
that should be the only context in which we need the list_lock for
add_full().

Signed-off-by: Abel Wu <wuyun.wu@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Liu Xiang <liu.xiang6@zte.com.cn>
Link: https://lkml.kernel.org/r/20200811020240.1231-1-wuyun.wu@huawei.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-13 18:38:27 -07:00
Abel Wu 9f986d998a mm/slub: fix missing ALLOC_SLOWPATH stat when bulk alloc
The ALLOC_SLOWPATH statistics is missing in bulk allocation now.  Fix it
by doing statistics in alloc slow path.

Signed-off-by: Abel Wu <wuyun.wu@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Pekka Enberg <penberg@kernel.org>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Hewenliang <hewenliang4@huawei.com>
Cc: Hu Shiyuan <hushiyuan@huawei.com>
Link: http://lkml.kernel.org/r/20200811022427.1363-1-wuyun.wu@huawei.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-13 18:38:27 -07:00
Abel Wu c270cf3041 mm/slub.c: branch optimization in free slowpath
The two conditions are mutually exclusive and gcc compiler will optimise
this into if-else-like pattern.  Given that the majority of free_slowpath
is free_frozen, let's provide some hint to the compilers.

Tests (perf bench sched messaging -g 20 -l 400000, executed 10x
after reboot) are done and the summarized result:

	un-patched	patched
max.	192.316		189.851
min.	187.267		186.252
avg.	189.154		188.086
stdev.	1.37		0.99

Signed-off-by: Abel Wu <wuyun.wu@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Hewenliang <hewenliang4@huawei.com>
Cc: Hu Shiyuan <hushiyuan@huawei.com>
Link: http://lkml.kernel.org/r/20200813101812.1617-1-wuyun.wu@huawei.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-13 18:38:27 -07:00
Mateusz Nosek c1ff3f9549 mm/slab.c: clean code by removing redundant if condition
The removed code was unnecessary and changed nothing in the flow, since in
case of returning NULL by 'kmem_cache_alloc_node' returning 'freelist'
from the function in question is the same as returning NULL.

Signed-off-by: Mateusz Nosek <mateusznosek0@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Link: https://lkml.kernel.org/r/20200915230329.13002-1-mateusznosek0@gmail.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-13 18:38:27 -07:00
Linus Torvalds 3ad11d7ac8 block-5.10-2020-10-12
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl+EWUgQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpnoxEADCVSNBRkpV0OVkOEC3wf8EGhXhk01Jnjtl
 u5Mg2V55hcgJ0thQxBV/V28XyqmsEBrmAVi0Yf8Vr9Qbq4Ze08Wae4ChS4rEOyh1
 jTcGYWx5aJB3ChLvV/HI0nWQ3bkj03mMrL3SW8rhhf5DTyKHsVeTenpx42Qu/FKf
 fRzi09FSr3Pjd0B+EX6gunwJnlyXQC5Fa4AA0GhnXJzAznANXxHkkcXu8a6Yw75x
 e28CfhIBliORsK8sRHLoUnPpeTe1vtxCBhBMsE+gJAj9ZUOWMzvNFIPP4FvfawDy
 6cCQo2m1azJ/IdZZCDjFUWyjh+wxdKMp+NNryEcoV+VlqIoc3n98rFwrSL+GIq5Z
 WVwEwq+AcwoMCsD29Lu1ytL2PQ/RVqcJP5UheMrbL4vzefNfJFumQVZLIcX0k943
 8dFL2QHL+H/hM9Dx5y5rjeiWkAlq75v4xPKVjh/DHb4nehddCqn/+DD5HDhNANHf
 c1kmmEuYhvLpIaC4DHjE6DwLh8TPKahJjwsGuBOTr7D93NUQD+OOWsIhX6mNISIl
 FFhP8cd0/ZZVV//9j+q+5B4BaJsT+ZtwmrelKFnPdwPSnh+3iu8zPRRWO+8P8fRC
 YvddxuJAmE6BLmsAYrdz6Xb/wqfyV44cEiyivF0oBQfnhbtnXwDnkDWSfJD1bvCm
 ZwfpDh2+Tg==
 =LzyE
 -----END PGP SIGNATURE-----

Merge tag 'block-5.10-2020-10-12' of git://git.kernel.dk/linux-block

Pull block updates from Jens Axboe:

 - Series of merge handling cleanups (Baolin, Christoph)

 - Series of blk-throttle fixes and cleanups (Baolin)

 - Series cleaning up BDI, seperating the block device from the
   backing_dev_info (Christoph)

 - Removal of bdget() as a generic API (Christoph)

 - Removal of blkdev_get() as a generic API (Christoph)

 - Cleanup of is-partition checks (Christoph)

 - Series reworking disk revalidation (Christoph)

 - Series cleaning up bio flags (Christoph)

 - bio crypt fixes (Eric)

 - IO stats inflight tweak (Gabriel)

 - blk-mq tags fixes (Hannes)

 - Buffer invalidation fixes (Jan)

 - Allow soft limits for zone append (Johannes)

 - Shared tag set improvements (John, Kashyap)

 - Allow IOPRIO_CLASS_RT for CAP_SYS_NICE (Khazhismel)

 - DM no-wait support (Mike, Konstantin)

 - Request allocation improvements (Ming)

 - Allow md/dm/bcache to use IO stat helpers (Song)

 - Series improving blk-iocost (Tejun)

 - Various cleanups (Geert, Damien, Danny, Julia, Tetsuo, Tian, Wang,
   Xianting, Yang, Yufen, yangerkun)

* tag 'block-5.10-2020-10-12' of git://git.kernel.dk/linux-block: (191 commits)
  block: fix uapi blkzoned.h comments
  blk-mq: move cancel of hctx->run_work to the front of blk_exit_queue
  blk-mq: get rid of the dead flush handle code path
  block: get rid of unnecessary local variable
  block: fix comment and add lockdep assert
  blk-mq: use helper function to test hw stopped
  block: use helper function to test queue register
  block: remove redundant mq check
  block: invoke blk_mq_exit_sched no matter whether have .exit_sched
  percpu_ref: don't refer to ref->data if it isn't allocated
  block: ratelimit handle_bad_sector() message
  blk-throttle: Re-use the throtl_set_slice_end()
  blk-throttle: Open code __throtl_de/enqueue_tg()
  blk-throttle: Move service tree validation out of the throtl_rb_first()
  blk-throttle: Move the list operation after list validation
  blk-throttle: Fix IO hang for a corner case
  blk-throttle: Avoid tracking latency if low limit is invalid
  blk-throttle: Avoid getting the current time if tg->last_finish_time is 0
  blk-throttle: Remove a meaningless parameter for throtl_downgrade_state()
  block: Remove redundant 'return' statement
  ...
2020-10-13 12:12:44 -07:00
Linus Torvalds 85ed13e78d Merge branch 'work.iov_iter' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull compat iovec cleanups from Al Viro:
 "Christoph's series around import_iovec() and compat variant thereof"

* 'work.iov_iter' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  security/keys: remove compat_keyctl_instantiate_key_iov
  mm: remove compat_process_vm_{readv,writev}
  fs: remove compat_sys_vmsplice
  fs: remove the compat readv/writev syscalls
  fs: remove various compat readv/writev helpers
  iov_iter: transparently handle compat iovecs in import_iovec
  iov_iter: refactor rw_copy_check_uvector and import_iovec
  iov_iter: move rw_copy_check_uvector() into lib/iov_iter.c
  compat.h: fix a spelling error in <linux/compat.h>
2020-10-12 16:35:51 -07:00
Linus Torvalds 50d228345a As hoped, things calmed down for docs this cycle; fewer changes and almost
no conflicts at all.  This pull includes:
 
  - A reworked and expanded user-mode Linux document
  - Some simplifications and improvements for submitting-patches.rst
  - An emergency fix for (some) problems with Sphinx 3.x
  - Some welcome automarkup improvements to automatically generate
    cross-references to struct definitions and other documents
  - The usual collection of translation updates, typo fixes, etc.
 -----BEGIN PGP SIGNATURE-----
 
 iQFDBAABCAAtFiEEIw+MvkEiF49krdp9F0NaE2wMflgFAl+ErNYPHGNvcmJldEBs
 d24ubmV0AAoJEBdDWhNsDH5Y284H/3bv9fahbg16AJcKYqJXFHGpDs3CsASPnJqQ
 9HQoV5tg6Qd4kI3oFb+30l8SK73Wr2t685/DhOPDRR/vN3B5M1vOQvPRL/dEqiwi
 aUEhtMbnC/trSbteXsjGDWT+1EnI/+R3NFV++WiRp1XxE4DRXL3xySTeviR0IW+V
 rQxU7VCcVp0bklVH+gqjrsvqU5iZeckyZB6evc8X92ThhzjNprR5KVxxgl1wxcu/
 dPYizHoKYVoLVNw50rwPGt2hmq9RpyDM6Xh9UhLHcA57ENyzr8NNTJBOT0tVMTWV
 smU01X/ECoy54kj1w8AKP+f7F0G7DUU+Jz68X0X/kYPq520dUs4=
 =Ovox
 -----END PGP SIGNATURE-----

Merge tag 'docs-5.10' of git://git.lwn.net/linux

Pull documentation updates from Jonathan Corbet:
 "As hoped, things calmed down for docs this cycle; fewer changes and
  almost no conflicts at all. This includes:

   - A reworked and expanded user-mode Linux document

   - Some simplifications and improvements for submitting-patches.rst

   - An emergency fix for (some) problems with Sphinx 3.x

   - Some welcome automarkup improvements to automatically generate
     cross-references to struct definitions and other documents

   - The usual collection of translation updates, typo fixes, etc"

* tag 'docs-5.10' of git://git.lwn.net/linux: (81 commits)
  gpiolib: Update indentation in driver.rst for code excerpts
  Documentation/admin-guide: tainted-kernels: Fix typo occured
  Documentation: better locations for sysfs-pci, sysfs-tagging
  docs: programming-languages: refresh blurb on clang support
  Documentation: kvm: fix a typo
  Documentation: Chinese translation of Documentation/arm64/amu.rst
  doc: zh_CN: index files in arm64 subdirectory
  mailmap: add entry for <mstarovoitov@marvell.com>
  doc: seq_file: clarify role of *pos in ->next()
  docs: trace: ring-buffer-design.rst: use the new SPDX tag
  Documentation: kernel-parameters: clarify "module." parameters
  Fix references to nommu-mmap.rst
  docs: rewrite admin-guide/sysctl/abi.rst
  docs: fb: Remove vesafb scrollback boot option
  docs: fb: Remove sstfb scrollback boot option
  docs: fb: Remove matroxfb scrollback boot option
  docs: fb: Remove framebuffer scrollback boot option
  docs: replace the old User Mode Linux HowTo with a new one
  Documentation/admin-guide: blockdev/ramdisk: remove use of "rdev"
  Documentation/admin-guide: README & svga: remove use of "rdev"
  ...
2020-10-12 16:21:29 -07:00
Linus Torvalds ed016af52e These are the locking updates for v5.10:
- Add deadlock detection for recursive read-locks. The rationale is outlined
    in:
 
      224ec489d3cd: ("lockdep/Documention: Recursive read lock detection reasoning")
 
    The main deadlock pattern we want to detect is:
 
            TASK A:                 TASK B:
 
            read_lock(X);
                                    write_lock(X);
            read_lock_2(X);
 
  - Add "latch sequence counters" (seqcount_latch_t):
 
       A sequence counter variant where the counter even/odd value is used to
       switch between two copies of protected data. This allows the read path,
       typically NMIs, to safely interrupt the write side critical section.
 
    We utilize this new variant for sched-clock, and to make x86 TSC handling safer.
 
  - Other seqlock cleanups, fixes and enhancements
 
  - KCSAN updates
 
  - LKMM updates
 
  - Misc updates, cleanups and fixes.
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAl+EX6QRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1g3gxAAkg+Jy/tcdRxlxlEDOQPFy1mBqvFmulNA
 pGFPkB6dzqmAWF/NfOZSl4g/h/mqGYsq2V+PfK5E8Sq8DQ/yCmnLhjgVOHNUUliv
 x0WWfOysNgJdtdf69NLYJufIQhxhyI0dwFHHoHIsCdGdGqjh2DVevQFPFTBjdpOc
 BUZYo+u3gCaCdB6A2nmlcWYbEw8eVEHgv3qLG6dq46J0KJOV0HfliqJoU3EZqH+s
 977LvEIo+THfuYWMo/Jepwngbi0y36KeeukOAdwm9fK196htBHIUR+YPPrAe+FWD
 z+UXP5IS5XIw9V1sGLmUaC2m+6gpdW19jKBtlzPkxHXmJmsgiZdLLeytEh3WYey7
 nzfH+9Jd4NyyZKucLssYkOjf6P5BxGKCyJ9LXb7vlSthIhiDdFNx47oKtW4hxjOY
 jubsI3BP5c3G1sIBIjTS53XmOhJg+Z52FxTpQ33JswXn1wGidcHZiuNHZuU5q28p
 +tn8rGb2NGJFb4Sw/Vp0yTcqIpEXf+vweiQoaxm6tc9BWzcVzZntGnh0i3gFotx/
 VgKafN4+pgXgo6bwHbN2WBK2FGyvcXFaptfaOMZL48En82hJ1DI6EnBEYN+vuERQ
 JcCXg+iHeeVbxoou7q8NJxITkBmEL5xNBIugXRRqNSP3fXLxKjFuPYqT84/e7yZi
 elGTReYcq6g=
 =Iq51
 -----END PGP SIGNATURE-----

Merge tag 'locking-core-2020-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull locking updates from Ingo Molnar:
 "These are the locking updates for v5.10:

   - Add deadlock detection for recursive read-locks.

     The rationale is outlined in commit 224ec489d3 ("lockdep/
     Documention: Recursive read lock detection reasoning")

     The main deadlock pattern we want to detect is:

           TASK A:                 TASK B:

           read_lock(X);
                                   write_lock(X);
           read_lock_2(X);

   - Add "latch sequence counters" (seqcount_latch_t):

     A sequence counter variant where the counter even/odd value is used
     to switch between two copies of protected data. This allows the
     read path, typically NMIs, to safely interrupt the write side
     critical section.

     We utilize this new variant for sched-clock, and to make x86 TSC
     handling safer.

   - Other seqlock cleanups, fixes and enhancements

   - KCSAN updates

   - LKMM updates

   - Misc updates, cleanups and fixes"

* tag 'locking-core-2020-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (67 commits)
  lockdep: Revert "lockdep: Use raw_cpu_*() for per-cpu variables"
  lockdep: Fix lockdep recursion
  lockdep: Fix usage_traceoverflow
  locking/atomics: Check atomic-arch-fallback.h too
  locking/seqlock: Tweak DEFINE_SEQLOCK() kernel doc
  lockdep: Optimize the memory usage of circular queue
  seqlock: Unbreak lockdep
  seqlock: PREEMPT_RT: Do not starve seqlock_t writers
  seqlock: seqcount_LOCKNAME_t: Introduce PREEMPT_RT support
  seqlock: seqcount_t: Implement all read APIs as statement expressions
  seqlock: Use unique prefix for seqcount_t property accessors
  seqlock: seqcount_LOCKNAME_t: Standardize naming convention
  seqlock: seqcount latch APIs: Only allow seqcount_latch_t
  rbtree_latch: Use seqcount_latch_t
  x86/tsc: Use seqcount_latch_t
  timekeeping: Use seqcount_latch_t
  time/sched_clock: Use seqcount_latch_t
  seqlock: Introduce seqcount_latch_t
  mm/swap: Do not abuse the seqcount_t latching API
  time/sched_clock: Use raw_read_seqcount_latch() during suspend
  ...
2020-10-12 13:06:20 -07:00
Linus Torvalds 6734e20e39 arm64 updates for 5.10
- Userspace support for the Memory Tagging Extension introduced by Armv8.5.
   Kernel support (via KASAN) is likely to follow in 5.11.
 
 - Selftests for MTE, Pointer Authentication and FPSIMD/SVE context
   switching.
 
 - Fix and subsequent rewrite of our Spectre mitigations, including the
   addition of support for PR_SPEC_DISABLE_NOEXEC.
 
 - Support for the Armv8.3 Pointer Authentication enhancements.
 
 - Support for ASID pinning, which is required when sharing page-tables with
   the SMMU.
 
 - MM updates, including treating flush_tlb_fix_spurious_fault() as a no-op.
 
 - Perf/PMU driver updates, including addition of the ARM CMN PMU driver and
   also support to handle CPU PMU IRQs as NMIs.
 
 - Allow prefetchable PCI BARs to be exposed to userspace using normal
   non-cacheable mappings.
 
 - Implementation of ARCH_STACKWALK for unwinding.
 
 - Improve reporting of unexpected kernel traps due to BPF JIT failure.
 
 - Improve robustness of user-visible HWCAP strings and their corresponding
   numerical constants.
 
 - Removal of TEXT_OFFSET.
 
 - Removal of some unused functions, parameters and prototypes.
 
 - Removal of MPIDR-based topology detection in favour of firmware
   description.
 
 - Cleanups to handling of SVE and FPSIMD register state in preparation
   for potential future optimisation of handling across syscalls.
 
 - Cleanups to the SDEI driver in preparation for support in KVM.
 
 - Miscellaneous cleanups and refactoring work.
 -----BEGIN PGP SIGNATURE-----
 
 iQFEBAABCgAuFiEEPxTL6PPUbjXGY88ct6xw3ITBYzQFAl+AUXMQHHdpbGxAa2Vy
 bmVsLm9yZwAKCRC3rHDchMFjNFc1B/4q2Kabe+pPu7s1f58Q+OTaEfqcr3F1qh27
 F1YpFZUYxg0GPfPsFrnbJpo5WKo7wdR9ceI9yF/GHjs7A/MSoQJis3pG6SlAd9c0
 nMU5tCwhg9wfq6asJtl0/IPWem6cqqhdzC6m808DjeHuyi2CCJTt0vFWH3OeHEhG
 cfmLfaSNXOXa/MjEkT8y1AXJ/8IpIpzkJeCRA1G5s18PXV9Kl5bafIo9iqyfKPLP
 0rJljBmoWbzuCSMc81HmGUQI4+8KRp6HHhyZC/k0WEVgj3LiumT7am02bdjZlTnK
 BeNDKQsv2Jk8pXP2SlrI3hIUTz0bM6I567FzJEokepvTUzZ+CVBi
 =9J8H
 -----END PGP SIGNATURE-----

Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 updates from Will Deacon:
 "There's quite a lot of code here, but much of it is due to the
  addition of a new PMU driver as well as some arm64-specific selftests
  which is an area where we've traditionally been lagging a bit.

  In terms of exciting features, this includes support for the Memory
  Tagging Extension which narrowly missed 5.9, hopefully allowing
  userspace to run with use-after-free detection in production on CPUs
  that support it. Work is ongoing to integrate the feature with KASAN
  for 5.11.

  Another change that I'm excited about (assuming they get the hardware
  right) is preparing the ASID allocator for sharing the CPU page-table
  with the SMMU. Those changes will also come in via Joerg with the
  IOMMU pull.

  We do stray outside of our usual directories in a few places, mostly
  due to core changes required by MTE. Although much of this has been
  Acked, there were a couple of places where we unfortunately didn't get
  any review feedback.

  Other than that, we ran into a handful of minor conflicts in -next,
  but nothing that should post any issues.

  Summary:

   - Userspace support for the Memory Tagging Extension introduced by
     Armv8.5. Kernel support (via KASAN) is likely to follow in 5.11.

   - Selftests for MTE, Pointer Authentication and FPSIMD/SVE context
     switching.

   - Fix and subsequent rewrite of our Spectre mitigations, including
     the addition of support for PR_SPEC_DISABLE_NOEXEC.

   - Support for the Armv8.3 Pointer Authentication enhancements.

   - Support for ASID pinning, which is required when sharing
     page-tables with the SMMU.

   - MM updates, including treating flush_tlb_fix_spurious_fault() as a
     no-op.

   - Perf/PMU driver updates, including addition of the ARM CMN PMU
     driver and also support to handle CPU PMU IRQs as NMIs.

   - Allow prefetchable PCI BARs to be exposed to userspace using normal
     non-cacheable mappings.

   - Implementation of ARCH_STACKWALK for unwinding.

   - Improve reporting of unexpected kernel traps due to BPF JIT
     failure.

   - Improve robustness of user-visible HWCAP strings and their
     corresponding numerical constants.

   - Removal of TEXT_OFFSET.

   - Removal of some unused functions, parameters and prototypes.

   - Removal of MPIDR-based topology detection in favour of firmware
     description.

   - Cleanups to handling of SVE and FPSIMD register state in
     preparation for potential future optimisation of handling across
     syscalls.

   - Cleanups to the SDEI driver in preparation for support in KVM.

   - Miscellaneous cleanups and refactoring work"

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (148 commits)
  Revert "arm64: initialize per-cpu offsets earlier"
  arm64: random: Remove no longer needed prototypes
  arm64: initialize per-cpu offsets earlier
  kselftest/arm64: Check mte tagged user address in kernel
  kselftest/arm64: Verify KSM page merge for MTE pages
  kselftest/arm64: Verify all different mmap MTE options
  kselftest/arm64: Check forked child mte memory accessibility
  kselftest/arm64: Verify mte tag inclusion via prctl
  kselftest/arm64: Add utilities and a test to validate mte memory
  perf: arm-cmn: Fix conversion specifiers for node type
  perf: arm-cmn: Fix unsigned comparison to less than zero
  arm64: dbm: Invalidate local TLB when setting TCR_EL1.HD
  arm64: mm: Make flush_tlb_fix_spurious_fault() a no-op
  arm64: Add support for PR_SPEC_DISABLE_NOEXEC prctl() option
  arm64: Pull in task_stack_page() to Spectre-v4 mitigation code
  KVM: arm64: Allow patching EL2 vectors even with KASLR is not enabled
  arm64: Get rid of arm64_ssbd_state
  KVM: arm64: Convert ARCH_WORKAROUND_2 to arm64_get_spectre_v4_state()
  KVM: arm64: Get rid of kvm_arm_have_ssbd()
  KVM: arm64: Simplify handling of ARCH_WORKAROUND_2
  ...
2020-10-12 10:00:51 -07:00
Vijay Balakrishna 4aab2be098 mm: khugepaged: recalculate min_free_kbytes after memory hotplug as expected by khugepaged
When memory is hotplug added or removed the min_free_kbytes should be
recalculated based on what is expected by khugepaged.  Currently after
hotplug, min_free_kbytes will be set to a lower default and higher
default set when THP enabled is lost.

This change restores min_free_kbytes as expected for THP consumers.

[vijayb@linux.microsoft.com: v5]
  Link: https://lkml.kernel.org/r/1601398153-5517-1-git-send-email-vijayb@linux.microsoft.com

Fixes: f000565adb ("thp: set recommended min free kbytes")
Signed-off-by: Vijay Balakrishna <vijayb@linux.microsoft.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Pavel Tatashin <pasha.tatashin@soleen.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Allen Pais <apais@microsoft.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/1600305709-2319-2-git-send-email-vijayb@linux.microsoft.com
Link: https://lkml.kernel.org/r/1600204258-13683-1-git-send-email-vijayb@linux.microsoft.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-11 10:31:11 -07:00