Commit Graph

676130 Commits

Author SHA1 Message Date
Ross Zwisler 678c9fd043 dax: add tracepoints to dax_load_hole()
Add tracepoints to dax_load_hole(), following the same logging conventions
as the rest of DAX.

Here is the logging generated by a PTE read from a hole:

  read-1075  [002] ....
    62.362108: dax_pte_fault: dev 259:0 ino 0x1003 shared ALLOW_RETRY|KILLABLE|USER address 0x10480000 pgoff 0x280

  read-1075  [002] ....
    62.362140: dax_load_hole: dev 259:0 ino 0x1003 shared ALLOW_RETRY|KILLABLE|USER address 0x10480000 pgoff 0x280 NOPAGE

  read-1075  [002] ....
    62.362141: dax_pte_fault_done: dev 259:0 ino 0x1003 shared ALLOW_RETRY|KILLABLE|USER address 0x10480000 pgoff 0x280 NOPAGE

Link: http://lkml.kernel.org/r/20170221195116.13278-4-ross.zwisler@linux.intel.com
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08 17:15:16 -07:00
Ross Zwisler c3ff68d7d1 dax: add tracepoints to dax_pfn_mkwrite()
Add tracepoints to dax_pfn_mkwrite(), following the same logging
conventions as the rest of DAX.

Here is an example PTE fault followed by a pfn_mkwrite:

  small_aligned-1094  [002] ....
   374.084998: dax_pte_fault: dev 259:0 ino 0x1003 shared WRITE|ALLOW_RETRY|KILLABLE|USER address 0x10400000 pgoff 0x200

  small_aligned-1094  [002] ....
   374.085145: dax_pte_fault_done: dev 259:0 ino 0x1003 shared WRITE|ALLOW_RETRY|KILLABLE|USER address 0x10400000 pgoff 0x200 MAJOR|NOPAGE

  small_aligned-1094  [002] ....
   374.085165: dax_pfn_mkwrite: dev 259:0 ino 0x1003 shared WRITE|MKWRITE|ALLOW_RETRY|KILLABLE|USER address 0x10400000 pgoff 0x200 NOPAGE

Link: http://lkml.kernel.org/r/20170221195116.13278-3-ross.zwisler@linux.intel.com
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08 17:15:15 -07:00
Ross Zwisler a9c42b33ed dax: add tracepoints to dax_iomap_pte_fault()
Patch series "second round of tracepoints for DAX".

This second round of DAX tracepoint patches adds tracing to the PTE
fault path (dax_iomap_pte_fault(), dax_pfn_mkwrite(), dax_load_hole(),
dax_insert_mapping()) and to the writeback path
(dax_writeback_mapping_range(), dax_writeback_one()).

The purpose of this tracing is to give us a high level view of what DAX
is doing, whether faults are being serviced by PMDs or PTEs, and by real
storage or by zero pages covering holes.

I do have some patches nearly ready which also add tracing to
grab_mapping_entry() and dax_insert_mapping_entry().  These are more
targeted at logging how we are interacting with the radix tree, how we
use empty entries for locking, whether we "downgrade" huge zero pages to
4k PTE sized allocations, etc.  In the end it seemed to me that this
might be too detailed to have as constantly present tracepoints, but if
anyone sees value in having tracepoints like this in the DAX code
permanently (Jan?), please let me know and I'll add those last two
patches.

All these tracepoints were done to be consistent with the style of the
XFS tracepoints and with the existing DAX PMD tracepoints.

This patch (of 6):

Add tracepoints to dax_iomap_pte_fault(), following the same logging
conventions as the rest of DAX.

Here is an example fault that initially tries to be serviced by the PMD
fault handler but which falls back to PTEs because the VMA isn't large
enough to hold a PMD:

  small-1086  [005] ....
   71.140014: xfs_filemap_huge_fault: dev 259:0 ino 0x1003

  small-1086  [005] ....
    71.140027: dax_pmd_fault: dev 259:0 ino 0x1003 shared WRITE|ALLOW_RETRY|KILLABLE|USER address 0x10420000 vm_start 0x10200000 vm_end 0x10500000 pgoff 0x220 max_pgoff 0x1400

  small-1086  [005] ....
    71.140028: dax_pmd_fault_done: dev 259:0 ino 0x1003 shared WRITE|ALLOW_RETRY|KILLABLE|USER address 0x10420000 vm_start 0x10200000 vm_end 0x10500000 pgoff 0x220 max_pgoff 0x1400 FALLBACK

  small-1086  [005] ....
    71.140035: dax_pte_fault: dev 259:0 ino 0x1003 shared WRITE|ALLOW_RETRY|KILLABLE|USER address 0x10420000 pgoff 0x220

  small-1086  [005] ....
    71.140396: dax_pte_fault_done: dev 259:0 ino 0x1003 shared WRITE|ALLOW_RETRY|KILLABLE|USER address 0x10420000 pgoff 0x220 MAJOR|NOPAGE

Link: http://lkml.kernel.org/r/20170221195116.13278-2-ross.zwisler@linux.intel.com
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08 17:15:15 -07:00
Vlastimil Babka dcbe82149c mtd: nand: nandsim: convert to memalloc_noreclaim_*()
Nandsim has own functions set_memalloc() and clear_memalloc() for robust
setting and clearing of PF_MEMALLOC.  Replace them by the new generic
helpers.  No functional change.

Link: http://lkml.kernel.org/r/20170405074700.29871-5-vbabka@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Boris Brezillon <boris.brezillon@free-electrons.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Chris Leech <cleech@redhat.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Josef Bacik <jbacik@fb.com>
Cc: Lee Duncan <lduncan@suse.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08 17:15:15 -07:00
Vlastimil Babka f108304872 treewide: convert PF_MEMALLOC manipulations to new helpers
We now have memalloc_noreclaim_{save,restore} helpers for robust setting
and clearing of PF_MEMALLOC.  Let's convert the code which was using the
generic tsk_restore_flags().  No functional change.

[vbabka@suse.cz: in net/core/sock.c the hunk is missing]
Link: http://lkml.kernel.org/r/20170405074700.29871-4-vbabka@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Josef Bacik <jbacik@fb.com>
Cc: Lee Duncan <lduncan@suse.com>
Cc: Chris Leech <cleech@redhat.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Boris Brezillon <boris.brezillon@free-electrons.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Wouter Verhelst <w@uter.be>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08 17:15:15 -07:00
Vlastimil Babka 499118e966 mm: introduce memalloc_noreclaim_{save,restore}
The previous patch ("mm: prevent potential recursive reclaim due to
clearing PF_MEMALLOC") has shown that simply setting and clearing
PF_MEMALLOC in current->flags can result in wrongly clearing a
pre-existing PF_MEMALLOC flag and potentially lead to recursive reclaim.
Let's introduce helpers that support proper nesting by saving the
previous stat of the flag, similar to the existing memalloc_noio_* and
memalloc_nofs_* helpers.  Convert existing setting/clearing of
PF_MEMALLOC within mm to the new helpers.

There are no known issues with the converted code, but the change makes
it more robust.

Link: http://lkml.kernel.org/r/20170405074700.29871-3-vbabka@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Suggested-by: Michal Hocko <mhocko@suse.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Boris Brezillon <boris.brezillon@free-electrons.com>
Cc: Chris Leech <cleech@redhat.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Josef Bacik <jbacik@fb.com>
Cc: Lee Duncan <lduncan@suse.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Richard Weinberger <richard@nod.at>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08 17:15:15 -07:00
Vlastimil Babka 62be1511b1 mm: prevent potential recursive reclaim due to clearing PF_MEMALLOC
Patch series "more robust PF_MEMALLOC handling"

This series aims to unify the setting and clearing of PF_MEMALLOC, which
prevents recursive reclaim.  There are some places that clear the flag
unconditionally from current->flags, which may result in clearing a
pre-existing flag.  This already resulted in a bug report that Patch 1
fixes (without the new helpers, to make backporting easier).  Patch 2
introduces the new helpers, modelled after existing memalloc_noio_* and
memalloc_nofs_* helpers, and converts mm core to use them.  Patches 3
and 4 convert non-mm code.

This patch (of 4):

__alloc_pages_direct_compact() sets PF_MEMALLOC to prevent deadlock
during page migration by lock_page() (see the comment in
__unmap_and_move()).  Then it unconditionally clears the flag, which can
clear a pre-existing PF_MEMALLOC flag and result in recursive reclaim.
This was not a problem until commit a8161d1ed6 ("mm, page_alloc:
restructure direct compaction handling in slowpath"), because direct
compation was called only after direct reclaim, which was skipped when
PF_MEMALLOC flag was set.

Even now it's only a theoretical issue, as the new callsite of
__alloc_pages_direct_compact() is reached only for costly orders and
when gfp_pfmemalloc_allowed() is true, which means either
__GFP_NOMEMALLOC is in gfp_flags or in_interrupt() is true.  There is no
such known context, but let's play it safe and make
__alloc_pages_direct_compact() robust for cases where PF_MEMALLOC is
already set.

Fixes: a8161d1ed6 ("mm, page_alloc: restructure direct compaction handling in slowpath")
Link: http://lkml.kernel.org/r/20170405074700.29871-2-vbabka@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reported-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Boris Brezillon <boris.brezillon@free-electrons.com>
Cc: Chris Leech <cleech@redhat.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Josef Bacik <jbacik@fb.com>
Cc: Lee Duncan <lduncan@suse.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08 17:15:15 -07:00
Oliver O'Halloran 3b6521f535 mm/huge_memory.c: deposit a pgtable for DAX PMD faults when required
Although all architectures use a deposited page table for THP on
anonymous VMAs, some architectures (s390 and powerpc) require the
deposited storage even for file backed VMAs due to quirks of their MMUs.

This patch adds support for depositing a table in DAX PMD fault handling
path for archs that require it.  Other architectures should see no
functional changes.

Link: http://lkml.kernel.org/r/20170411174233.21902-3-oohall@gmail.com
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Cc: Reza Arbab <arbab@linux.vnet.ibm.com>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: linux-nvdimm@ml01.01.org
Cc: Oliver O'Halloran <oohall@gmail.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08 17:15:15 -07:00
Oliver O'Halloran c14a6eb44d mm/huge_memory.c: use zap_deposited_table() more
Depending on the flags of the PMD being zapped there may or may not be a
deposited pgtable to be freed.  In two of the three cases this is open
coded while the third uses the zap_deposited_table() helper.  This patch
converts the others to use the helper to clean things up a bit.

Link: http://lkml.kernel.org/r/20170411174233.21902-2-oohall@gmail.com
Cc: Reza Arbab <arbab@linux.vnet.ibm.com>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: linux-nvdimm@ml01.01.org
Cc: Oliver O'Halloran <oohall@gmail.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08 17:15:15 -07:00
Deepa Dinamani bfe1c56645 time: delete CURRENT_TIME_SEC and CURRENT_TIME
All uses of CURRENT_TIME_SEC and CURRENT_TIME macros have been replaced
by other time functions.  These macros are also not y2038 safe.  And,
all their use cases can be fulfilled by y2038 safe ktime_get_* variants.

Link: http://lkml.kernel.org/r/1491613030-11599-12-git-send-email-deepa.kernel@gmail.com
Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: John Stultz <john.stultz@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08 17:15:15 -07:00
Stephen Rothwell b32c8c7648 gfs2: replace CURRENT_TIME with current_time
Link: http://lkml.kernel.org/r/20170420161852.0492bc3f@canb.auug.org.au
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08 17:15:15 -07:00
Deepa Dinamani 24d0d03c2e apparmorfs: replace CURRENT_TIME with current_time()
CURRENT_TIME macro is not y2038 safe on 32 bit systems.

The patch replaces all the uses of CURRENT_TIME by current_time().

This is also in preparation for the patch that transitions vfs
timestamps to use 64 bit time and hence make them y2038 safe.
current_time() is also planned to be transitioned to y2038 safe behavior
along with this change.

CURRENT_TIME macro will be deleted before merging the aforementioned
change.

Link: http://lkml.kernel.org/r/1491613030-11599-11-git-send-email-deepa.kernel@gmail.com
Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Acked-by: John Johansen <john.johansen@canonical.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08 17:15:15 -07:00
Deepa Dinamani 47f38c539e lustre: replace CURRENT_TIME macro
CURRENT_TIME macro is not y2038 safe on 32 bit systems.

The patch replaces all the uses of CURRENT_TIME by current_time() for
filesystem times, and ktime_get_* functions for others.

struct timespec is also not y2038 safe.  Retain timespec for timestamp
representation here as lustre uses it internally everywhere.  These
references will be changed to use struct timespec64 in a separate patch.

This is also in preparation for the patch that transitions vfs
timestamps to use 64 bit time and hence make them y2038 safe.
current_time() is also planned to be transitioned to y2038 safe behavior
along with this change.

CURRENT_TIME macro will be deleted before merging the aforementioned
change.

Link: http://lkml.kernel.org/r/1491613030-11599-10-git-send-email-deepa.kernel@gmail.com
Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Cc: Oleg Drokin <oleg.drokin@intel.com>
Cc: Andreas Dilger <andreas.dilger@intel.com>
Cc: James Simmons <jsimmons@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08 17:15:15 -07:00
Deepa Dinamani 607a11ad94 fs: ubifs: replace CURRENT_TIME_SEC with current_time
CURRENT_TIME_SEC is not y2038 safe.  current_time() will be transitioned
to use 64 bit time along with vfs in a separate patch.  There is no plan
to transition CURRENT_TIME_SEC to use y2038 safe time interfaces.

current_time() returns timestamps according to the granularities set in
the inode's super_block.  The granularity check to call
current_fs_time() or CURRENT_TIME_SEC is not required.

Use current_time() directly to update inode timestamp.  Use
timespec_trunc during file system creation, before the first inode is
created.

Link: http://lkml.kernel.org/r/1491613030-11599-9-git-send-email-deepa.kernel@gmail.com
Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Cc: Richard Weinberger <richard@nod.at>
Cc: Artem Bityutskiy <dedekind1@gmail.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08 17:15:15 -07:00
Deepa Dinamani a88e99e976 fs: ufs: use ktime_get_real_ts64() for birthtime
CURRENT_TIME is not y2038 safe.  Replace it with ktime_get_real_ts64().
Inode time formats are already 64 bit long and accommodates time64_t.

Link: http://lkml.kernel.org/r/1491613030-11599-6-git-send-email-deepa.kernel@gmail.com
Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Cc: Evgeniy Dushistov <dushistov@mail.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08 17:15:15 -07:00
Deepa Dinamani 1134e09100 fs: ceph: CURRENT_TIME with ktime_get_real_ts()
CURRENT_TIME is not y2038 safe.  The macro will be deleted and all the
references to it will be replaced by ktime_get_* apis.

struct timespec is also not y2038 safe.  Retain timespec for timestamp
representation here as ceph uses it internally everywhere.  These
references will be changed to use struct timespec64 in a separate patch.

The current_fs_time() api is being changed to use vfs struct inode* as
an argument instead of struct super_block*.

Set the new mds client request r_stamp field using ktime_get_real_ts()
instead of using current_fs_time().

Also, since r_stamp is used as mtime on the server, use timespec_trunc()
to truncate the timestamp, using the right granularity from the
superblock.

This api will be transitioned to be y2038 safe along with vfs.

Link: http://lkml.kernel.org/r/1491613030-11599-5-git-send-email-deepa.kernel@gmail.com
Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
M:	Ilya Dryomov <idryomov@gmail.com>
M:	"Yan, Zheng" <zyan@redhat.com>
M:	Sage Weil <sage@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08 17:15:15 -07:00
Deepa Dinamani e37fea58f7 fs: cifs: replace CURRENT_TIME by other appropriate apis
CURRENT_TIME macro is not y2038 safe on 32 bit systems.

The patch replaces all the uses of CURRENT_TIME by current_time() for
filesystem times, and ktime_get_* functions for authentication
timestamps and timezone calculations.

This is also in preparation for the patch that transitions vfs
timestamps to use 64 bit time and hence make them y2038 safe.

CURRENT_TIME macro will be deleted before merging the aforementioned
change.

The inode timestamps read from the server are assumed to have correct
granularity and range.

The patch also assumes that the difference between server and client
times lie in the range INT_MIN..INT_MAX.  This is valid because this is
the difference between current times between server and client, and the
largest timezone difference is in the range of one day.

All cifs timestamps currently use timespec representation internally.
Authentication and timezone timestamps can also be transitioned into
using timespec64 when all other timestamps for cifs is transitioned to
use timespec64.

Link: http://lkml.kernel.org/r/1491613030-11599-4-git-send-email-deepa.kernel@gmail.com
Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Cc: Steve French <sfrench@samba.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08 17:15:15 -07:00
Deepa Dinamani 51aad0aee5 trace: make trace_hwlat timestamp y2038 safe
struct timespec is not y2038 safe on 32 bit machines and needs to be
replaced by struct timespec64 in order to represent times beyond year
2038 on such machines.

Fix all the timestamp representation in struct trace_hwlat and all the
corresponding implementations.

Link: http://lkml.kernel.org/r/1491613030-11599-3-git-send-email-deepa.kernel@gmail.com
Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08 17:15:15 -07:00
Deepa Dinamani 48fbfe50f1 fs: f2fs: use ktime_get_real_seconds for sit_info times
CURRENT_TIME_SEC is not y2038 safe.

Replace use of CURRENT_TIME_SEC with ktime_get_real_seconds in segment
timestamps used by GC algorithm including the segment mtime timestamps.

Link: http://lkml.kernel.org/r/1491613030-11599-2-git-send-email-deepa.kernel@gmail.com
Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08 17:15:15 -07:00
Kees Cook 063246641d format-security: move static strings to const
While examining output from trial builds with -Wformat-security enabled,
many strings were found that should be defined as "const", or as a char
array instead of char pointer.  This makes some static analysis easier,
by producing fewer false positives.

As these are all trivial changes, it seemed best to put them all in a
single patch rather than chopping them up per maintainer.

Link: http://lkml.kernel.org/r/20170405214711.GA5711@beast
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Jes Sorensen <jes@trained-monkey.org>	[runner.c]
Cc: Tony Lindgren <tony@atomide.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: "Maciej W. Rozycki" <macro@linux-mips.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Daniel Vetter <daniel.vetter@intel.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Sean Paul <seanpaul@chromium.org>
Cc: David Airlie <airlied@linux.ie>
Cc: Yisen Zhuang <yisen.zhuang@huawei.com>
Cc: Salil Mehta <salil.mehta@huawei.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Jiri Slaby <jslaby@suse.com>
Cc: Patrice Chotard <patrice.chotard@st.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: James Hogan <james.hogan@imgtec.com>
Cc: Paul Burton <paul.burton@imgtec.com>
Cc: Matt Redfearn <matt.redfearn@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Mugunthan V N <mugunthanvnm@ti.com>
Cc: Felipe Balbi <felipe.balbi@linux.intel.com>
Cc: Jarod Wilson <jarod@redhat.com>
Cc: Florian Westphal <fw@strlen.de>
Cc: Antonio Quartulli <a@unstable.cc>
Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Cc: Kejian Yan <yankejian@huawei.com>
Cc: Daode Huang <huangdaode@hisilicon.com>
Cc: Qianqian Xie <xieqianqian@huawei.com>
Cc: Philippe Reynes <tremyfr@gmail.com>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Christian Gromm <christian.gromm@microchip.com>
Cc: Andrey Shvetsov <andrey.shvetsov@k2l.de>
Cc: Jason Litzinger <jlitzingerdev@gmail.com>
Cc: WANG Cong <xiyou.wangcong@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08 17:15:14 -07:00
SeongJae Park 929f9d285a Documentation/vm/transhuge.txt: fix trivial typos
[akpm@linux-foundation.org: fixes per Randy]
Link: http://lkml.kernel.org/r/20170405210259.2067-1-sj38.park@gmail.com
Signed-off-by: SeongJae Park <sj38.park@gmail.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08 17:15:14 -07:00
Tetsuo Handa c718a97514 fs: semove set but not checked AOP_FLAG_UNINTERRUPTIBLE flag
Commit afddba49d1 ("fs: introduce write_begin, write_end, and
perform_write aops") introduced AOP_FLAG_UNINTERRUPTIBLE flag which was
checked in pagecache_write_begin(), but that check was removed by
4e02ed4b4a ("fs: remove prepare_write/commit_write").

Between these two commits, commit d9414774dc ("cifs: Convert cifs to
new aops.") added a check in cifs_write_begin(), but that check was soon
removed by commit a98ee8c1c7 ("[CIFS] fix regression in
cifs_write_begin/cifs_write_end").

Therefore, AOP_FLAG_UNINTERRUPTIBLE flag is checked nowhere.  Let's
remove this flag.  This patch has no functionality changes.

Link: http://lkml.kernel.org/r/1489294781-53494-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Nick Piggin <npiggin@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08 17:15:14 -07:00
Andi Kleen f44a2920c8 include/linux/uaccess.h: remove expensive WARN_ON in pagefault_disabled_dec
pagefault_disabled_dec is frequently used inline, and it has a WARN_ON
for underflow that expands to about 6.5k of extra code.  The warning
doesn't seem to be that useful and worth so much code so remove it.

If it was needed could make it depending on some debug kernel option.

Saves ~6.5k in my kernel

     text    data     bss     dec     hex filename
  9039417 5367568 11116544        25523529        1857549 vmlinux-before-pf
  9032805 5367568 11116544        25516917        1855b75 vmlinux-pf

Link: http://lkml.kernel.org/r/20170315021431.13107-8-andi@firstfloor.org
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08 17:15:14 -07:00
Andi Kleen 68b43744c1 drivers/scsi/megaraid: remove expensive inline from megasas_return_cmd
Remove an inline from a fairly big function that is used often.  It's
unlikely that calling or not calling it makes a lot of difference.

Saves around 8k text in my kernel.

     text    data     bss     dec     hex filename
  9047801 5367568 11116544        25531913        1859609 vmlinux-before-megasas
  9039417 5367568 11116544        25523529        1857549 vmlinux-megasas

Link: http://lkml.kernel.org/r/20170315021431.13107-7-andi@firstfloor.org
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Kashyap Desai <kashyap.desai@avagotech.com>
Cc: Sumit Saxena <sumit.saxena@avagotech.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08 17:15:14 -07:00
Andi Kleen ec48c940da kref: remove WARN_ON for NULL release functions
The kref functions check for NULL release functions.  This WARN_ON seems
rather pointless.  We will eventually release and then just crash
nicely.  It is also somewhat expensive because these functions are
inlined in a lot of places.  Removing the WARN_ONs saves around 2.3k in
this kernel (likely more in others with more drivers)

     text    data     bss     dec     hex filename
  9083992 5367600 11116544        25568136        1862388 vmlinux-before-load-avg
  9070166 5367600 11116544        25554310        185ed86 vmlinux-load-avg

Link: http://lkml.kernel.org/r/20170315021431.13107-5-andi@firstfloor.org
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08 17:15:14 -07:00
Laura Abbott e6ccbff0e9 treewide: decouple cacheflush.h and set_memory.h
Now that all call sites, completely decouple cacheflush.h and
set_memory.h

[sfr@canb.auug.org.au: kprobes/x86: merge fix for set_memory.h decoupling]
  Link: http://lkml.kernel.org/r/20170418180903.10300fd3@canb.auug.org.au
Link: http://lkml.kernel.org/r/1488920133-27229-17-git-send-email-labbott@redhat.com
Signed-off-by: Laura Abbott <labbott@redhat.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08 17:15:14 -07:00
Andrew Morton 880d5a36ef drivers/staging/media/atomisp/pci/atomisp2: use set_memory.h
Cc: Laura Abbott <labbott@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08 17:15:14 -07:00
Laura Abbott 8d5a1181f3 drivers/video/fbdev/vermilion/vermilion.c: use set_memory.h header
set_memory_* functions have moved to set_memory.h.  Switch to this
explicitly.

Link: http://lkml.kernel.org/r/1488920133-27229-16-git-send-email-labbott@redhat.com
Signed-off-by: Laura Abbott <labbott@redhat.com>
Acked-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08 17:15:14 -07:00
Laura Abbott 056d16b214 drivers/misc/sram-exec.c: use set_memory.h header
set_memory_* functions have moved to set_memory.h.  Switch to this
explicitly.

Link: http://lkml.kernel.org/r/1488920133-27229-15-git-send-email-labbott@redhat.com
Signed-off-by: Laura Abbott <labbott@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08 17:15:14 -07:00
Laura Abbott 7f80f51358 alsa: use set_memory.h header
set_memory_* functions have moved to set_memory.h.  Switch to this
explicitly.

Link: http://lkml.kernel.org/r/1488920133-27229-14-git-send-email-labbott@redhat.com
Signed-off-by: Laura Abbott <labbott@redhat.com>
Acked-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08 17:15:14 -07:00
Laura Abbott 50327ddfbc kernel/power/snapshot.c: use set_memory.h header
set_memory_* functions have moved to set_memory.h.  Switch to this
explicitly.

Link: http://lkml.kernel.org/r/1488920133-27229-13-git-send-email-labbott@redhat.com
Signed-off-by: Laura Abbott <labbott@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08 17:15:14 -07:00
Laura Abbott bbca07c307 kernel/module.c: use set_memory.h header
set_memory_* functions have moved to set_memory.h.  Switch to this
explicitly.

Link: http://lkml.kernel.org/r/1488920133-27229-12-git-send-email-labbott@redhat.com
Signed-off-by: Laura Abbott <labbott@redhat.com>
Acked-by: Jessica Yu <jeyu@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08 17:15:14 -07:00
Laura Abbott 2d0bde57f3 include/linux/filter.h: use set_memory.h header
set_memory_* functions have moved to set_memory.h.  Switch to this
explicitly.

Link: http://lkml.kernel.org/r/1488920133-27229-11-git-send-email-labbott@redhat.com
Signed-off-by: Laura Abbott <labbott@redhat.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08 17:15:14 -07:00
Laura Abbott 23f19a563b drivers/watchdog/hpwdt.c: use set_memory.h header
set_memory_* functions have moved to set_memory.h.  Switch to this
explicitly.

Link: http://lkml.kernel.org/r/1488920133-27229-10-git-send-email-labbott@redhat.com
Signed-off-by: Laura Abbott <labbott@redhat.com>
Acked-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08 17:15:14 -07:00
Laura Abbott 0c14dac9a4 drivers/hwtracing/intel_th/msu.c: use set_memory.h header
set_memory_* functions have moved to set_memory.h.  Switch to this
explicitly.

Link: http://lkml.kernel.org/r/1488920133-27229-9-git-send-email-labbott@redhat.com
Signed-off-by: Laura Abbott <labbott@redhat.com>
Acked-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08 17:15:14 -07:00
Laura Abbott ed3ba07946 drm: use set_memory.h header
set_memory_* functions have moved to set_memory.h.  Switch to this
explicitly.

[akpm@linux-foundation.org: track drivers/gpu/drm/i915/i915_gem_gtt.c linux-next changes]
Link: http://lkml.kernel.org/r/1488920133-27229-8-git-send-email-labbott@redhat.com
Signed-off-by: Laura Abbott <labbott@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08 17:15:14 -07:00
Laura Abbott e47036b45a agp: use set_memory.h header
set_memory_* functions have moved to set_memory.h.  Switch to this
explicitly.

Link: http://lkml.kernel.org/r/1488920133-27229-7-git-send-email-labbott@redhat.com
Signed-off-by: Laura Abbott <labbott@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08 17:15:13 -07:00
Laura Abbott d11636511e x86: use set_memory.h header
set_memory_* functions have moved to set_memory.h.  Switch to this
explicitly.

Link: http://lkml.kernel.org/r/1488920133-27229-6-git-send-email-labbott@redhat.com
Signed-off-by: Laura Abbott <labbott@redhat.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08 17:15:13 -07:00
Laura Abbott e6c7c63001 s390: use set_memory.h header
set_memory_* functions have moved to set_memory.h.  Switch to this
explicitly

Link: http://lkml.kernel.org/r/1488920133-27229-5-git-send-email-labbott@redhat.com
Signed-off-by: Laura Abbott <labbott@redhat.com>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08 17:15:13 -07:00
Laura Abbott d4bbc30bb0 arm64: use set_memory.h header
The set_memory_* functions have moved to set_memory.h.  Use that header
explicitly.

Link: http://lkml.kernel.org/r/1488920133-27229-4-git-send-email-labbott@redhat.com
Signed-off-by: Laura Abbott <labbott@redhat.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08 17:15:13 -07:00
Laura Abbott 74d86a7063 arm: use set_memory.h header
set_memory_* functions have moved to set_memory.h.  Switch to this
explicitly

Link: http://lkml.kernel.org/r/1488920133-27229-3-git-send-email-labbott@redhat.com
Signed-off-by: Laura Abbott <labbott@redhat.com>
Acked-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08 17:15:13 -07:00
Laura Abbott 299878bac3 treewide: move set_memory_* functions away from cacheflush.h
Patch series "set_memory_* functions header refactor", v3.

The set_memory_* APIs came out of a desire to have a better way to
change memory attributes.  Many of these attributes were linked to cache
functionality so the prototypes were put in cacheflush.h.  These days,
the APIs have grown and have a much wider use than just cache APIs.  To
support this growth, split off set_memory_* and friends into a separate
header file to avoid growing cacheflush.h for APIs that have nothing to
do with caches.

Link: http://lkml.kernel.org/r/1488920133-27229-2-git-send-email-labbott@redhat.com
Signed-off-by: Laura Abbott <labbott@redhat.com>
Acked-by: Russell King <rmk+kernel@armlinux.org.uk>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08 17:15:13 -07:00
Joe Perches 8ac1ed7914 treewide: spelling: correct diffrent[iate] and banlance typos
Add these misspellings to scripts/spelling.txt too

Link: http://lkml.kernel.org/r/962aace119675e5fe87be2a88ddac1a5486f8e60.1490931810.git.joe@perches.com
Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08 17:15:13 -07:00
Masahiro Yamada 6e7c2b4dd3 scripts/spelling.txt: add "intialise(d)" pattern and fix typo instances
Fix typos and add the following to the scripts/spelling.txt:

  intialisation||initialisation
  intialised||initialised
  intialise||initialise

This commit does not intend to change the British spelling itself.

Link: http://lkml.kernel.org/r/1481573103-11329-18-git-send-email-yamada.masahiro@socionext.com
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08 17:15:13 -07:00
Stephen Boyd ad61dd303a scripts/spelling.txt: add regsiter -> register spelling mistake
This typo is quite common.  Fix it and add it to the spelling file so
that checkpatch catches it earlier.

Link: http://lkml.kernel.org/r/20170317011131.6881-2-sboyd@codeaurora.org
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08 17:15:13 -07:00
Stephen Boyd d1b7c9344b scripts/spelling.txt: add "memory" pattern and fix typos
Fix typos and add the following to the scripts/spelling.txt:

      momery||memory

Link: http://lkml.kernel.org/r/20170317011131.6881-1-sboyd@codeaurora.org
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08 17:15:13 -07:00
Michal Hocko 19809c2da2 mm, vmalloc: use __GFP_HIGHMEM implicitly
__vmalloc* allows users to provide gfp flags for the underlying
allocation.  This API is quite popular

  $ git grep "=[[:space:]]__vmalloc\|return[[:space:]]*__vmalloc" | wc -l
  77

The only problem is that many people are not aware that they really want
to give __GFP_HIGHMEM along with other flags because there is really no
reason to consume precious lowmemory on CONFIG_HIGHMEM systems for pages
which are mapped to the kernel vmalloc space.  About half of users don't
use this flag, though.  This signals that we make the API unnecessarily
too complex.

This patch simply uses __GFP_HIGHMEM implicitly when allocating pages to
be mapped to the vmalloc space.  Current users which add __GFP_HIGHMEM
are simplified and drop the flag.

Link: http://lkml.kernel.org/r/20170307141020.29107-1-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: David Rientjes <rientjes@google.com>
Cc: Cristopher Lameter <cl@linux.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08 17:15:13 -07:00
Huang Ying 54f180d3c1 mm, swap: use kvzalloc to allocate some swap data structures
Now vzalloc() is used in swap code to allocate various data structures,
such as swap cache, swap slots cache, cluster info, etc.  Because the
size may be too large on some system, so that normal kzalloc() may fail.
But using kzalloc() has some advantages, for example, less memory
fragmentation, less TLB pressure, etc.  So change the data structure
allocation in swap code to use kvzalloc() which will try kzalloc()
firstly, and fallback to vzalloc() if kzalloc() failed.

In general, although kmalloc() will reduce the number of high-order
pages in short term, vmalloc() will cause more pain for memory
fragmentation in the long term.  And the swap data structure allocation
that is changed in this patch is expected to be long term allocation.

From Dave Hansen:
 "for example, we have a two-page data structure. vmalloc() takes two
  effectively random order-0 pages, probably from two different 2M pages
  and pins them. That "kills" two 2M pages. kmalloc(), allocating two
  *contiguous* pages, will not cross a 2M boundary. That means it will
  only "kill" the possibility of a single 2M page. More 2M pages == less
  fragmentation.

The allocation in this patch occurs during swap on time, which is
usually done during system boot, so usually we have high opportunity to
allocate the contiguous pages successfully.

The allocation for swap_map[] in struct swap_info_struct is not changed,
because that is usually quite large and vmalloc_to_page() is used for
it.  That makes it a little harder to change.

Link: http://lkml.kernel.org/r/20170407064911.25447-1-ying.huang@intel.com
Signed-off-by: Huang Ying <ying.huang@intel.com>
Acked-by: Tim Chen <tim.c.chen@intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Shaohua Li <shli@kernel.org>
Cc: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08 17:15:13 -07:00
Michal Hocko bc4e54f6e9 drivers/md/bcache/super.c: use kvmalloc
bcache_device_init uses kmalloc for small requests and vmalloc for those
which are larger than 64 pages.  This alone is a strange criterion.
Moreover kmalloc can fallback to vmalloc on the failure.  Let's simply
use kvmalloc instead as it knows how to handle the fallback properly

Link: http://lkml.kernel.org/r/20170306103327.2766-5-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08 17:15:13 -07:00
Michal Hocko d224e93818 drivers/md/dm-ioctl.c: use kvmalloc rather than opencoded variant
copy_params uses kmalloc with vmalloc fallback.  We already have a
helper for that - kvmalloc.  This caller requires GFP_NOIO semantic so
it hasn't been converted with many others by previous patches.  All we
need to achieve this semantic is to use the scope
memalloc_noio_{save,restore} around kvmalloc.

Link: http://lkml.kernel.org/r/20170306103327.2766-4-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Mikulas Patocka <mpatocka@redhat.com>
Cc: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08 17:15:13 -07:00