linux

Commit Graph

Author	SHA1	Message	Date
Joe Perches	45c55e92fc	checkpatch: warn on logging continuations pr_cont(...) and printk(KERN_CONT ...) uses should be discouraged as their output can be interleaved by multiple logging processes. Link: http://lkml.kernel.org/r/7100ba00098694ec81471a299583ed068975fd05.1483465888.git.joe@perches.com Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2017-02-24 17:46:57 -08:00
Joe Perches	77cb8546bc	checkpatch: warn on embedded function names Embedded function names are less appropriate to use when refactoring can cause function renaming. Prefer the use of "%s", __func__ to embedded function names. Link: http://lkml.kernel.org/r/ac9631fdbac5af3507c5bfe88ad9064f0ed764ec.1483510416.git.joe@perches.com Signed-off-by: Joe Perches <joe@perches.com> Acked-by: Andy Whitcroft <apw@canonical.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2017-02-24 17:46:57 -08:00
Sven Schmidt	69c78423b8	lib/lz4: remove back-compat wrappers Remove the functions introduced as wrappers for providing backwards compatibility to the prior LZ4 version. They're not needed anymore since there's no callers left. Link: http://lkml.kernel.org/r/1486321748-19085-6-git-send-email-4sschmid@informatik.uni-hamburg.de Signed-off-by: Sven Schmidt <4sschmid@informatik.uni-hamburg.de> Cc: Bongkyu Kim <bongkyu.kim@lge.com> Cc: Rui Salvaterra <rsalvaterra@gmail.com> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: David S. Miller <davem@davemloft.net> Cc: Anton Vorontsov <anton@enomsg.org> Cc: Colin Cross <ccross@android.com> Cc: Kees Cook <keescook@chromium.org> Cc: Tony Luck <tony.luck@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2017-02-24 17:46:57 -08:00
Sven Schmidt	d21b5ff12d	fs/pstore: fs/squashfs: change usage of LZ4 to work with new LZ4 version Update fs/pstore and fs/squashfs to use the updated functions from the new LZ4 module. Link: http://lkml.kernel.org/r/1486321748-19085-5-git-send-email-4sschmid@informatik.uni-hamburg.de Signed-off-by: Sven Schmidt <4sschmid@informatik.uni-hamburg.de> Cc: Bongkyu Kim <bongkyu.kim@lge.com> Cc: Rui Salvaterra <rsalvaterra@gmail.com> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: David S. Miller <davem@davemloft.net> Cc: Anton Vorontsov <anton@enomsg.org> Cc: Colin Cross <ccross@android.com> Cc: Kees Cook <keescook@chromium.org> Cc: Tony Luck <tony.luck@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2017-02-24 17:46:57 -08:00
Sven Schmidt	73a15ac6d5	crypto: change LZ4 modules to work with new LZ4 module version Update the crypto modules using LZ4 compression as well as the test cases in testmgr.h to work with the new LZ4 module version. Link: http://lkml.kernel.org/r/1486321748-19085-4-git-send-email-4sschmid@informatik.uni-hamburg.de Signed-off-by: Sven Schmidt <4sschmid@informatik.uni-hamburg.de> Cc: Bongkyu Kim <bongkyu.kim@lge.com> Cc: Rui Salvaterra <rsalvaterra@gmail.com> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: David S. Miller <davem@davemloft.net> Cc: Anton Vorontsov <anton@enomsg.org> Cc: Colin Cross <ccross@android.com> Cc: Kees Cook <keescook@chromium.org> Cc: Tony Luck <tony.luck@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2017-02-24 17:46:57 -08:00
Sven Schmidt	e23d54e483	lib/decompress_unlz4: change module to work with new LZ4 module version Update the unlz4 wrapper to work with the updated LZ4 kernel module version. Link: http://lkml.kernel.org/r/1486321748-19085-3-git-send-email-4sschmid@informatik.uni-hamburg.de Signed-off-by: Sven Schmidt <4sschmid@informatik.uni-hamburg.de> Cc: Bongkyu Kim <bongkyu.kim@lge.com> Cc: Rui Salvaterra <rsalvaterra@gmail.com> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: David S. Miller <davem@davemloft.net> Cc: Anton Vorontsov <anton@enomsg.org> Cc: Colin Cross <ccross@android.com> Cc: Kees Cook <keescook@chromium.org> Cc: Tony Luck <tony.luck@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2017-02-24 17:46:57 -08:00
Sven Schmidt	4e1a33b105	lib: update LZ4 compressor module Patch series "Update LZ4 compressor module", v7. This patchset updates the LZ4 compression module to a version based on LZ4 v1.7.3 allowing to use the fast compression algorithm aka LZ4 fast which provides an "acceleration" parameter as a tradeoff between high compression ratio and high compression speed. We want to use LZ4 fast in order to support compression in lustre and (mostly, based on that) investigate data reduction techniques in behalf of storage systems. Also, it will be useful for other users of LZ4 compression, as with LZ4 fast it is possible to enable applications to use fast and/or high compression depending on the usecase. For instance, ZRAM is offering a LZ4 backend and could benefit from an updated LZ4 in the kernel. LZ4 homepage: http://www.lz4.org/ LZ4 source repository: https://github.com/lz4/lz4 Source version: 1.7.3 Benchmark (taken from [1], Core i5-4300U @1.9GHz): ----------------\|--------------\|----------------\|---------- Compressor \| Compression \| Decompression \| Ratio ----------------\|--------------\|----------------\|---------- memcpy \| 4200 MB/s \| 4200 MB/s \| 1.000 LZ4 fast 50 \| 1080 MB/s \| 2650 MB/s \| 1.375 LZ4 fast 17 \| 680 MB/s \| 2220 MB/s \| 1.607 LZ4 fast 5 \| 475 MB/s \| 1920 MB/s \| 1.886 LZ4 default \| 385 MB/s \| 1850 MB/s \| 2.101 [1] http://fastcompression.blogspot.de/2015/04/sampling-or-faster-lz4.html [PATCH 1/5] lib: Update LZ4 compressor module [PATCH 2/5] lib/decompress_unlz4: Change module to work with new LZ4 module version [PATCH 3/5] crypto: Change LZ4 modules to work with new LZ4 module version [PATCH 4/5] fs/pstore: fs/squashfs: Change usage of LZ4 to work with new LZ4 version [PATCH 5/5] lib/lz4: Remove back-compat wrappers This patch (of 5): Update the LZ4 kernel module to LZ4 v1.7.3 by Yann Collet. The kernel module is inspired by the previous work by Chanho Min. The updated LZ4 module will not break existing code since the patchset contains appropriate changes. API changes: New method LZ4_compress_fast which differs from the variant available in kernel by the new acceleration parameter, allowing to trade compression ratio for more compression speed and vice versa. LZ4_decompress_fast is the respective decompression method, featuring a very fast decoder (multiple GB/s per core), able to reach RAM speed in multi-core systems. The decompressor allows to decompress data compressed with LZ4 fast as well as the LZ4 HC (high compression) algorithm. Also the useful functions LZ4_decompress_safe_partial and LZ4_compress_destsize were added. The latter reverses the logic by trying to compress as much data as possible from source to dest while the former aims to decompress partial blocks of data. A bunch of streaming functions were also added which allow compressig/decompressing data in multiple steps (so called "streaming mode"). The methods lz4_compress and lz4_decompress_unknownoutputsize are now known as LZ4_compress_default respectivley LZ4_decompress_safe. The old methods will be removed since there's no callers left in the code. [arnd@arndb.de: fix KERNEL_LZ4 support] Link: http://lkml.kernel.org/r/20170208211946.2839649-1-arnd@arndb.de [akpm@linux-foundation.org: simplify] [akpm@linux-foundation.org: fix the simplification] [4sschmid@informatik.uni-hamburg.de: fix performance regressions] Link: http://lkml.kernel.org/r/1486898178-17125-2-git-send-email-4sschmid@informatik.uni-hamburg.de [4sschmid@informatik.uni-hamburg.de: v8] Link: http://lkml.kernel.org/r/1487182598-15351-2-git-send-email-4sschmid@informatik.uni-hamburg.de Link: http://lkml.kernel.org/r/1486321748-19085-2-git-send-email-4sschmid@informatik.uni-hamburg.de Signed-off-by: Sven Schmidt <4sschmid@informatik.uni-hamburg.de> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Bongkyu Kim <bongkyu.kim@lge.com> Cc: Rui Salvaterra <rsalvaterra@gmail.com> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: David S. Miller <davem@davemloft.net> Cc: Anton Vorontsov <anton@enomsg.org> Cc: Colin Cross <ccross@android.com> Cc: Kees Cook <keescook@chromium.org> Cc: Tony Luck <tony.luck@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2017-02-24 17:46:57 -08:00
Paul Gortmaker	8893f51933	lib/test_sort.c: make it explicitly non-modular The Kconfig currently controlling compilation of this code is: lib/Kconfig.debug:config TEST_SORT lib/Kconfig.debug: bool "Array-based sort test" ...meaning that it currently is not being built as a module by anyone. Lets remove the couple traces of modular infrastructure use, so that when reading the code there is no doubt it is builtin-only. Since module_init translates to device_initcall in the non-modular case, the init ordering becomes slightly earlier when we change it to use subsys_initcall as done here. However, since it is a self contained test, this shouldn't be an issue and subsys_initcall seems like a better fit for this particular case. We also delete the MODULE_LICENSE tag since that information is now contained at the top of the file in the comments. Link: http://lkml.kernel.org/r/20170124225608.7319-1-paul.gortmaker@windriver.com Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Kostenzer Felix <fkostenzer@live.at> Cc: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2017-02-24 17:46:57 -08:00
Kostenzer Felix	c5adae9583	lib: add CONFIG_TEST_SORT to enable self-test of sort() Along with the addition made to Kconfig.debug, the prior existing but permanently disabled test function has been slightly refactored. Patch has been tested using QEMU 2.1.2 with a .config obtained through 'make defconfig' (x86_64) and manually enabling the option. [arnd@arndb.de: move sort self-test into a separate file] Link: http://lkml.kernel.org/r/20170112110657.3123790-1-arnd@arndb.de Link: http://lkml.kernel.org/r/HE1PR09MB0394B0418D504DCD27167D4FD49B0@HE1PR09MB0394.eurprd09.prod.outlook.com Signed-off-by: Kostenzer Felix <fkostenzer@live.at> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2017-02-24 17:46:57 -08:00
Kees Cook	f231aebfc4	rbtree: use designated initializers Prepare to mark sensitive kernel structures for randomization by making sure they're using designated initializers. These were identified during allyesconfig builds of x86, arm, and arm64, with most initializer fixes extracted from grsecurity. Link: http://lkml.kernel.org/r/20161217010253.GA140470@beast Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: David Howells <dhowells@redhat.com> Cc: Jie Chen <fykcee1@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2017-02-24 17:46:57 -08:00
Niklas Söderlund	4f5901f5a6	linux/kernel.h: fix DIV_ROUND_CLOSEST to support negative divisors While working on a thermal driver I encounter a scenario where the divisor could be negative, instead of adding local code to handle this I though I first try to add support for this in DIV_ROUND_CLOSEST. Add support to DIV_ROUND_CLOSEST for negative divisors if both dividend and divisor variable types are signed. This should not alter current behavior for users of the macro as previously negative divisors where not supported. Before: DIV_ROUND_CLOSEST( 59, 4) = 15 DIV_ROUND_CLOSEST( 59, -4) = -14 DIV_ROUND_CLOSEST( -59, 4) = -15 DIV_ROUND_CLOSEST( -59, -4) = 14 After: DIV_ROUND_CLOSEST( 59, 4) = 15 DIV_ROUND_CLOSEST( 59, -4) = -15 DIV_ROUND_CLOSEST( -59, 4) = -15 DIV_ROUND_CLOSEST( -59, -4) = 15 [akpm@linux-foundation.org: fix comment, per Guenter] Link: http://lkml.kernel.org/r/20161222102217.29011-1-niklas.soderlund+renesas@ragnatech.se Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2017-02-24 17:46:57 -08:00
Matthew Wilcox	e4afd2e556	lib/find_bit.c: micro-optimise find_next_*_bit This saves 32 bytes on my x86-64 build, mostly due to alignment considerations and sharing more code between find_next_bit and find_next_zero_bit, but it does save a couple of instructions. There's really two parts to this commit: - First, the first half of the test: (!nbits \|\| start >= nbits) is trivially a subset of the second half, since nbits and start are both unsigned - Second, while looking at the disassembly, I noticed that GCC was predicting the branch taken. Since this is a failure case, it's clearly the less likely of the two branches, so add an unlikely() to override GCC's heuristics. [mawilcox@microsoft.com: v2] Link: http://lkml.kernel.org/r/1483709016-1834-1-git-send-email-mawilcox@linuxonhyperv.com Link: http://lkml.kernel.org/r/1483709016-1834-1-git-send-email-mawilcox@linuxonhyperv.com Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com> Acked-by: Yury Norov <ynorov@caviumnetworks.com> Acked-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2017-02-24 17:46:57 -08:00
Geert Uytterhoeven	55ded9551f	lib: add module support to atomic64 tests Allow to compile the atomic64 test code either to a loadable module, or builtin into the kernel. Link: http://lkml.kernel.org/r/1483470276-10517-3-git-send-email-geert@linux-m68k.org Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2017-02-24 17:46:57 -08:00
Geert Uytterhoeven	ba95b045e9	lib: add module support to glob tests Extract the glob test code into its own source file, to allow to compile it either to a loadable module, or builtin into the kernel. Link: http://lkml.kernel.org/r/1483470276-10517-2-git-send-email-geert@linux-m68k.org Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2017-02-24 17:46:57 -08:00
Geert Uytterhoeven	5fb7f87408	lib: add module support to crc32 tests Extract the crc32 test code into its own source file, to allow to compile it either to a loadable module, or builtin into the kernel. Link: http://lkml.kernel.org/r/1483470276-10517-1-git-send-email-geert@linux-m68k.org Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2017-02-24 17:46:57 -08:00
Bhumika Goyal	738bc38d49	kernel/ksysfs.c: add __ro_after_init to bin_attribute structure The object notes_attr of type bin_attribute is not modified after getting initailized by ksysfs_init. Apart from initialization in ksysfs_init it is also passed as an argument to the function sysfs_create_bin_file but this argument is of type const. Therefore, add __ro_after_init to its declaration. Link: http://lkml.kernel.org/r/1486839969-16891-1-git-send-email-bhumirks@gmail.com Signed-off-by: Bhumika Goyal <bhumirks@gmail.com> Acked-by: Kees Cook <keescook@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2017-02-24 17:46:56 -08:00
Viresh Kumar	3e6daded1f	kernel/notifier.c: simplify expression NOTIFY_STOP_MASK (0x8000) has only one bit set and there is no need to compare output of "ret & NOTIFY_STOP_MASK" to NOTIFY_STOP_MASK. We just need to make sure the output is non-zero, that's it. Link: http://lkml.kernel.org/r/88ee58264a2bfab1c97ffc8ac753e25f55f57c10.1483593065.git.viresh.kumar@linaro.org Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2017-02-24 17:46:56 -08:00
Yisheng Xie	9c57b5808c	mm balloon: umount balloon_mnt when removing vb device With CONFIG_BALLOON_COMPACTION=y the kernel will mount balloon_mnt for balloon page migration when we probe a virtio_balloon device. However we do not unmount it when removing the device. Fix this. Fixes: `b1123ea6d3` ("mm: balloon: use general non-lru movable page feature") Link: http://lkml.kernel.org/r/1486531318-35189-1-git-send-email-xieyisheng1@huawei.com Signed-off-by: Yisheng Xie <xieyisheng1@huawei.com> Acked-by: Minchan Kim <minchan@kernel.org> Cc: Rafael Aquini <aquini@redhat.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Gioh Kim <gi-oh.kim@profitbricks.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Michal Hocko <mhocko@kernel.org> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Hanjun Guo <guohanjun@huawei.com> Cc: Xishi Qiu <qiuxishi@huawei.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2017-02-24 17:46:56 -08:00
Kees Cook	85caa95b9f	bug: switch data corruption check to __must_check The CHECK_DATA_CORRUPTION() macro was designed to have callers do something meaningful/protective on failure. However, using "return false" in the macro too strictly limits the design patterns of callers. Instead, let callers handle the logic test directly, but make sure that the result IS checked by forcing __must_check (which appears to not be able to be used directly on macro expressions). Link: http://lkml.kernel.org/r/20170206204547.GA125312@beast Signed-off-by: Kees Cook <keescook@chromium.org> Suggested-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2017-02-24 17:46:56 -08:00
Gideon Israel Dsouza	849de0cd2c	m68k: replace gcc specific macros with ones from compiler.h There is <linux/compiler.h> which provides macros for various gcc specific constructs. Eg: __weak for __attribute__((weak)). I've cleaned all instances of gcc specific attributes with the right macros for all files under /arch/m68k Link: http://lkml.kernel.org/r/1485540901-1988-3-git-send-email-gidisrael@gmail.com Signed-off-by: Gideon Israel Dsouza <gidisrael@gmail.com> Cc: Greg Ungerer <gerg@linux-m68k.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2017-02-24 17:46:56 -08:00
Gideon Israel Dsouza	a3f0825e7e	compiler-gcc.h: add a new macro to wrap gcc attribute Add __mode(x) into compiler-gcc.h as part of a cleanup task I've taken up, to replace gcc specific attributes with macros. The next patch is a cleanup of the m68k subsystem and it requires a new macro to wrap __attribute__ ((mode (...))) Link: http://lkml.kernel.org/r/1485540901-1988-2-git-send-email-gidisrael@gmail.com Signed-off-by: Gideon Israel Dsouza <gidisrael@gmail.com> Cc: Greg Ungerer <gerg@linux-m68k.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2017-02-24 17:46:56 -08:00
Masahiro Yamada	16f4e31951	include/linux/iopoll.h: include <linux/ktime.h> instead of <linux/hrtimer.h> The timer APIs this header needs are ktime_get(), ktime_add_us(), and ktime_compare(). So, including <linux/ktime.h> seems enough. This commit will cut unnecessary header file parsing. Link: http://lkml.kernel.org/r/1481679225-10885-1-git-send-email-yamada.masahiro@socionext.com Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2017-02-24 17:46:56 -08:00
Mike Frysinger	1ca5eebb89	uapi: mqueue.h: add missing linux/types.h include Commit `63159f5dcc` ("uapi: Use __kernel_long_t in struct mq_attr") changed the types from long to __kernel_long_t, but didn't add a linux/types.h include. Code that tries to include this header directly breaks: /usr/include/linux/mqueue.h:26:2: error: unknown type name '__kernel_long_t' __kernel_long_t mq_flags; /* message queue flags */ This also upsets configure tests for this header: checking linux/mqueue.h usability... no checking linux/mqueue.h presence... yes configure: WARNING: linux/mqueue.h: present but cannot be compiled configure: WARNING: linux/mqueue.h: check for missing prerequisite headers? configure: WARNING: linux/mqueue.h: see the Autoconf documentation configure: WARNING: linux/mqueue.h: section "Present But Cannot Be Compiled" configure: WARNING: linux/mqueue.h: proceeding with the compiler's result checking for linux/mqueue.h... no Link: http://lkml.kernel.org/r/20170119194644.4403-1-vapier@gentoo.org Signed-off-by: Mike Frysinger <vapier@gentoo.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2017-02-24 17:46:56 -08:00
Lafcadio Wluiki	796f571b0c	procfs: use an enum for possible hidepid values Previously, the hidepid parameter was checked by comparing literal integers 0, 1, 2. Let's add a proper enum for this, to make the checking more expressive: 0 → HIDEPID_OFF 1 → HIDEPID_NO_ACCESS 2 → HIDEPID_INVISIBLE This changes the internal labelling only, the userspace-facing interface remains unmodified, and still works with literal integers 0, 1, 2. No functional changes. Link: http://lkml.kernel.org/r/1484572984-13388-2-git-send-email-djalal@gmail.com Signed-off-by: Lafcadio Wluiki <wluikil@gmail.com> Signed-off-by: Djalal Harouni <tixxdz@gmail.com> Acked-by: Kees Cook <keescook@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2017-02-24 17:46:56 -08:00
Alexey Dobriyan	a0a07b87f3	proc: less code duplication in /proc/*/cmdline After staring at this code for a while I've figured using small 2-entry array describing ARGV and ENVP is the way to address code duplication critique. Link: http://lkml.kernel.org/r/20170105185724.GA12027@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2017-02-24 17:46:56 -08:00
Geliang Tang	4e4a7fb7b4	proc: use rb_entry() To make the code clearer, use rb_entry() instead of container_of() to deal with rbtree. Link: http://lkml.kernel.org/r/4fd1f82818665705ce75c5156a060ae7caa8e0a9.1482160150.git.geliangtang@gmail.com Signed-off-by: Geliang Tang <geliangtang@gmail.com> Cc: Jan Kara <jack@suse.cz> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: "David S. Miller" <davem@davemloft.net> Cc: Juergen Gross <jgross@suse.com> Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com> Cc: Seth Forshee <seth.forshee@canonical.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2017-02-24 17:46:56 -08:00
Davidlohr Bueso	35ca6953ca	alpha: use generic current.h Given that the arch does not add its own implementations, simply use the asm-generic/current.h (generic-y) header instead of duplicating code. Link: http://lkml.kernel.org/r/1485992878-4780-2-git-send-email-dave@stgolabs.net Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Matt Turner <mattst88@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2017-02-24 17:46:56 -08:00
Sudip Mukherjee	9ef5ea2013	arch/frv/mb93090-mb00/pci-frv.c: fix build warning The build of frv defconfig gives warning: arch/frv/mb93090-mb00/pci-frv.c:176:5: warning: ignoring return value of 'pci_assign_resource', declared with attribute warn_unused_result Just print an error message to silence the warning. We can not do much here on error. Link: http://lkml.kernel.org/r/1484256471-5379-1-git-send-email-sudipm.mukherjee@gmail.com Signed-off-by: Sudip Mukherjee <sudip.mukherjee@codethink.co.uk> Cc: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2017-02-24 17:46:56 -08:00
Greg Thelen	0386bf385d	kasan: add memcg kmem_cache test Make a kasan test which uses a SLAB_ACCOUNT slab cache. If the test is run within a non default memcg, then it uncovers the bug fixed by "kasan: drain quarantine of memcg slab objects"[1]. If run without fix [1] it shows "Slab cache still has objects", and the kmem_cache structure is leaked. Here's an unpatched kernel test: $ dmesg -c > /dev/null $ mkdir /sys/fs/cgroup/memory/test $ echo $$ > /sys/fs/cgroup/memory/test/tasks $ modprobe test_kasan 2> /dev/null $ dmesg \| grep -B1 still [ 123.456789] kasan test: memcg_accounted_kmem_cache allocate memcg accounted object [ 124.456789] kmem_cache_destroy test_cache: Slab cache still has objects Kernels with fix [1] don't have the "Slab cache still has objects" warning or the underlying leak. The new test runs and passes in the default (root) memcg, though in the root memcg it won't uncover the problem fixed by [1]. Link: http://lkml.kernel.org/r/1482257462-36948-2-git-send-email-gthelen@google.com Signed-off-by: Greg Thelen <gthelen@google.com> Reviewed-by: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Alexander Potapenko <glider@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2017-02-24 17:46:56 -08:00
Greg Thelen	f9fa1d919c	kasan: drain quarantine of memcg slab objects Per memcg slab accounting and kasan have a problem with kmem_cache destruction. - kmem_cache_create() allocates a kmem_cache, which is used for allocations from processes running in root (top) memcg. - Processes running in non root memcg and allocating with either __GFP_ACCOUNT or from a SLAB_ACCOUNT cache use a per memcg kmem_cache. - Kasan catches use-after-free by having kfree() and kmem_cache_free() defer freeing of objects. Objects are placed in a quarantine. - kmem_cache_destroy() destroys root and non root kmem_caches. It takes care to drain the quarantine of objects from the root memcg's kmem_cache, but ignores objects associated with non root memcg. This causes leaks because quarantined per memcg objects refer to per memcg kmem cache being destroyed. To see the problem: 1) create a slab cache with kmem_cache_create(,,,SLAB_ACCOUNT,) 2) from non root memcg, allocate and free a few objects from cache 3) dispose of the cache with kmem_cache_destroy() kmem_cache_destroy() will trigger a "Slab cache still has objects" warning indicating that the per memcg kmem_cache structure was leaked. Fix the leak by draining kasan quarantined objects allocated from non root memcg. Racing memcg deletion is tricky, but handled. kmem_cache_destroy() => shutdown_memcg_caches() => __shutdown_memcg_cache() => shutdown_cache() flushes per memcg quarantined objects, even if that memcg has been rmdir'd and gone through memcg_deactivate_kmem_caches(). This leak only affects destroyed SLAB_ACCOUNT kmem caches when kasan is enabled. So I don't think it's worth patching stable kernels. Link: http://lkml.kernel.org/r/1482257462-36948-1-git-send-email-gthelen@google.com Signed-off-by: Greg Thelen <gthelen@google.com> Reviewed-by: Vladimir Davydov <vdavydov.dev@gmail.com> Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Alexander Potapenko <glider@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2017-02-24 17:46:56 -08:00
Nathan Fontenot	dc18d706a4	memory-hotplug: use dev_online for memhp_auto_online Commit `31bc3858ea` ("add automatic onlining policy for the newly added memory") provides the capability to have added memory automatically onlined during add, but this appears to be slightly broken. The current implementation uses walk_memory_range() to call online_memory_block, which uses memory_block_change_state() to online the memory. Instead, we should be calling device_online() for the memory block in online_memory_block(). This would online the memory (the memory bus online routine memory_subsys_online() called from device_online calls memory_block_change_state()) and properly update the device struct offline flag. As a result of the current implementation, attempting to remove a memory block after adding it using auto online fails. This is because doing a remove, for instance echo offline > /sys/devices/system/memory/memoryXXX/state uses device_offline() which checks the dev->offline flag. Link: http://lkml.kernel.org/r/20170222220744.8119.19687.stgit@ltcalpine2-lp14.aus.stglabs.ibm.com Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michael Roth <mdroth@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2017-02-24 17:46:56 -08:00
Minchan Kim	dd8416c477	mm: do not access page->mapping directly on page_endio With rw_page, page_endio is used for completing IO on a page and it propagates write error to the address space if the IO fails. The problem is it accesses page->mapping directly which might be okay for file-backed pages but it shouldn't for anonymous page. Otherwise, it can corrupt one of field from anon_vma under us and system goes panic randomly. swap_writepage bdev_writepage ops->rw_page I encountered the BUG during developing new zram feature and it was really hard to figure it out because it made random crash, somtime mmap_sem lockdep, sometime other places where places never related to zram/zsmalloc, and not reproducible with some configuration. When I consider how that bug is subtle and people do fast-swap test with brd, it's worth to add stable mark, I think. Fixes: `dd6bd0d9c7` ("swap: use bdev_read_page() / bdev_write_page()") Signed-off-by: Minchan Kim <minchan@kernel.org> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2017-02-24 17:46:56 -08:00
Aneesh Kumar K.V	9a8b300f2f	mm/thp/autonuma: use TNF flag instead of vm fault We are using the wrong flag value in task_numa_falt function. This can result in us doing wrong numa fault statistics update, because we update num_pages_migrate and numa_fault_locality etc based on the flag argument passed. Fixes: `bae473a423` ("mm: introduce fault_env") Link: http://lkml.kernel.org/r/1487498395-9544-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Rik van Riel <riel@surriel.com> Cc: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2017-02-24 17:46:56 -08:00
Aneesh Kumar K.V	db08f2030a	mm/gup: check for protnone only if it is a PTE entry Do the prot_none/FOLL_NUMA check after we are sure this is a THP pte. Archs can implement prot_none such that it can return true for regular pmd entries. Link: http://lkml.kernel.org/r/1487498326-8734-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Cc: Rik van Riel <riel@surriel.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Hillf Danton <hillf.zj@alibaba-inc.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2017-02-24 17:46:56 -08:00
Miles Chen	199eaa05ad	mm: cleanups for printing phys_addr_t and dma_addr_t cleanup rest of dma_addr_t and phys_addr_t type casting in mm use %pad for dma_addr_t use %pa for phys_addr_t Link: http://lkml.kernel.org/r/1486618489-13912-1-git-send-email-miles.chen@mediatek.com Signed-off-by: Miles Chen <miles.chen@mediatek.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2017-02-24 17:46:56 -08:00
Yisheng Xie	b538e422e4	mm/zsmalloc: fix comment in zsmalloc The class index and fullness group are not encoded in (first)page->mapping any more, after commit `3783689a1a` ("zsmalloc: introduce zspage structure"). Instead, they are store in struct zspage. Just delete this unneeded comment. Link: http://lkml.kernel.org/r/1486620822-36826-1-git-send-email-xieyisheng1@huawei.com Signed-off-by: Yisheng Xie <xieyisheng1@huawei.com> Suggested-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Acked-by: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Hanjun Guo <guohanjun@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2017-02-24 17:46:56 -08:00
Wei Yang	ad69444e75	mm/page_alloc.c: remove redundant init code for ZONE_MOVABLE arch_zone_lowest/highest_possible_pfn[] is set to 0 and [ZONE_MOVABLE] is skipped in the loop. No need to reset them to 0 again. This patch just removes the redundant code. Link: http://lkml.kernel.org/r/20170209141731.60208-1-richard.weiyang@gmail.com Signed-off-by: Wei Yang <richard.weiyang@gmail.com> Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2017-02-24 17:46:56 -08:00
Yisheng Xie	22c5cef162	mm/zsmalloc: remove redundant SetPagePrivate2 in create_page_chain We had used page->lru to link the component pages (except the first page) of a zspage, and used INIT_LIST_HEAD(&page->lru) to init it. Therefore, to get the last page's next page, which is NULL, we had to use page flag PG_Private_2 to identify it. But now, we use page->freelist to link all of the pages in zspage and init the page->freelist as NULL for last page, so no need to use PG_Private_2 anymore. This remove redundant SetPagePrivate2 in create_page_chain and ClearPagePrivate2 in reset_page(). Save a few cycles for migration of zsmalloc page :) Link: http://lkml.kernel.org/r/1487076509-49270-1-git-send-email-xieyisheng1@huawei.com Signed-off-by: Yisheng Xie <xieyisheng1@huawei.com> Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Acked-by: Minchan Kim <minchan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2017-02-24 17:46:56 -08:00
Vinayak Menon	e1587a4945	mm: vmpressure: fix sending wrong events on underflow At the end of a window period, if the reclaimed pages is greater than scanned, an unsigned underflow can result in a huge pressure value and thus a critical event. Reclaimed pages is found to go higher than scanned because of the addition of reclaimed slab pages to reclaimed in shrink_node without a corresponding increment to scanned pages. Minchan Kim mentioned that this can also happen in the case of a THP page where the scanned is 1 and reclaimed could be 512. Link: http://lkml.kernel.org/r/1486641577-11685-1-git-send-email-vinmenon@codeaurora.org Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org> Acked-by: Minchan Kim <minchan@kernel.org> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Rik van Riel <riel@redhat.com> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Anton Vorontsov <anton.vorontsov@linaro.org> Cc: Shiraz Hashim <shashim@codeaurora.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2017-02-24 17:46:56 -08:00
Hugh Dickins	3a4f8a0b3f	mm: remove shmem_mapping() shmem_zero_setup() duplicates Remove the prototypes for shmem_mapping() and shmem_zero_setup() from linux/mm.h, since they are already provided in linux/shmem_fs.h. But shmem_fs.h must then provide the inline stub for shmem_mapping() when CONFIG_SHMEM is not set, and a few more cfiles now need to #include it. Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1702081658250.1549@eggly.anvils Signed-off-by: Hugh Dickins <hughd@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Simek <monstr@monstr.eu> Cc: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2017-02-24 17:46:56 -08:00
Gavin Shan	e02dc017c3	mm/page_alloc: fix nodes for reclaim in fast path When @node_reclaim_node isn't 0, the page allocator tries to reclaim pages if the amount of free memory in the zones are below the low watermark. On Power platform, none of NUMA nodes are scanned for page reclaim because no nodes match the condition in zone_allows_reclaim(). On Power platform, RECLAIM_DISTANCE is set to 10 which is the distance of Node-A to Node-A. So the preferred node even won't be scanned for page reclaim. __alloc_pages_nodemask() get_page_from_freelist() zone_allows_reclaim() Anton proposed the test code as below: # cat alloc.c : int main(int argc, char argv[]) { void p; unsigned long size; unsigned long start, end; start = time(NULL); size = strtoul(argv[1], NULL, 0); printf("To allocate %ldGB memory\n", size); size <<= 30; p = malloc(size); assert(p); memset(p, 0, size); end = time(NULL); printf("Used time: %ld seconds\n", end - start); sleep(3600); return 0; } The system I use for testing has two NUMA nodes. Both have 128GB memory. In below scnario, the page caches on node#0 should be reclaimed when it encounters pressure to accommodate request of allocation. # echo 2 > /proc/sys/vm/zone_reclaim_mode; \ sync; \ echo 3 > /proc/sys/vm/drop_caches; \ # taskset -c 0 cat file.32G > /dev/null; \ grep FilePages /sys/devices/system/node/node0/meminfo Node 0 FilePages: 33619712 kB # taskset -c 0 ./alloc 128 # grep FilePages /sys/devices/system/node/node0/meminfo Node 0 FilePages: 33619840 kB # grep MemFree /sys/devices/system/node/node0/meminfo Node 0 MemFree: 186816 kB With the patch applied, the pagecache on node-0 is reclaimed when its free memory is running out. It's the expected behaviour. # echo 2 > /proc/sys/vm/zone_reclaim_mode; \ sync; \ echo 3 > /proc/sys/vm/drop_caches # taskset -c 0 cat file.32G > /dev/null; \ grep FilePages /sys/devices/system/node/node0/meminfo Node 0 FilePages: 33605568 kB # taskset -c 0 ./alloc 128 # grep FilePages /sys/devices/system/node/node0/meminfo Node 0 FilePages: 1379520 kB # grep MemFree /sys/devices/system/node/node0/meminfo Node 0 MemFree: 317120 kB Fixes: `5f7a75acdb` ("mm: page_alloc: do not cache reclaim distances") Link: http://lkml.kernel.org/r/1486532455-29613-1-git-send-email-gwshan@linux.vnet.ibm.com Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Acked-by: Mel Gorman <mgorman@suse.de> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Anton Blanchard <anton@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: <stable@vger.kernel.org> [3.16+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2017-02-24 17:46:56 -08:00
zhong jiang	d6d8c8a482	mm/memory_hotplug.c: fix overflow in test_pages_in_a_zone() When mainline introduced commit `a96dfddbcc` ("base/memory, hotplug: fix a kernel oops in show_valid_zones()"), it obtained the valid start and end pfn from the given pfn range. The valid start pfn can fix the actual issue, but it introduced another issue. The valid end pfn will may exceed the given end_pfn. Although the incorrect overflow will not result in actual problem at present, but I think it need to be fixed. [toshi.kani@hpe.com: remove assumption that end_pfn is aligned by MAX_ORDER_NR_PAGES] Fixes: `a96dfddbcc` ("base/memory, hotplug: fix a kernel oops in show_valid_zones()") Link: http://lkml.kernel.org/r/1486467299-22648-1-git-send-email-zhongjiang@huawei.com Signed-off-by: zhong jiang <zhongjiang@huawei.com> Signed-off-by: Toshi Kani <toshi.kani@hpe.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2017-02-24 17:46:56 -08:00
zhouxianrong	8e19d540d1	zram: extend zero pages to same element pages The idea is that without doing more calculations we extend zero pages to same element pages for zram. zero page is special case of same element page with zero element. 1. the test is done under android 7.0 2. startup too many applications circularly 3. sample the zero pages, same pages (none-zero element) and total pages in function page_zero_filled the result is listed as below: ZERO SAME TOTAL 36214 17842 598196 ZERO/TOTAL SAME/TOTAL (ZERO+SAME)/TOTAL ZERO/SAME AVERAGE 0.060631909 0.024990816 0.085622726 2.663825038 STDEV 0.00674612 0.005887625 0.009707034 2.115881328 MAX 0.069698422 0.030046087 0.094975336 7.56043956 MIN 0.03959586 0.007332205 0.056055193 1.928985507 from the above data, the benefit is about 2.5% and up to 3% of total swapout pages. The defect of the patch is that when we recovery a page from non-zero element the operations are low efficient for partial read. This patch extends zero_page to same_page so if there is any user to have monitored zero_pages, he will be surprised if the number is increased but it's not harmful, I believe. [minchan@kernel.org: do not free same element pages in zram_meta_free] Link: http://lkml.kernel.org/r/20170207065741.GA2567@bbox Link: http://lkml.kernel.org/r/1483692145-75357-1-git-send-email-zhouxianrong@huawei.com Link: http://lkml.kernel.org/r/1486307804-27903-1-git-send-email-minchan@kernel.org Signed-off-by: zhouxianrong <zhouxianrong@huawei.com> Signed-off-by: Minchan Kim <minchan@kernel.org> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2017-02-24 17:46:56 -08:00
Steven Rostedt (VMware)	517663edd6	mm/page-writeback.c: place "not" inside of unlikely() statement in wb_domain_writeout_inc() The likely/unlikely profiler noticed that the unlikely statement in wb_domain_writeout_inc() is constantly wrong. This is due to the "not" (!) being outside the unlikely statement. It is likely that dom->period_time will be set, but unlikely that it wont be. Move the not into the unlikely statement. Link: http://lkml.kernel.org/r/20170206120035.3c2e2b91@gandalf.local.home Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2017-02-24 17:46:56 -08:00
Aneesh Kumar K.V	c137a2757b	powerpc/mm/autonuma: switch ppc64 to its own implementation of saved write With this our protnone becomes a present pte with READ/WRITE/EXEC bit cleared. By default we also set _PAGE_PRIVILEGED on such pte. This is now used to help us identify a protnone pte that as saved write bit. For such pte, we will clear the _PAGE_PRIVILEGED bit. The pte still remain non-accessible from both user and kernel. [aneesh.kumar@linux.vnet.ibm.com: v3] Link: http://lkml.kernel.org/r/1487498625-10891-4-git-send-email-aneesh.kumar@linux.vnet.ibm.com Link: http://lkml.kernel.org/r/1487050314-3892-3-git-send-email-aneesh.kumar@linux.vnet.ibm.com Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Acked-by: Michael Neuling <mikey@neuling.org> Cc: Rik van Riel <riel@surriel.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Michael Ellerman <michaele@au1.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2017-02-24 17:46:56 -08:00
Aneesh Kumar K.V	595cd8f256	mm/ksm: handle protnone saved writes when making page write protect Without this KSM will consider the page write protected, but a numa fault can later mark the page writable. This can result in memory corruption. Link: http://lkml.kernel.org/r/1487498625-10891-3-git-send-email-aneesh.kumar@linux.vnet.ibm.com Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2017-02-24 17:46:56 -08:00
Aneesh Kumar K.V	288bc54949	mm/autonuma: let architecture override how the write bit should be stashed in a protnone pte. Patch series "Numabalancing preserve write fix", v2. This patch series address an issue w.r.t THP migration and autonuma preserve write feature. migrate_misplaced_transhuge_page() cannot deal with concurrent modification of the page. It does a page copy without following the migration pte sequence. IIUC, this was done to keep the migration simpler and at the time of implemenation we didn't had THP page cache which would have required a more elaborate migration scheme. That means thp autonuma migration expect the protnone with saved write to be done such that both kernel and user cannot update the page content. This patch series enables archs like ppc64 to do that. We are good with the hash translation mode with the current code, because we never create a hardware page table entry for a protnone pte. This patch (of 2): Autonuma preserves the write permission across numa fault to avoid taking a writefault after a numa fault (Commit: `b191f9b106` " mm: numa: preserve PTE write permissions across a NUMA hinting fault"). Architecture can implement protnone in different ways and some may choose to implement that by clearing Read/ Write/Exec bit of pte. Setting the write bit on such pte can result in wrong behaviour. Fix this up by allowing arch to override how to save the write bit on a protnone pte. [aneesh.kumar@linux.vnet.ibm.com: don't mark pte saved write in case of dirty_accountable] Link: http://lkml.kernel.org/r/1487942884-16517-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com [aneesh.kumar@linux.vnet.ibm.com: v3] Link: http://lkml.kernel.org/r/1487498625-10891-2-git-send-email-aneesh.kumar@linux.vnet.ibm.com Link: http://lkml.kernel.org/r/1487050314-3892-2-git-send-email-aneesh.kumar@linux.vnet.ibm.com Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Acked-by: Michael Neuling <mikey@neuling.org> Cc: Rik van Riel <riel@surriel.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Michael Ellerman <michaele@au1.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2017-02-24 17:46:56 -08:00
Aneesh Kumar K.V	cee216a696	mm/autonuma: don't use set_pte_at when updating protnone ptes Architectures like ppc64, use privilege access bit to mark pte non accessible. This implies that kernel can do a copy_to_user to an address marked for numa fault. This also implies that there can be a parallel hardware update for the pte. set_pte_at cannot be used in such scenarios. Hence switch the pte update to use ptep_get_and_clear and set_pte_at combination. [akpm@linux-foundation.org: remove unwanted ppc change, per Aneesh] Link: http://lkml.kernel.org/r/1486400776-28114-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Acked-by: Rik van Riel <riel@redhat.com> Acked-by: Mel Gorman <mgorman@techsingularity.net> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2017-02-24 17:46:56 -08:00
Steven Rostedt (VMware)	3f472cc978	mm/shmem.c: fix unlikely() test of info->seals to test only for WRITE and GROW Running my likely/unlikely profiler, I discovered that the test in shmem_write_begin() that tests for info->seals as unlikely, is always incorrect. This is because shmem_get_inode() sets info->seals to have F_SEAL_SEAL set by default, and it is unlikely to be cleared when shmem_write_begin() is called. Thus, the if statement is very likely. But as the if statement block only cares about F_SEAL_WRITE and F_SEAL_GROW, change the test to only test those two bits. Link: http://lkml.kernel.org/r/20170203105656.7aec6237@gandalf.local.home Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Acked-by: Hugh Dickins <hughd@google.com> Cc: David Herrmann <dh.herrmann@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2017-02-24 17:46:56 -08:00
Mel Gorman	c2f83143f1	mm, vmscan: clear PGDAT_WRITEBACK when zone is balanced Hillf Danton pointed out that since commit `1d82de618d` ("mm, vmscan: make kswapd reclaim in terms of nodes") that PGDAT_WRITEBACK is no longer cleared. It was not noticed as triggering it requires pages under writeback to cycle twice through the LRU and before kswapd gets stalled. Historically, such issues tended to occur on small machines writing heavily to slow storage such as a USB stick. Once kswapd stalls, direct reclaim stalls may be higher but due to the fact that memory pressure is required, it would not be very noticable. Michal Hocko suggested removing the flag entirely but the conservative fix is to restore the intended PGDAT_WRITEBACK behaviour and clear the flag when a suitable zone is balanced. Fixes: `1d82de618d` ("mm, vmscan: make kswapd reclaim in terms of nodes") Link: http://lkml.kernel.org/r/20170203203222.gq7hk66yc36lpgtb@suse.de Signed-off-by: Mel Gorman <mgorman@suse.de> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com> Cc: Minchan Kim <minchan.kim@gmail.com> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2017-02-24 17:46:55 -08:00

1 2 3 4 5 ...

660128 Commits All Branches Search

660128 Commits

All Branches