mirror of https://gitee.com/openkylin/linux.git
264 Commits
Author | SHA1 | Message | Date |
---|---|---|---|
Linus Torvalds | b212921b13 |
elf: don't use MAP_FIXED_NOREPLACE for elf executable mappings
In commit |
|
Kees Cook | 7be3cb019d |
binfmt_elf: Do not move brk for INTERP-less ET_EXEC
When brk was moved for binaries without an interpreter, it should have
been limited to ET_DYN only. In other words, the special case was an
ET_DYN that lacks an INTERP, not just an executable that lacks INTERP.
The bug manifested for giant static executables, where the brk would end
up in the middle of the text area on 32-bit architectures.
Reported-and-tested-by: Richard Kojedzinszky <richard@kojedz.in>
Fixes:
|
|
Alexandre Ghiti | 649775be63 |
mm, fs: move randomize_stack_top from fs to mm
Patch series "Provide generic top-down mmap layout functions", v6. This series introduces generic functions to make top-down mmap layout easily accessible to architectures, in particular riscv which was the initial goal of this series. The generic implementation was taken from arm64 and used successively by arm, mips and finally riscv. Note that in addition the series fixes 2 issues: - stack randomization was taken into account even if not necessary. - [1] fixed an issue with mmap base which did not take into account randomization but did not report it to arm and mips, so by moving arm64 into a generic library, this problem is now fixed for both architectures. This work is an effort to factorize architecture functions to avoid code duplication and oversights as in [1]. [1]: https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1429066.html This patch (of 14): This preparatory commit moves this function so that further introduction of generic topdown mmap layout is contained only in mm/util.c. Link: http://lkml.kernel.org/r/20190730055113.23635-2-alex@ghiti.fr Signed-off-by: Alexandre Ghiti <alex@ghiti.fr> Acked-by: Kees Cook <keescook@chromium.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Paul Burton <paul.burton@mips.com> Cc: James Hogan <jhogan@kernel.org> Cc: Palmer Dabbelt <palmer@sifive.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
Alexey Dobriyan | aa94b1dc5b |
fs/binfmt_elf.c: delete stale comment
"passed_fileno" variable was deleted 11 years ago in 2.6.25.
Link: http://lkml.kernel.org/r/20190529201747.GA23248@avx2
Fixes:
|
|
Thomas Gleixner | 09c434b8a0 |
treewide: Add SPDX license identifier for more missed files
Add SPDX license identifiers to all files which: - Have no license information of any form - Have MODULE_LICENCE("GPL*") inside which was used in the initial scan/conversion to ignore the file These files fall under the project license, GPL v2 only. The resulting SPDX license identifier is: GPL-2.0-only Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
|
Kees Cook | bbdc6076d2 |
binfmt_elf: move brk out of mmap when doing direct loader exec
Commmit |
|
Alexey Dobriyan | 249b08e4e5 |
elf: init pt_regs pointer later
Get "current_pt_regs" pointer right before usage. Space savings on x86_64: add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-180 (-180) Function old new delta load_elf_binary 5806 5626 -180 !!! Looks like the compiler doesn't know that "current_pt_regs" is stable pointer (because it doesn't know ->stack isn't) even though it knows that "current" is stable pointer. So it saves it in the very beginning and then tries to carry it through a lot of code. Here is what happens here: load_elf_binary() ... mov rax,QWORD PTR gs:0x14c00 mov r13,QWORD PTR [rax+0x18] r13 = current->stack call kmem_cache_alloc # first kmalloc [980 bytes later!] # let's spill that sucker because we need a register # for "load_bias" calculations at # # if (interpreter) { # load_bias = ELF_ET_DYN_BASE; # if (current->flags & PF_RANDOMIZE) # load_bias += arch_mmap_rnd(); # elf_flags |= elf_fixed; # } mov QWORD PTR [rsp+0x68],r13 If this is not _the_ root cause it is still eeeeh. After the patch things become much simpler: mov rax, QWORD PTR gs:0x14c00 # current mov rdx, QWORD PTR [rax+0x18] # current->stack movq [rdx+0x3fb8], 0 # fill pt_regs ... call finalize_exec Link: http://lkml.kernel.org/r/20190419200343.GA19788@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Tested-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
Alexey Dobriyan | d8e7cb39ac |
fs/binfmt_elf.c: extract PROT_* calculations
There are two places where mapping protections are calculated: one for executable, another one for interpreter -- take them out. ELF read and execute permissions are interchanged with Linux PROT_READ and PROT_EXEC, microoptimizations are welcome! Link: http://lkml.kernel.org/r/20190417213413.GB26474@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
Alexey Dobriyan | 852643165a |
fs//binfmt_elf.c: move variables initialization closer to their usage
Link: http://lkml.kernel.org/r/20190416202002.GB24304@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
Alexey Dobriyan | be0deb585e |
fs/binfmt_elf.c: save 1 indent level
Rewrite for (...) { if (->p_type == PT_INTERP) { ... break; } } loop into for (...) { if (->p_type != PT_INTERP) continue; ... break; } Link: http://lkml.kernel.org/r/20190416201906.GA24304@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
Alexey Dobriyan | ba0f6b88a8 |
fs/binfmt_elf.c: delete trailing "return;" in functions returning "void"
Link: http://lkml.kernel.org/r/20190314205042.GE18143@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
Alexey Dobriyan | cc338010a2 |
fs/binfmt_elf.c: free PT_INTERP filename ASAP
There is no reason for PT_INTERP filename to linger till the end of the whole loading process. Link: http://lkml.kernel.org/r/20190314204953.GD18143@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Nikitas Angelinas <nikitas.angelinas@gmail.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Mukesh Ojha <mojha@codeaurora.org> [nikitas.angelinas@gmail.com: fix GPF when dereferencing invalid interpreter] Link: http://lkml.kernel.org/r/20190330140032.GA1527@vostro Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
Alexey Dobriyan | 5cf4a36382 |
fs/binfmt_elf.c: make scope of "pos" variable smaller
Link: http://lkml.kernel.org/r/20190314204707.GC18143@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
Andrew Morton | 22f084dbc1 |
fs/binfmt_elf.c: remove unneeded initialization of mm->start_stack
As pointed out by zoujc@lenovo.com, setup_arg_pages() already initialized current->mm->start_stack. Link: https://bugzilla.kernel.org/show_bug.cgi?id=202881 Reported-by: <zoujc@lenovo.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
Alexey Dobriyan | 49ac981965 |
fs/binfmt_elf.c: spread const a little
Link: http://lkml.kernel.org/r/20190204202830.GC27482@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
Alexey Dobriyan | 93f044e282 |
fs/binfmt_elf.c: use list_for_each_entry()
[adobriyan@gmail.com: fixup compilation] Link: http://lkml.kernel.org/r/20190205064334.GA2152@avx2 Link: http://lkml.kernel.org/r/20190204202800.GB27482@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
Alexey Dobriyan | faf1c31520 |
fs/binfmt_elf.c: don't be afraid of overflow
Number of ELF program headers is 16-bit by spec, so total size comfortably fits into "unsigned int". Space savings: 7 bytes! add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-7 (-7) Function old new delta load_elf_phdrs 137 130 -7 Link: http://lkml.kernel.org/r/20190204202715.GA27482@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
Eric W. Biederman | ae7795bc61 |
signal: Distinguish between kernel_siginfo and siginfo
Linus recently observed that if we did not worry about the padding member in struct siginfo it is only about 48 bytes, and 48 bytes is much nicer than 128 bytes for allocating on the stack and copying around in the kernel. The obvious thing of only adding the padding when userspace is including siginfo.h won't work as there are sigframe definitions in the kernel that embed struct siginfo. So split siginfo in two; kernel_siginfo and siginfo. Keeping the traditional name for the userspace definition. While the version that is used internally to the kernel and ultimately will not be padded to 128 bytes is called kernel_siginfo. The definition of struct kernel_siginfo I have put in include/signal_types.h A set of buildtime checks has been added to verify the two structures have the same field offsets. To make it easy to verify the change kernel_siginfo retains the same size as siginfo. The reduction in size comes in a following change. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> |
|
Linus Torvalds | e5a32b5b21 |
Here are the main MIPS changes for 4.19.
An overview of the general architecture changes: - Massive DMA ops refactoring from Christoph Hellwig (huzzah for deleting crufty code!). - We introduce NT_MIPS_DSP & NT_MIPS_FP_MODE ELF notes & corresponding regsets to expose DSP ASE & floating point mode state respectively, both for live debugging & core dumps. - We better optimize our code by hard-coding cpu_has_* macros at compile time where their values are known due to the ISA revision that the kernel build is targeting. - The EJTAG exception handler now better handles SMP systems, where it was previously possible for CPUs to clobber a register value saved by another CPU. - Our implementation of memset() gained a couple of fixes for MIPSr6 systems to return correct values in some cases where stores fault. - We now implement ioremap_wc() using the uncached-accelerated cache coherency attribute where supported, which is detected during boot, and fall back to plain uncached access where necessary. The MIPS-specific (and unused in tree) ioremap_uncached_accelerated() & ioremap_cacheable_cow() are removed. - The prctl(PR_SET_FP_MODE, ...) syscall is better supported for SMP systems by reworking the way we ensure remote CPUs that may be running threads within the affected process switch mode. - Systems using the MIPS Coherence Manager will now set the MIPS_IC_SNOOPS_REMOTE flag to avoid some unnecessary cache maintenance overhead when flushing the icache. - A few fixes were made for building with clang/LLVM, which now sucessfully builds kernels for many of our platforms. - Miscellaneous cleanups all over. And some platform-specific changes: - ar7 gained stubs for a few clock API functions to fix build failures for some drivers. - ath79 gained support for a few new SoCs, a few fixes & better gpio-keys support. - Ci20 now exposes its SPI bus using the spi-gpio driver. - The generic platform can now auto-detect a suitable value for PHYS_OFFSET based upon the memory map described by the device tree, allowing us to avoid wasting memory on page book-keeping for systems where RAM starts at a non-zero physical address. - Ingenic systems using the jz4740 platform code now link their vmlinuz higher to allow for kernels of a realistic size. - Loongson32 now builds the kernel targeting MIPSr1 rather than MIPSr2 to avoid CPU errata. - Loongson64 gains a couple of fixes, a workaround for a write buffering issue & support for the Loongson 3A R3.1 CPU. - Malta now uses the piix4-poweroff driver to handle powering down. - Microsemi Ocelot gained support for its SPI bus & NOR flash, its second MDIO bus and can now be supported by a FIT/.itb image. - Octeon saw a bunch of header cleanups which remove a lot of duplicate or unused code. -----BEGIN PGP SIGNATURE----- iIsEABYIADMWIQRgLjeFAZEXQzy86/s+p5+stXUA3QUCW3G6JxUccGF1bC5idXJ0 b25AbWlwcy5jb20ACgkQPqefrLV1AN0n/gD/Rpdgay31G/4eTTKBmBrcaju6Shjt /2Iu6WC5Sj4hDHUBAJSbuI+B9YjcNsjekBYxB/LLD7ImcLBl6nLMIvKmXLAL =cUiF -----END PGP SIGNATURE----- Merge tag 'mips_4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux Pull MIPS updates from Paul Burton: "Here are the main MIPS changes for 4.19. An overview of the general architecture changes: - Massive DMA ops refactoring from Christoph Hellwig (huzzah for deleting crufty code!). - We introduce NT_MIPS_DSP & NT_MIPS_FP_MODE ELF notes & corresponding regsets to expose DSP ASE & floating point mode state respectively, both for live debugging & core dumps. - We better optimize our code by hard-coding cpu_has_* macros at compile time where their values are known due to the ISA revision that the kernel build is targeting. - The EJTAG exception handler now better handles SMP systems, where it was previously possible for CPUs to clobber a register value saved by another CPU. - Our implementation of memset() gained a couple of fixes for MIPSr6 systems to return correct values in some cases where stores fault. - We now implement ioremap_wc() using the uncached-accelerated cache coherency attribute where supported, which is detected during boot, and fall back to plain uncached access where necessary. The MIPS-specific (and unused in tree) ioremap_uncached_accelerated() & ioremap_cacheable_cow() are removed. - The prctl(PR_SET_FP_MODE, ...) syscall is better supported for SMP systems by reworking the way we ensure remote CPUs that may be running threads within the affected process switch mode. - Systems using the MIPS Coherence Manager will now set the MIPS_IC_SNOOPS_REMOTE flag to avoid some unnecessary cache maintenance overhead when flushing the icache. - A few fixes were made for building with clang/LLVM, which now sucessfully builds kernels for many of our platforms. - Miscellaneous cleanups all over. And some platform-specific changes: - ar7 gained stubs for a few clock API functions to fix build failures for some drivers. - ath79 gained support for a few new SoCs, a few fixes & better gpio-keys support. - Ci20 now exposes its SPI bus using the spi-gpio driver. - The generic platform can now auto-detect a suitable value for PHYS_OFFSET based upon the memory map described by the device tree, allowing us to avoid wasting memory on page book-keeping for systems where RAM starts at a non-zero physical address. - Ingenic systems using the jz4740 platform code now link their vmlinuz higher to allow for kernels of a realistic size. - Loongson32 now builds the kernel targeting MIPSr1 rather than MIPSr2 to avoid CPU errata. - Loongson64 gains a couple of fixes, a workaround for a write buffering issue & support for the Loongson 3A R3.1 CPU. - Malta now uses the piix4-poweroff driver to handle powering down. - Microsemi Ocelot gained support for its SPI bus & NOR flash, its second MDIO bus and can now be supported by a FIT/.itb image. - Octeon saw a bunch of header cleanups which remove a lot of duplicate or unused code" * tag 'mips_4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux: (123 commits) MIPS: Remove remnants of UASM_ISA MIPS: netlogic: xlr: Remove erroneous check in nlm_fmn_send() MIPS: VDSO: Force link endianness MIPS: Always specify -EB or -EL when using clang MIPS: Use dins to simplify __write_64bit_c0_split() MIPS: Use read-write output operand in __write_64bit_c0_split() MIPS: Avoid using array as parameter to write_c0_kpgd() MIPS: vdso: Allow clang's --target flag in VDSO cflags MIPS: genvdso: Remove GOT checks MIPS: Remove obsolete MIPS checks for DST node "chosen@0" MIPS: generic: Remove input symbols from defconfig MIPS: Delete unused code in linux32.c MIPS: Remove unused sys_32_mmap2 MIPS: Remove nabi_no_regargs mips: dts: mscc: enable spi and NOR flash support on ocelot PCB123 mips: dts: mscc: Add spi on Ocelot MIPS: Loongson: Merge load addresses MIPS: Loongson: Set Loongson32 to MIPS32R1 MIPS: mscc: ocelot: add interrupt controller properties to GPIO controller MIPS: generic: Select MIPS_AUTO_PFN_OFFSET ... |
|
Maciej W. Rozycki |
2f819db565
|
binfmt_elf: Respect error return from `regset->active'
The regset API documented in <linux/regset.h> defines -ENODEV as the
result of the `->active' handler to be used where the feature requested
is not available on the hardware found. However code handling core file
note generation in `fill_thread_core_info' interpretes any non-zero
result from the `->active' handler as the regset requested being active.
Consequently processing continues (and hopefully gracefully fails later
on) rather than being abandoned right away for the regset requested.
Fix the problem then by making the code proceed only if a positive
result is returned from the `->active' handler.
Signed-off-by: Maciej W. Rozycki <macro@mips.com>
Signed-off-by: Paul Burton <paul.burton@mips.com>
Fixes:
|
|
Oscar Salvador | 24962af7e1 |
fs, elf: make sure to page align bss in load_elf_library
The current code does not make sure to page align bss before calling vm_brk(), and this can lead to a VM_BUG_ON() in __mm_populate() due to the requested lenght not being correctly aligned. Let us make sure to align it properly. Kees: only applicable to CONFIG_USELIB kernels: 32-bit and configured for libc5. Link: http://lkml.kernel.org/r/20180705145539.9627-1-osalvador@techadventures.net Signed-off-by: Oscar Salvador <osalvador@suse.de> Reported-by: syzbot+5dcb560fe12aa5091c06@syzkaller.appspotmail.com Tested-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Acked-by: Kees Cook <keescook@chromium.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Nicolas Pitre <nicolas.pitre@linaro.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
Alexey Dobriyan | 86a2bb5ad8 |
coredump: fix spam with zero VMA process
Nobody ever tried to self destruct by unmapping whole address space at once: munmap((void *)0, (1ULL << 47) - 4096); Doing this produces 2 warnings for zero-length vmalloc allocations: a.out[1353]: segfault at 7f80bcc4b757 ip 00007f80bcc4b757 sp 00007fff683939b8 error 14 a.out: vmalloc: allocation failure: 0 bytes, mode:0xcc0(GFP_KERNEL), nodemask=(null) ... a.out: vmalloc: allocation failure: 0 bytes, mode:0xcc0(GFP_KERNEL), nodemask=(null) ... Fix is to switch to kvmalloc(). Steps to reproduce: // vsyscall=none #include <sys/mman.h> #include <sys/resource.h> int main(void) { setrlimit(RLIMIT_CORE, &(struct rlimit){RLIM_INFINITY, RLIM_INFINITY}); munmap((void *)0, (1ULL << 47) - 4096); return 0; } Link: http://lkml.kernel.org/r/20180410180353.GA2515@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
Kees Cook | 42bc47b353 |
treewide: Use array_size() in vmalloc()
The vmalloc() function has no 2-factor argument form, so multiplication factors need to be wrapped in array_size(). This patch replaces cases of: vmalloc(a * b) with: vmalloc(array_size(a, b)) as well as handling cases of: vmalloc(a * b * c) with: vmalloc(array3_size(a, b, c)) This does, however, attempt to ignore constant size factors like: vmalloc(4 * 1024) though any constants defined via macros get caught up in the conversion. Any factors with a sizeof() of "unsigned char", "char", and "u8" were dropped, since they're redundant. The Coccinelle script used for this was: // Fix redundant parens around sizeof(). @@ type TYPE; expression THING, E; @@ ( vmalloc( - (sizeof(TYPE)) * E + sizeof(TYPE) * E , ...) | vmalloc( - (sizeof(THING)) * E + sizeof(THING) * E , ...) ) // Drop single-byte sizes and redundant parens. @@ expression COUNT; typedef u8; typedef __u8; @@ ( vmalloc( - sizeof(u8) * (COUNT) + COUNT , ...) | vmalloc( - sizeof(__u8) * (COUNT) + COUNT , ...) | vmalloc( - sizeof(char) * (COUNT) + COUNT , ...) | vmalloc( - sizeof(unsigned char) * (COUNT) + COUNT , ...) | vmalloc( - sizeof(u8) * COUNT + COUNT , ...) | vmalloc( - sizeof(__u8) * COUNT + COUNT , ...) | vmalloc( - sizeof(char) * COUNT + COUNT , ...) | vmalloc( - sizeof(unsigned char) * COUNT + COUNT , ...) ) // 2-factor product with sizeof(type/expression) and identifier or constant. @@ type TYPE; expression THING; identifier COUNT_ID; constant COUNT_CONST; @@ ( vmalloc( - sizeof(TYPE) * (COUNT_ID) + array_size(COUNT_ID, sizeof(TYPE)) , ...) | vmalloc( - sizeof(TYPE) * COUNT_ID + array_size(COUNT_ID, sizeof(TYPE)) , ...) | vmalloc( - sizeof(TYPE) * (COUNT_CONST) + array_size(COUNT_CONST, sizeof(TYPE)) , ...) | vmalloc( - sizeof(TYPE) * COUNT_CONST + array_size(COUNT_CONST, sizeof(TYPE)) , ...) | vmalloc( - sizeof(THING) * (COUNT_ID) + array_size(COUNT_ID, sizeof(THING)) , ...) | vmalloc( - sizeof(THING) * COUNT_ID + array_size(COUNT_ID, sizeof(THING)) , ...) | vmalloc( - sizeof(THING) * (COUNT_CONST) + array_size(COUNT_CONST, sizeof(THING)) , ...) | vmalloc( - sizeof(THING) * COUNT_CONST + array_size(COUNT_CONST, sizeof(THING)) , ...) ) // 2-factor product, only identifiers. @@ identifier SIZE, COUNT; @@ vmalloc( - SIZE * COUNT + array_size(COUNT, SIZE) , ...) // 3-factor product with 1 sizeof(type) or sizeof(expression), with // redundant parens removed. @@ expression THING; identifier STRIDE, COUNT; type TYPE; @@ ( vmalloc( - sizeof(TYPE) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | vmalloc( - sizeof(TYPE) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | vmalloc( - sizeof(TYPE) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | vmalloc( - sizeof(TYPE) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | vmalloc( - sizeof(THING) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | vmalloc( - sizeof(THING) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | vmalloc( - sizeof(THING) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | vmalloc( - sizeof(THING) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) ) // 3-factor product with 2 sizeof(variable), with redundant parens removed. @@ expression THING1, THING2; identifier COUNT; type TYPE1, TYPE2; @@ ( vmalloc( - sizeof(TYPE1) * sizeof(TYPE2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | vmalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | vmalloc( - sizeof(THING1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | vmalloc( - sizeof(THING1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | vmalloc( - sizeof(TYPE1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) | vmalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) ) // 3-factor product, only identifiers, with redundant parens removed. @@ identifier STRIDE, SIZE, COUNT; @@ ( vmalloc( - (COUNT) * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | vmalloc( - COUNT * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | vmalloc( - COUNT * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | vmalloc( - (COUNT) * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | vmalloc( - COUNT * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | vmalloc( - (COUNT) * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | vmalloc( - (COUNT) * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | vmalloc( - COUNT * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) ) // Any remaining multi-factor products, first at least 3-factor products // when they're not all constants... @@ expression E1, E2, E3; constant C1, C2, C3; @@ ( vmalloc(C1 * C2 * C3, ...) | vmalloc( - E1 * E2 * E3 + array3_size(E1, E2, E3) , ...) ) // And then all remaining 2 factors products when they're not all constants. @@ expression E1, E2; constant C1, C2; @@ ( vmalloc(C1 * C2, ...) | vmalloc( - E1 * E2 + array_size(E1, E2) , ...) ) Signed-off-by: Kees Cook <keescook@chromium.org> |
|
Kees Cook | 6da2ec5605 |
treewide: kmalloc() -> kmalloc_array()
The kmalloc() function has a 2-factor argument form, kmalloc_array(). This patch replaces cases of: kmalloc(a * b, gfp) with: kmalloc_array(a * b, gfp) as well as handling cases of: kmalloc(a * b * c, gfp) with: kmalloc(array3_size(a, b, c), gfp) as it's slightly less ugly than: kmalloc_array(array_size(a, b), c, gfp) This does, however, attempt to ignore constant size factors like: kmalloc(4 * 1024, gfp) though any constants defined via macros get caught up in the conversion. Any factors with a sizeof() of "unsigned char", "char", and "u8" were dropped, since they're redundant. The tools/ directory was manually excluded, since it has its own implementation of kmalloc(). The Coccinelle script used for this was: // Fix redundant parens around sizeof(). @@ type TYPE; expression THING, E; @@ ( kmalloc( - (sizeof(TYPE)) * E + sizeof(TYPE) * E , ...) | kmalloc( - (sizeof(THING)) * E + sizeof(THING) * E , ...) ) // Drop single-byte sizes and redundant parens. @@ expression COUNT; typedef u8; typedef __u8; @@ ( kmalloc( - sizeof(u8) * (COUNT) + COUNT , ...) | kmalloc( - sizeof(__u8) * (COUNT) + COUNT , ...) | kmalloc( - sizeof(char) * (COUNT) + COUNT , ...) | kmalloc( - sizeof(unsigned char) * (COUNT) + COUNT , ...) | kmalloc( - sizeof(u8) * COUNT + COUNT , ...) | kmalloc( - sizeof(__u8) * COUNT + COUNT , ...) | kmalloc( - sizeof(char) * COUNT + COUNT , ...) | kmalloc( - sizeof(unsigned char) * COUNT + COUNT , ...) ) // 2-factor product with sizeof(type/expression) and identifier or constant. @@ type TYPE; expression THING; identifier COUNT_ID; constant COUNT_CONST; @@ ( - kmalloc + kmalloc_array ( - sizeof(TYPE) * (COUNT_ID) + COUNT_ID, sizeof(TYPE) , ...) | - kmalloc + kmalloc_array ( - sizeof(TYPE) * COUNT_ID + COUNT_ID, sizeof(TYPE) , ...) | - kmalloc + kmalloc_array ( - sizeof(TYPE) * (COUNT_CONST) + COUNT_CONST, sizeof(TYPE) , ...) | - kmalloc + kmalloc_array ( - sizeof(TYPE) * COUNT_CONST + COUNT_CONST, sizeof(TYPE) , ...) | - kmalloc + kmalloc_array ( - sizeof(THING) * (COUNT_ID) + COUNT_ID, sizeof(THING) , ...) | - kmalloc + kmalloc_array ( - sizeof(THING) * COUNT_ID + COUNT_ID, sizeof(THING) , ...) | - kmalloc + kmalloc_array ( - sizeof(THING) * (COUNT_CONST) + COUNT_CONST, sizeof(THING) , ...) | - kmalloc + kmalloc_array ( - sizeof(THING) * COUNT_CONST + COUNT_CONST, sizeof(THING) , ...) ) // 2-factor product, only identifiers. @@ identifier SIZE, COUNT; @@ - kmalloc + kmalloc_array ( - SIZE * COUNT + COUNT, SIZE , ...) // 3-factor product with 1 sizeof(type) or sizeof(expression), with // redundant parens removed. @@ expression THING; identifier STRIDE, COUNT; type TYPE; @@ ( kmalloc( - sizeof(TYPE) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kmalloc( - sizeof(TYPE) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kmalloc( - sizeof(TYPE) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kmalloc( - sizeof(TYPE) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kmalloc( - sizeof(THING) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kmalloc( - sizeof(THING) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kmalloc( - sizeof(THING) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kmalloc( - sizeof(THING) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) ) // 3-factor product with 2 sizeof(variable), with redundant parens removed. @@ expression THING1, THING2; identifier COUNT; type TYPE1, TYPE2; @@ ( kmalloc( - sizeof(TYPE1) * sizeof(TYPE2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | kmalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | kmalloc( - sizeof(THING1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | kmalloc( - sizeof(THING1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | kmalloc( - sizeof(TYPE1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) | kmalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) ) // 3-factor product, only identifiers, with redundant parens removed. @@ identifier STRIDE, SIZE, COUNT; @@ ( kmalloc( - (COUNT) * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - COUNT * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - COUNT * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - (COUNT) * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - COUNT * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - (COUNT) * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - (COUNT) * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - COUNT * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) ) // Any remaining multi-factor products, first at least 3-factor products, // when they're not all constants... @@ expression E1, E2, E3; constant C1, C2, C3; @@ ( kmalloc(C1 * C2 * C3, ...) | kmalloc( - (E1) * E2 * E3 + array3_size(E1, E2, E3) , ...) | kmalloc( - (E1) * (E2) * E3 + array3_size(E1, E2, E3) , ...) | kmalloc( - (E1) * (E2) * (E3) + array3_size(E1, E2, E3) , ...) | kmalloc( - E1 * E2 * E3 + array3_size(E1, E2, E3) , ...) ) // And then all remaining 2 factors products when they're not all constants, // keeping sizeof() as the second factor argument. @@ expression THING, E1, E2; type TYPE; constant C1, C2, C3; @@ ( kmalloc(sizeof(THING) * C2, ...) | kmalloc(sizeof(TYPE) * C2, ...) | kmalloc(C1 * C2 * C3, ...) | kmalloc(C1 * C2, ...) | - kmalloc + kmalloc_array ( - sizeof(TYPE) * (E2) + E2, sizeof(TYPE) , ...) | - kmalloc + kmalloc_array ( - sizeof(TYPE) * E2 + E2, sizeof(TYPE) , ...) | - kmalloc + kmalloc_array ( - sizeof(THING) * (E2) + E2, sizeof(THING) , ...) | - kmalloc + kmalloc_array ( - sizeof(THING) * E2 + E2, sizeof(THING) , ...) | - kmalloc + kmalloc_array ( - (E1) * E2 + E1, E2 , ...) | - kmalloc + kmalloc_array ( - (E1) * (E2) + E1, E2 , ...) | - kmalloc + kmalloc_array ( - E1 * E2 + E1, E2 , ...) ) Signed-off-by: Kees Cook <keescook@chromium.org> |
|
Tetsuo Handa | d23a61ee90 |
fs, elf: don't complain MAP_FIXED_NOREPLACE unless -EEXIST error
Commit |
|
Michal Hocko | ad55eac74f |
elf: enforce MAP_FIXED on overlaying elf segments
Anshuman has reported that with "fs, elf: drop MAP_FIXED usage from elf_map" applied, some ELF binaries in his environment fail to start with [ 23.423642] 9148 (sed): Uhuuh, elf segment at 0000000010030000 requested but the memory is mapped already [ 23.423706] requested [10030000, 10040000] mapped [10030000, 10040000] 100073 anon The reason is that the above binary has overlapping elf segments: LOAD 0x0000000000000000 0x0000000010000000 0x0000000010000000 0x0000000000013a8c 0x0000000000013a8c R E 10000 LOAD 0x000000000001fd40 0x000000001002fd40 0x000000001002fd40 0x00000000000002c0 0x00000000000005e8 RW 10000 LOAD 0x0000000000020328 0x0000000010030328 0x0000000010030328 0x0000000000000384 0x00000000000094a0 RW 10000 That binary has two RW LOAD segments, the first crosses a page border into the second 0x1002fd40 (LOAD2-vaddr) + 0x5e8 (LOAD2-memlen) == 0x10030328 (LOAD3-vaddr) Handle this situation by enforcing MAP_FIXED when we establish a temporary brk VMA to handle overlapping segments. All other mappings will still use MAP_FIXED_NOREPLACE. Link: http://lkml.kernel.org/r/20180213100440.GM3443@dhcp22.suse.cz Signed-off-by: Michal Hocko <mhocko@suse.com> Reported-by: Anshuman Khandual <khandual@linux.vnet.ibm.com> Reviewed-by: Khalid Aziz <khalid.aziz@oracle.com> Cc: Andrei Vagin <avagin@openvz.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Kees Cook <keescook@chromium.org> Cc: Abdul Haleem <abdhalee@linux.vnet.ibm.com> Cc: Joel Stanley <joel@jms.id.au> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Mark Brown <broonie@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
Michal Hocko | 4ed2863951 |
fs, elf: drop MAP_FIXED usage from elf_map
Both load_elf_interp and load_elf_binary rely on elf_map to map segments on a controlled address and they use MAP_FIXED to enforce that. This is however dangerous thing prone to silent data corruption which can be even exploitable. Let's take CVE-2017-1000253 as an example. At the time (before commit eab09532d400: "binfmt_elf: use ELF_ET_DYN_BASE only for PIE") ELF_ET_DYN_BASE was at TASK_SIZE / 3 * 2 which is not that far away from the stack top on 32b (legacy) memory layout (only 1GB away). Therefore we could end up mapping over the existing stack with some luck. The issue has been fixed since then (a87938b2e246: "fs/binfmt_elf.c: fix bug in loading of PIE binaries"), ELF_ET_DYN_BASE moved moved much further from the stack ( |
|
Kees Cook | b838383133 |
exec: introduce finalize_exec() before start_thread()
Provide a final callback into fs/exec.c before start_thread() takes over, to handle any last-minute changes, like the coming restoration of the stack limit. Link: http://lkml.kernel.org/r/1518638796-20819-3-git-send-email-keescook@chromium.org Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Ben Hutchings <ben@decadent.org.uk> Cc: Ben Hutchings <ben.hutchings@codethink.co.uk> Cc: Brad Spengler <spender@grsecurity.net> Cc: Greg KH <greg@kroah.com> Cc: Hugh Dickins <hughd@google.com> Cc: "Jason A. Donenfeld" <Jason@zx2c4.com> Cc: Laura Abbott <labbott@redhat.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Willy Tarreau <w@1wt.eu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
Alexey Dobriyan | 60c9d92f88 |
elf: fix NT_FILE integer overflow
If vm.max_map_count bumped above 2^26 (67+ mil) and system has enough RAM to allocate all the VMAs (~12.8 GB on Fedora 27 with 200-byte VMAs), then it should be possible to overflow 32-bit "size", pass paranoia check, allocate very little vmalloc space and oops while writing into vmalloc guard page... But I didn't test this, only coredump of regular process. Link: http://lkml.kernel.org/r/20180112203427.GA9109@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
Linus Torvalds | 441692aafc |
Merge branch 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm
Pull ARM updates from Russell King: - add support for ELF fdpic binaries on both MMU and noMMU platforms - linker script cleanups - support for compressed .data section for XIP images - discard memblock arrays when possible - various cleanups - atomic DMA pool updates - better diagnostics of missing/corrupt device tree - export information to allow userspace kexec tool to place images more inteligently, so that the device tree isn't overwritten by the booting kernel - make early_printk more efficient on semihosted systems - noMMU cleanups - SA1111 PCMCIA update in preparation for further cleanups * 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm: (38 commits) ARM: 8719/1: NOMMU: work around maybe-uninitialized warning ARM: 8717/2: debug printch/printascii: translate '\n' to "\r\n" not "\n\r" ARM: 8713/1: NOMMU: Support MPU in XIP configuration ARM: 8712/1: NOMMU: Use more MPU regions to cover memory ARM: 8711/1: V7M: Add support for MPU to M-class ARM: 8710/1: Kconfig: Kill CONFIG_VECTORS_BASE ARM: 8709/1: NOMMU: Disallow MPU for XIP ARM: 8708/1: NOMMU: Rework MPU to be mostly done in C ARM: 8707/1: NOMMU: Update MPU accessors to use cp15 helpers ARM: 8706/1: NOMMU: Move out MPU setup in separate module ARM: 8702/1: head-common.S: Clear lr before jumping to start_kernel() ARM: 8705/1: early_printk: use printascii() rather than printch() ARM: 8703/1: debug.S: move hexbuf to a writable section ARM: add additional table to compressed kernel ARM: decompressor: fix BSS size calculation pcmcia: sa1111: remove special sa1111 mmio accessors pcmcia: sa1111: use sa1111_get_irq() to obtain IRQ resources ARM: better diagnostics with missing/corrupt dtb ARM: 8699/1: dma-mapping: Remove init_dma_coherent_pool_size() ARM: 8698/1: dma-mapping: Mark atomic_pool as __ro_after_init .. |
|
Dave Martin | 27e64b4be4 |
regset: Add support for dynamically sized regsets
Currently the regset API doesn't allow for the possibility that regsets (or at least, the amount of meaningful data in a regset) may change in size. In particular, this results in useless padding being added to coredumps if a regset's current size is smaller than its theoretical maximum size. This patch adds a get_size() function to struct user_regset. Individual regset implementations can implement this function to return the current size of the regset data. A regset_size() function is added to provide callers with an abstract interface for determining the size of a regset without needing to know whether the regset is dynamically sized or not. The only affected user of this interface is the ELF coredump code: This patch ports ELF coredump to dump regsets with their actual size in the coredump. This has no effect except for new regsets that are dynamically sized and provide a get_size() implementation. Signed-off-by: Dave Martin <Dave.Martin@arm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Dmitry Safonov <dsafonov@virtuozzo.com> Cc: H. J. Lu <hjl.tools@gmail.com> Signed-off-by: Will Deacon <will.deacon@arm.com> |
|
Russell King | 1bb078330b |
Merge branch 'fdpic' of http://git.linaro.org/people/nicolas.pitre/linux into devel-stable
This series provides the needed changes to suport the ELF_FDPIC binary format on ARM. Both MMU and non-MMU systems are supported. This format has many advantages over the BFLT format used on MMU-less systems, such as being real ELF that can be parsed by standard tools, can support shared dynamic libs, etc. |
|
Linus Torvalds | 581bfce969 |
Merge branch 'work.set_fs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull more set_fs removal from Al Viro: "Christoph's 'use kernel_read and friends rather than open-coding set_fs()' series" * 'work.set_fs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: fs: unexport vfs_readv and vfs_writev fs: unexport vfs_read and vfs_write fs: unexport __vfs_read/__vfs_write lustre: switch to kernel_write gadget/f_mass_storage: stop messing with the address limit mconsole: switch to kernel_read btrfs: switch write_buf to kernel_write net/9p: switch p9_fd_read to kernel_write mm/nommu: switch do_mmap_private to kernel_read serial2002: switch serial2002_tty_write to kernel_{read/write} fs: make the buf argument to __kernel_write a void pointer fs: fix kernel_write prototype fs: fix kernel_read prototype fs: move kernel_read to fs/read_write.c fs: move kernel_write to fs/read_write.c autofs4: switch autofs4_write to __kernel_write ashmem: switch to ->read_iter |
|
Nicolas Pitre | 4755200b6b |
binfmt_elf: don't attempt to load FDPIC binaries
On platforms where both ELF and ELF-FDPIC variants are available, the regular ELF loader will happily identify FDPIC binaries as proper ELF and load them without the necessary FDPIC fixups, resulting in an immediate user space crash. Let's prevent binflt_elf from loading those binaries so binfmt_elf_fdpic has a chance to pick them up. For those architectures that don't define elf_check_fdpic(), a default version returning false is provided. Signed-off-by: Nicolas Pitre <nico@linaro.org> Acked-by: Mickael GUENE <mickael.guene@st.com> Tested-by: Vincent Abriou <vincent.abriou@st.com> Tested-by: Andras Szemzo <szemzo.andras@gmail.com> |
|
Linus Torvalds | 828f4257d1 |
This series has the ultimate goal of providing a sane stack rlimit when
running set*id processes. To do this, the bprm_secureexec LSM hook is collapsed into the bprm_set_creds hook so the secureexec-ness of an exec can be determined early enough to make decisions about rlimits and the resulting memory layouts. Other logic acting on the secureexec-ness of an exec is similarly consolidated. Capabilities needed some special handling, but the refactoring removed other special handling, so that was a wash. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 Comment: Kees Cook <kees@outflux.net> iQIcBAABCgAGBQJZrwRKAAoJEIly9N/cbcAmhboP/iwLbYfWngIJdu3pYKrW+CEg uUVY6RNnsumJ5yEhD/yQKXSPmZ8PkC8vexPYvf8TcPOlMRQuhVvdiR0FfSUvkMWy pB8ZVCyAV1uSnW4BH61FCxHInrahy8jlvQwnAujvw+FNxhcQjyEGKupOLIMGLioQ 8G5Ihf+hOjiXRhKbXueQi89n8i4jEI5YTH1RnC+Gsy8jG11EC9BhPddKSMaUKZA3 HYYqUyV0daYpGuxTOxaRdDO5wb6rlS+B46hqtOsSsIBOQkCjnLCRcdeMCqvXjQmv kyZj03cPlUjEHqh3d3nB6utvVWReGf/p986//kQjT1OZPhATbySAu7wUHoLik3dU zuexudNTBROf6YXahMxSJp348GS++xoBFARa78402E++U7C4/eoclbLCWAylBwVA H+QAHFYRC2WFoskejSYBRPz6HLr1SIaSYMsKbkHqP07zi6p3ic2Uq3XvOP2zL/5p l/mXa1Fs2vcDOWPER8a8b9mVkJDvuXj6J11lG+q80UWAWC3sd9GkSwOen80ps3Xo /7dd+h2BAJSSVxZQFxd5YCx99mT0ntQZ797PhjxOY6SX/xUdOCAp9x1zDU5OUovP q2ty3UTd7tq8h1RnHOnrn9cKmMmI7kpBvEfPGM507cEVjyfsMu2jJtUxN9dXOAkB aebEsg3C8M6z5OdGVpWH =Yva4 -----END PGP SIGNATURE----- Merge tag 'secureexec-v4.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull secureexec update from Kees Cook: "This series has the ultimate goal of providing a sane stack rlimit when running set*id processes. To do this, the bprm_secureexec LSM hook is collapsed into the bprm_set_creds hook so the secureexec-ness of an exec can be determined early enough to make decisions about rlimits and the resulting memory layouts. Other logic acting on the secureexec-ness of an exec is similarly consolidated. Capabilities needed some special handling, but the refactoring removed other special handling, so that was a wash" * tag 'secureexec-v4.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: exec: Consolidate pdeath_signal clearing exec: Use sane stack rlimit under secureexec exec: Consolidate dumpability logic smack: Remove redundant pdeath_signal clearing exec: Use secureexec for clearing pdeath_signal exec: Use secureexec for setting dumpability LSM: drop bprm_secureexec hook commoncap: Move cap_elevated calculation into bprm_set_creds commoncap: Refactor to remove bprm_secureexec hook smack: Refactor to remove bprm_secureexec hook selinux: Refactor to remove bprm_secureexec hook apparmor: Refactor to remove bprm_secureexec hook binfmt: Introduce secureexec flag exec: Correct comments about "point of no return" exec: Rename bprm->cred_prepared to called_set_creds |
|
Christoph Hellwig | bdd1d2d3d2 |
fs: fix kernel_read prototype
Use proper ssize_t and size_t types for the return value and count argument, move the offset last and make it an in/out argument like all other read/write helpers, and make the buf argument a void pointer to get rid of lots of casts in the callers. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> |
|
Oleg Nesterov | 01578e3616 |
x86/elf: Remove the unnecessary ADDR_NO_RANDOMIZE checks
The ADDR_NO_RANDOMIZE checks in stack_maxrandom_size() and randomize_stack_top() are not required. PF_RANDOMIZE is set by load_elf_binary() only if ADDR_NO_RANDOMIZE is not set, no need to re-check after that. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Dmitry Safonov <dsafonov@virtuozzo.com> Cc: stable@vger.kernel.org Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Borislav Petkov <bp@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Link: http://lkml.kernel.org/r/20170815154011.GB1076@redhat.com |
|
Kees Cook | c425e189ff |
binfmt: Introduce secureexec flag
The bprm_secureexec hook can be moved earlier. Right now, it is called during create_elf_tables(), via load_binary(), via search_binary_handler(), via exec_binprm(). Nearly all (see exception below) state used by bprm_secureexec is created during the bprm_set_creds hook, called from prepare_binprm(). For all LSMs (except commoncaps described next), only the first execution of bprm_set_creds takes any effect (they all check bprm->called_set_creds which prepare_binprm() sets after the first call to the bprm_set_creds hook). However, all these LSMs also only do anything with bprm_secureexec when they detected a secure state during their first run of bprm_set_creds. Therefore, it is functionally identical to move the detection into bprm_set_creds, since the results from secureexec here only need to be based on the first call to the LSM's bprm_set_creds hook. The single exception is that the commoncaps secureexec hook also examines euid/uid and egid/gid differences which are controlled by bprm_fill_uid(), via prepare_binprm(), which can be called multiple times (e.g. binfmt_script, binfmt_misc), and may clear the euid/egid for the final load (i.e. the script interpreter). However, while commoncaps specifically ignores bprm->cred_prepared, and runs its bprm_set_creds hook each time prepare_binprm() may get called, it needs to base the secureexec decision on the final call to bprm_set_creds. As a result, it will need special handling. To begin this refactoring, this adds the secureexec flag to the bprm struct, and calls the secureexec hook during setup_new_exec(). This is safe since all the cred work is finished (and past the point of no return). This explicit call will be removed in later patches once the hook has been removed. Cc: David Howells <dhowells@redhat.com> Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: John Johansen <john.johansen@canonical.com> Acked-by: Serge Hallyn <serge@hallyn.com> Reviewed-by: James Morris <james.l.morris@oracle.com> |
|
Kees Cook | 67c6777a5d |
binfmt_elf: safely increment argv pointers
When building the argv/envp pointers, the envp is needlessly pre-incremented instead of just continuing after the argv pointers are finished. In some (likely impossible) race where the strings could be changed from userspace between copy_strings() and here, it might be possible to confuse the envp position. Instead, just use sp like everything else. Link: http://lkml.kernel.org/r/20170622173838.GA43308@beast Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Rik van Riel <riel@redhat.com> Cc: Daniel Micay <danielmicay@gmail.com> Cc: Qualys Security Advisory <qsa@qualys.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Dmitry Safonov <dsafonov@virtuozzo.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Grzegorz Andrejczuk <grzegorz.andrejczuk@intel.com> Cc: Masahiro Yamada <yamada.masahiro@socionext.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
Kees Cook | eab09532d4 |
binfmt_elf: use ELF_ET_DYN_BASE only for PIE
The ELF_ET_DYN_BASE position was originally intended to keep loaders
away from ET_EXEC binaries. (For example, running "/lib/ld-linux.so.2
/bin/cat" might cause the subsequent load of /bin/cat into where the
loader had been loaded.)
With the advent of PIE (ET_DYN binaries with an INTERP Program Header),
ELF_ET_DYN_BASE continued to be used since the kernel was only looking
at ET_DYN. However, since ELF_ET_DYN_BASE is traditionally set at the
top 1/3rd of the TASK_SIZE, a substantial portion of the address space
is unused.
For 32-bit tasks when RLIMIT_STACK is set to RLIM_INFINITY, programs are
loaded above the mmap region. This means they can be made to collide
(CVE-2017-1000370) or nearly collide (CVE-2017-1000371) with
pathological stack regions.
Lowering ELF_ET_DYN_BASE solves both by moving programs below the mmap
region in all cases, and will now additionally avoid programs falling
back to the mmap region by enforcing MAP_FIXED for program loads (i.e.
if it would have collided with the stack, now it will fail to load
instead of falling back to the mmap region).
To allow for a lower ELF_ET_DYN_BASE, loaders (ET_DYN without INTERP)
are loaded into the mmap region, leaving space available for either an
ET_EXEC binary with a fixed location or PIE being loaded into mmap by
the loader. Only PIE programs are loaded offset from ELF_ET_DYN_BASE,
which means architectures can now safely lower their values without risk
of loaders colliding with their subsequently loaded programs.
For 64-bit, ELF_ET_DYN_BASE is best set to 4GB to allow runtimes to use
the entire 32-bit address space for 32-bit pointers.
Thanks to PaX Team, Daniel Micay, and Rik van Riel for inspiration and
suggestions on how to implement this solution.
Fixes:
|
|
Ingo Molnar | 32ef5517c2 |
sched/headers: Prepare to move cputime functionality from <linux/sched.h> into <linux/sched/cputime.h>
Introduce a trivial, mostly empty <linux/sched/cputime.h> header to prepare for the moving of cputime functionality out of sched.h. Update all code that relies on these facilities. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org> |
|
Ingo Molnar | 68db0cf106 |
sched/headers: Prepare for new header dependencies before moving code to <linux/sched/task_stack.h>
We are going to split <linux/sched/task_stack.h> out of <linux/sched.h>, which will have to be picked up from other headers and a couple of .c files. Create a trivial placeholder <linux/sched/task_stack.h> file that just maps to <linux/sched.h> to make this patch obviously correct and bisectable. Include the new header in the files that are going to need it. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org> |
|
Ingo Molnar | 5b825c3af1 |
sched/headers: Prepare to remove <linux/cred.h> inclusion from <linux/sched.h>
Add #include <linux/cred.h> dependencies to all .c files rely on sched.h doing that for them. Note that even if the count where we need to add extra headers seems high, it's still a net win, because <linux/sched.h> is included in over 2,200 files ... Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org> |
|
Ingo Molnar | f7ccbae45c |
sched/headers: Prepare for new header dependencies before moving code to <linux/sched/coredump.h>
We are going to split <linux/sched/coredump.h> out of <linux/sched.h>, which will have to be picked up from other headers and a couple of .c files. Create a trivial placeholder <linux/sched/coredump.h> file that just maps to <linux/sched.h> to make this patch obviously correct and bisectable. Include the new header in the files that are going to need it. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org> |
|
Denys Vlasenko | 16e72e9b30 |
powerpc: do not make the entire heap executable
On 32-bit powerpc the ELF PLT sections of binaries (built with --bss-plt, or with a toolchain which defaults to it) look like this: [17] .sbss NOBITS 0002aff8 01aff8 000014 00 WA 0 0 4 [18] .plt NOBITS 0002b00c 01aff8 000084 00 WAX 0 0 4 [19] .bss NOBITS 0002b090 01aff8 0000a4 00 WA 0 0 4 Which results in an ELF load header: Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align LOAD 0x019c70 0x00029c70 0x00029c70 0x01388 0x014c4 RWE 0x10000 This is all correct, the load region containing the PLT is marked as executable. Note that the PLT starts at 0002b00c but the file mapping ends at 0002aff8, so the PLT falls in the 0 fill section described by the load header, and after a page boundary. Unfortunately the generic ELF loader ignores the X bit in the load headers when it creates the 0 filled non-file backed mappings. It assumes all of these mappings are RW BSS sections, which is not the case for PPC. gcc/ld has an option (--secure-plt) to not do this, this is said to incur a small performance penalty. Currently, to support 32-bit binaries with PLT in BSS kernel maps *entire brk area* with executable rights for all binaries, even --secure-plt ones. Stop doing that. Teach the ELF loader to check the X bit in the relevant load header and create 0 filled anonymous mappings that are executable if the load header requests that. Test program showing the difference in /proc/$PID/maps: int main() { char buf[16*1024]; char *p = malloc(123); /* make "[heap]" mapping appear */ int fd = open("/proc/self/maps", O_RDONLY); int len = read(fd, buf, sizeof(buf)); write(1, buf, len); printf("%p\n", p); return 0; } Compiled using: gcc -mbss-plt -m32 -Os test.c -otest Unpatched ppc64 kernel: 00100000-00120000 r-xp 00000000 00:00 0 [vdso] 0fe10000-0ffd0000 r-xp 00000000 fd:00 67898094 /usr/lib/libc-2.17.so 0ffd0000-0ffe0000 r--p 001b0000 fd:00 67898094 /usr/lib/libc-2.17.so 0ffe0000-0fff0000 rw-p 001c0000 fd:00 67898094 /usr/lib/libc-2.17.so 10000000-10010000 r-xp 00000000 fd:00 100674505 /home/user/test 10010000-10020000 r--p 00000000 fd:00 100674505 /home/user/test 10020000-10030000 rw-p 00010000 fd:00 100674505 /home/user/test 10690000-106c0000 rwxp 00000000 00:00 0 [heap] f7f70000-f7fa0000 r-xp 00000000 fd:00 67898089 /usr/lib/ld-2.17.so f7fa0000-f7fb0000 r--p 00020000 fd:00 67898089 /usr/lib/ld-2.17.so f7fb0000-f7fc0000 rw-p 00030000 fd:00 67898089 /usr/lib/ld-2.17.so ffa90000-ffac0000 rw-p 00000000 00:00 0 [stack] 0x10690008 Patched ppc64 kernel: 00100000-00120000 r-xp 00000000 00:00 0 [vdso] 0fe10000-0ffd0000 r-xp 00000000 fd:00 67898094 /usr/lib/libc-2.17.so 0ffd0000-0ffe0000 r--p 001b0000 fd:00 67898094 /usr/lib/libc-2.17.so 0ffe0000-0fff0000 rw-p 001c0000 fd:00 67898094 /usr/lib/libc-2.17.so 10000000-10010000 r-xp 00000000 fd:00 100674505 /home/user/test 10010000-10020000 r--p 00000000 fd:00 100674505 /home/user/test 10020000-10030000 rw-p 00010000 fd:00 100674505 /home/user/test 10180000-101b0000 rw-p 00000000 00:00 0 [heap] ^^^^ this has changed f7c60000-f7c90000 r-xp 00000000 fd:00 67898089 /usr/lib/ld-2.17.so f7c90000-f7ca0000 r--p 00020000 fd:00 67898089 /usr/lib/ld-2.17.so f7ca0000-f7cb0000 rw-p 00030000 fd:00 67898089 /usr/lib/ld-2.17.so ff860000-ff890000 rw-p 00000000 00:00 0 [stack] 0x10180008 The patch was originally posted in 2012 by Jason Gunthorpe and apparently ignored: https://lkml.org/lkml/2012/9/30/138 Lightly run-tested. Link: http://lkml.kernel.org/r/20161215131950.23054-1-dvlasenk@redhat.com Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Acked-by: Kees Cook <keescook@chromium.org> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Tested-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Florian Weimer <fweimer@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
Frederic Weisbecker | cd19c364b3 |
fs/binfmt: Convert obsolete cputime type to nsecs
Use the new nsec based cputime accessors as part of the whole cputime conversion from cputime_t to nsecs. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Stanislaw Gruszka <sgruszka@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Wanpeng Li <wanpeng.li@hotmail.com> Link: http://lkml.kernel.org/r/1485832191-26889-12-git-send-email-fweisbec@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org> |
|
Frederic Weisbecker | 5613fda9a5 |
sched/cputime: Convert task/group cputime to nsecs
Now that most cputime readers use the transition API which return the task cputime in old style cputime_t, we can safely store the cputime in nsecs. This will eventually make cputime statistics less opaque and more granular. Back and forth convertions between cputime_t and nsecs in order to deal with cputime_t random granularity won't be needed anymore. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Stanislaw Gruszka <sgruszka@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Wanpeng Li <wanpeng.li@hotmail.com> Link: http://lkml.kernel.org/r/1485832191-26889-8-git-send-email-fweisbec@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org> |
|
Frederic Weisbecker | a1cecf2ba7 |
sched/cputime: Introduce special task_cputime_t() API to return old-typed cputime
This API returns a task's cputime in cputime_t in order to ease the conversion of cputime internals to use nsecs units instead. Blindly converting all cputime readers to use this API now will later let us convert more smoothly and step by step all these places to use the new nsec based cputime. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Stanislaw Gruszka <sgruszka@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Wanpeng Li <wanpeng.li@hotmail.com> Link: http://lkml.kernel.org/r/1485832191-26889-7-git-send-email-fweisbec@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org> |
|
Dave Kleikamp | 4d22c75d4c |
coredump: Ensure proper size of sparse core files
If the last section of a core file ends with an unmapped or zero page, the size of the file does not correspond with the last dump_skip() call. gdb complains that the file is truncated and can be confusing to users. After all of the vma sections are written, make sure that the file size is no smaller than the current file position. This problem can be demonstrated with gdb's bigcore testcase on the sparc architecture. Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: linux-fsdevel@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> |
|
Linus Torvalds | 7c0f6ba682 |
Replace <asm/uaccess.h> with <linux/uaccess.h> globally
This was entirely automated, using the script by Al: PATT='^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*<asm/uaccess.h>' sed -i -e "s!$PATT!#include <linux/uaccess.h>!" \ $(git grep -l "$PATT"|grep -v ^include/linux/uaccess.h) to do the replacement at the end of the merge window. Requested-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |