mirror of https://gitee.com/openkylin/linux.git
582 Commits
Author | SHA1 | Message | Date |
---|---|---|---|
H.J. Lu | a980ce352f |
x86/build: Build compressed x86 kernels as PIE when !CONFIG_RELOCATABLE as well
Since the bootloader may load the compressed x86 kernel at any address, it should always be built as PIE, not just when CONFIG_RELOCATABLE=y. Otherwise, linker in binutils 2.27 will optimize GOT load into the absolute address when building the compressed x86 kernel as a non-PIE executable. Signed-off-by: H.J. Lu <hjl.tools@gmail.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org [ Small wording changes. ] Signed-off-by: Ingo Molnar <mingo@kernel.org> |
|
Andy Lutomirski | ed68d7e9b9 |
x86/boot: Fail the boot if !M486 and CPUID is missing
Linux will have all kinds of sporadic problems on systems that don't
have the CPUID instruction unless CONFIG_M486=y. In particular,
sync_core() will explode.
I believe that these kernels had a better chance of working before
commit
|
|
Ingo Molnar | 5465fe0fc3 |
* Refactor the EFI memory map code into architecture neutral files
and allow drivers to permanently reserve EFI boot services regions on x86, as well as ARM/arm64 - Matt Fleming * Add ARM support for the EFI esrt driver - Ard Biesheuvel * Make the EFI runtime services and efivar API interruptible by swapping spinlocks for semaphores - Sylvain Chouleur * Provide the EFI identity mapping for kexec which allows kexec to work on SGI/UV platforms with requiring the "noefi" kernel command line parameter - Alex Thorlton * Add debugfs node to dump EFI page tables on arm64 - Ard Biesheuvel * Merge the EFI test driver being carried out of tree until now in the FWTS project - Ivan Hu * Expand the list of flags for classifying EFI regions as "RAM" on arm64 so we align with the UEFI spec - Ard Biesheuvel * Optimise out the EFI mixed mode if it's unsupported (CONFIG_X86_32) or disabled (CONFIG_EFI_MIXED=n) and switch the early EFI boot services function table for direct calls, alleviating us from having to maintain the custom function table - Lukas Wunner * Miscellaneous cleanups and fixes -----BEGIN PGP SIGNATURE----- iQI2BAABCAAgBQJX0tCTGRxtYXR0QGNvZGVibHVlcHJpbnQuY28udWsACgkQLzhZ wI0jPVWLVBAAn/iM91Vmhggdk3t0wCMJzrNGonw61VJ9TZJVbCUJyiH0qdDUThhj R4rO+6Vf6yOuyswu+mGmae61tfsjwJHH+IPpB8nRLIGQRwzoxk+aGC7FzmQ0ISVO wIdv5shsmeWhFAyNB1D4hzlp1NxOZaqcU/0cfUVGe4HmK0Js3tUpWWx8VgJ7yvW+ X1PBbfyChArGqiwV6FJz/mJxRAgByUfhvYMcX9HhQkou6F4U5Y8l3vlhUMbuAZAi ZfG2LWGYCQ+F4XKxMq2QDAtdUgBzlYWw6W60o55x9WO4cEVSzNVRgedto5o1Zea9 2QGEr94gim+e5cJ/HeDIEmbWZhAqIdcNDqXSSBd1CDVQytp4PNAn6rxk+2S9kxoe T9Mk523HEabo+AZvDAPPJlzcsnIe83JYy69M1xFvcP25ebk7y2BwQtd1jwWPrPDQ Q/llzF93aezUFR/guvIw0oHckhQl0ZkNedL9Tq4+UKL0ibp2X4gSX636/x4PkBSP 5+pyfmO1SAqTiiMQGQMnp4+ngPQeQrxkmVnh1P7cKlTNXg1IoS03t46Xn2Pj10cd 3KneVDeN9DKIAOn7wPKuPnjTho+9FH36xbwTaIgbt0cWuFFfu090rmqOQfjAJEDN foHzsMZ7S6CmeOJnj97NNR8sMQDcc+p9bh1KXpJIHaZAgrKmvqPZpMk= =G7L6 -----END PGP SIGNATURE----- Merge tag 'efi-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mfleming/efi into efi/core Pull EFI updates from Matt Fleming: "* Refactor the EFI memory map code into architecture neutral files and allow drivers to permanently reserve EFI boot services regions on x86, as well as ARM/arm64 - Matt Fleming * Add ARM support for the EFI esrt driver - Ard Biesheuvel * Make the EFI runtime services and efivar API interruptible by swapping spinlocks for semaphores - Sylvain Chouleur * Provide the EFI identity mapping for kexec which allows kexec to work on SGI/UV platforms with requiring the "noefi" kernel command line parameter - Alex Thorlton * Add debugfs node to dump EFI page tables on arm64 - Ard Biesheuvel * Merge the EFI test driver being carried out of tree until now in the FWTS project - Ivan Hu * Expand the list of flags for classifying EFI regions as "RAM" on arm64 so we align with the UEFI spec - Ard Biesheuvel * Optimise out the EFI mixed mode if it's unsupported (CONFIG_X86_32) or disabled (CONFIG_EFI_MIXED=n) and switch the early EFI boot services function table for direct calls, alleviating us from having to maintain the custom function table - Lukas Wunner * Miscellaneous cleanups and fixes" Signed-off-by: Ingo Molnar <mingo@kernel.org> |
|
Lukas Wunner | 0a637ee612 |
x86/efi: Allow invocation of arbitrary boot services
We currently allow invocation of 8 boot services with efi_call_early(). Not included are LocateHandleBuffer and LocateProtocol in particular. For graphics output or to retrieve PCI ROMs and Apple device properties, we're thus forced to use the LocateHandle + AllocatePool + LocateHandle combo, which is cumbersome and needs more code. The ARM folks allow invocation of the full set of boot services but are restricted to our 8 boot services in functions shared across arches. Thus, rather than adding just LocateHandleBuffer and LocateProtocol to struct efi_config, let's rework efi_call_early() to allow invocation of arbitrary boot services by selecting the 64 bit vs 32 bit code path in the macro itself. When compiling for 32 bit or for 64 bit without mixed mode, the unused code path is optimized away and the binary code is the same as before. But on 64 bit with mixed mode enabled, this commit adds one compare instruction to each invocation of a boot service and, depending on the code path selected, two jump instructions. (Most of the time gcc arranges the jumps in the 32 bit code path.) The result is a minuscule performance penalty and the binary code becomes slightly larger and more difficult to read when disassembled. This isn't a hot path, so these drawbacks are arguably outweighed by the attainable simplification of the C code. We have some overhead anyway for thunking or conversion between calling conventions. The 8 boot services can consequently be removed from struct efi_config. No functional change intended (for now). Example -- invocation of free_pool before (64 bit code path): 0x2d4 movq %ds:efi_early, %rdx ; efi_early 0x2db movq %ss:arg_0-0x20(%rsp), %rsi 0x2e0 xorl %eax, %eax 0x2e2 movq %ds:0x28(%rdx), %rdi ; efi_early->free_pool 0x2e6 callq *%ds:0x58(%rdx) ; efi_early->call() Example -- invocation of free_pool after (64 / 32 bit mixed code path): 0x0dc movq %ds:efi_early, %rax ; efi_early 0x0e3 cmpb $0, %ds:0x28(%rax) ; !efi_early->is64 ? 0x0e7 movq %ds:0x20(%rax), %rdx ; efi_early->call() 0x0eb movq %ds:0x10(%rax), %rax ; efi_early->boot_services 0x0ef je $0x150 0x0f1 movq %ds:0x48(%rax), %rdi ; free_pool (64 bit) 0x0f5 xorl %eax, %eax 0x0f7 callq *%rdx ... 0x150 movl %ds:0x30(%rax), %edi ; free_pool (32 bit) 0x153 jmp $0x0f5 Size of eboot.o text section: CONFIG_X86_32: 6464 before, 6318 after CONFIG_X86_64 && !CONFIG_EFI_MIXED: 7670 before, 7573 after CONFIG_X86_64 && CONFIG_EFI_MIXED: 7670 before, 8319 after Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk> |
|
Lukas Wunner | 15cf7cae08 |
x86/efi: Remove unused find_bits() function
Left behind by commit
|
|
Colin Ian King | ac0e94b63e |
x86/efi: Initialize status to ensure garbage is not returned on small size
Although very unlikey, if size is too small or zero, then we end up with status not being set and returning garbage. Instead, initializing status to EFI_INVALID_PARAMETER to indicate that size is invalid in the calls to setup_uga32 and setup_uga64. Signed-off-by: Colin Ian King <colin.king@canonical.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk> |
|
Jeffrey Hugo | d64934019f |
x86/efi: Use efi_exit_boot_services()
The eboot code directly calls ExitBootServices. This is inadvisable as the UEFI spec details a complex set of errors, race conditions, and API interactions that the caller of ExitBootServices must get correct. The eboot code attempts allocations after calling ExitBootSerives which is not permitted per the spec. Call the efi_exit_boot_services() helper intead, which handles the allocation scenario properly. Signed-off-by: Jeffrey Hugo <jhugo@codeaurora.org> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Leif Lindholm <leif.lindholm@linaro.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk> |
|
Jeffrey Hugo | dadb57abc3 |
efi/libstub: Allocate headspace in efi_get_memory_map()
efi_get_memory_map() allocates a buffer to store the memory map that it retrieves. This buffer may need to be reused by the client after ExitBootServices() is called, at which point allocations are not longer permitted. To support this usecase, provide the allocated buffer size back to the client, and allocate some additional headroom to account for any reasonable growth in the map that is likely to happen between the call to efi_get_memory_map() and the client reusing the buffer. Signed-off-by: Jeffrey Hugo <jhugo@codeaurora.org> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Leif Lindholm <leif.lindholm@linaro.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk> |
|
Linus Torvalds | f716a85cd6 |
Merge branch 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild
Pull kbuild updates from Michal Marek: - GCC plugin support by Emese Revfy from grsecurity, with a fixup from Kees Cook. The plugins are meant to be used for static analysis of the kernel code. Two plugins are provided already. - reduction of the gcc commandline by Arnd Bergmann. - IS_ENABLED / IS_REACHABLE macro enhancements by Masahiro Yamada - bin2c fix by Michael Tautschnig - setlocalversion fix by Wolfram Sang * 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild: gcc-plugins: disable under COMPILE_TEST kbuild: Abort build on bad stack protector flag scripts: Fix size mismatch of kexec_purgatory_size kbuild: make samples depend on headers_install Kbuild: don't add obj tree in additional includes Kbuild: arch: look for generated headers in obtree Kbuild: always prefix objtree in LINUXINCLUDE Kbuild: avoid duplicate include path Kbuild: don't add ../../ to include path vmlinux.lds.h: replace config_enabled() with IS_ENABLED() kconfig.h: allow to use IS_{ENABLE,REACHABLE} in macro expansion kconfig.h: use already defined macros for IS_REACHABLE() define export.h: use __is_defined() to check if __KSYM_* is defined kconfig.h: use __is_defined() to check if MODULE is defined kbuild: setlocalversion: print error to STDERR Add sancov plugin Add Cyclomatic complexity GCC plugin GCC plugin infrastructure Shared library support |
|
Linus Torvalds | 77cd3d0c43 |
Merge branch 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 boot updates from Ingo Molnar: "The main changes: - add initial commits to randomize kernel memory section virtual addresses, enabled via a new kernel option: RANDOMIZE_MEMORY (Thomas Garnier, Kees Cook, Baoquan He, Yinghai Lu) - enhance KASLR (RANDOMIZE_BASE) physical memory randomization (Kees Cook) - EBDA/BIOS region boot quirk cleanups (Andy Lutomirski, Ingo Molnar) - misc cleanups/fixes" * 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/boot: Simplify EBDA-vs-BIOS reservation logic x86/boot: Clarify what x86_legacy_features.reserve_bios_regions does x86/boot: Reorganize and clean up the BIOS area reservation code x86/mm: Do not reference phys addr beyond kernel x86/mm: Add memory hotplug support for KASLR memory randomization x86/mm: Enable KASLR for vmalloc memory regions x86/mm: Enable KASLR for physical mapping memory regions x86/mm: Implement ASLR for kernel memory regions x86/mm: Separate variable for trampoline PGD x86/mm: Add PUD VA support for physical mapping x86/mm: Update physical mapping variable names x86/mm: Refactor KASLR entropy functions x86/KASLR: Fix boot crash with certain memory configurations x86/boot/64: Add forgotten end of function marker x86/KASLR: Allow randomization below the load address x86/KASLR: Extend kernel image physical address randomization to addresses larger than 4G x86/KASLR: Randomize virtual address separately x86/KASLR: Clarify identity map interface x86/boot: Refuse to build with data relocations x86/KASLR, x86/power: Remove x86 hibernation restrictions |
|
Linus Torvalds | 0f657262d5 |
Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 mm updates from Ingo Molnar: "Various x86 low level modifications: - preparatory work to support virtually mapped kernel stacks (Andy Lutomirski) - support for 64-bit __get_user() on 32-bit kernels (Benjamin LaHaise) - (involved) workaround for Knights Landing CPU erratum (Dave Hansen) - MPX enhancements (Dave Hansen) - mremap() extension to allow remapping of the special VDSO vma, for purposes of user level context save/restore (Dmitry Safonov) - hweight and entry code cleanups (Borislav Petkov) - bitops code generation optimizations and cleanups with modern GCC (H. Peter Anvin) - syscall entry code optimizations (Paolo Bonzini)" * 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (43 commits) x86/mm/cpa: Add missing comment in populate_pdg() x86/mm/cpa: Fix populate_pgd(): Stop trying to deallocate failed PUDs x86/syscalls: Add compat_sys_preadv64v2/compat_sys_pwritev64v2 x86/smp: Remove unnecessary initialization of thread_info::cpu x86/smp: Remove stack_smp_processor_id() x86/uaccess: Move thread_info::addr_limit to thread_struct x86/dumpstack: Rename thread_struct::sig_on_uaccess_error to sig_on_uaccess_err x86/uaccess: Move thread_info::uaccess_err and thread_info::sig_on_uaccess_err to thread_struct x86/dumpstack: When OOPSing, rewind the stack before do_exit() x86/mm/64: In vmalloc_fault(), use CR3 instead of current->active_mm x86/dumpstack/64: Handle faults when printing the "Stack: " part of an OOPS x86/dumpstack: Try harder to get a call trace on stack overflow x86/mm: Remove kernel_unmap_pages_in_pgd() and efi_cleanup_page_tables() x86/mm/cpa: In populate_pgd(), don't set the PGD entry until it's populated x86/mm/hotplug: Don't remove PGD entries in remove_pagetable() x86/mm: Use pte_none() to test for empty PTE x86/mm: Disallow running with 32-bit PTEs to work around erratum x86/mm: Ignore A/D bits in pte/pmd/pud_none() x86/mm: Move swap offset/type up in PTE to work around erratum x86/entry: Inline enter_from_user_mode() ... |
|
Arnd Bergmann | 58ab5e0c2c |
Kbuild: arch: look for generated headers in obtree
There are very few files that need add an -I$(obj) gcc for the preprocessor or the assembler. For C files, we add always these for both the objtree and srctree, but for the other ones we require the Makefile to add them, and Kbuild then adds it for both trees. As a preparation for changing the meaning of the -I$(obj) directive to only refer to the srctree, this changes the two instances in arch/x86 to use an explictit $(objtree) prefix where needed, otherwise we won't find the headers any more, as reported by the kbuild 0day builder. arch/x86/realmode/rm/realmode.lds.S:75:20: fatal error: pasyms.h: No such file or directory Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Michal Marek <mmarek@suse.com> |
|
Ingo Molnar | 38452af242 |
Merge branch 'x86/asm' into x86/mm, to resolve conflicts
Conflicts: tools/testing/selftests/x86/Makefile Signed-off-by: Ingo Molnar <mingo@kernel.org> |
|
Dave Hansen | e4a84be6f0 |
x86/mm: Disallow running with 32-bit PTEs to work around erratum
The Intel(R) Xeon Phi(TM) Processor x200 Family (codename: Knights Landing) has an erratum where a processor thread setting the Accessed or Dirty bits may not do so atomically against its checks for the Present bit. This may cause a thread (which is about to page fault) to set A and/or D, even though the Present bit had already been atomically cleared. These bits are truly "stray". In the case of the Dirty bit, the thread associated with the stray set was *not* allowed to write to the page. This means that we do not have to launder the bit(s); we can simply ignore them. If the PTE is used for storing a swap index or a NUMA migration index, the A bit could be misinterpreted as part of the swap type. The stray bits being set cause a software-cleared PTE to be interpreted as a swap entry. In some cases (like when the swap index ends up being for a non-existent swapfile), the kernel detects the stray value and WARN()s about it, but there is no guarantee that the kernel can always detect it. When we have 64-bit PTEs (64-bit mode or 32-bit PAE), we were able to move the swap PTE format around to avoid these troublesome bits. But, 32-bit non-PAE is tight on bits. So, disallow it from running on this hardware. I can't imagine anyone wanting to run 32-bit non-highmem kernels on this hardware, but disallowing them from running entirely is surely the safe thing to do. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave@sr71.net> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Luis R. Rodriguez <mcgrof@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Toshi Kani <toshi.kani@hp.com> Cc: dave.hansen@intel.com Cc: linux-mm@kvack.org Cc: mhocko@suse.com Link: http://lkml.kernel.org/r/20160708001914.D0B50110@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org> |
|
Thomas Garnier | 021182e52f |
x86/mm: Enable KASLR for physical mapping memory regions
Add the physical mapping in the list of randomized memory regions. The physical memory mapping holds most allocations from boot and heap allocators. Knowing the base address and physical memory size, an attacker can deduce the PDE virtual address for the vDSO memory page. This attack was demonstrated at CanSecWest 2016, in the following presentation: "Getting Physical: Extreme Abuse of Intel Based Paged Systems": https://github.com/n3k/CansecWest2016_Getting_Physical_Extreme_Abuse_of_Intel_Based_Paging_Systems/blob/master/Presentation/CanSec2016_Presentation.pdf (See second part of the presentation). The exploits used against Linux worked successfully against 4.6+ but fail with KASLR memory enabled: https://github.com/n3k/CansecWest2016_Getting_Physical_Extreme_Abuse_of_Intel_Based_Paging_Systems/tree/master/Demos/Linux/exploits Similar research was done at Google leading to this patch proposal. Variants exists to overwrite /proc or /sys objects ACLs leading to elevation of privileges. These variants were tested against 4.6+. The page offset used by the compressed kernel retains the static value since it is not yet randomized during this boot stage. Signed-off-by: Thomas Garnier <thgarnie@google.com> Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Alexander Kuleshov <kuleshovmail@gmail.com> Cc: Alexander Popov <alpopov@ptsecurity.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Cc: Baoquan He <bhe@redhat.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Borislav Petkov <bp@suse.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Dave Young <dyoung@redhat.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jan Beulich <JBeulich@suse.com> Cc: Joerg Roedel <jroedel@suse.de> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Lv Zheng <lv.zheng@intel.com> Cc: Mark Salter <msalter@redhat.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephen Smalley <sds@tycho.nsa.gov> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Toshi Kani <toshi.kani@hpe.com> Cc: Xiao Guangrong <guangrong.xiao@linux.intel.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: kernel-hardening@lists.openwall.com Cc: linux-doc@vger.kernel.org Link: http://lkml.kernel.org/r/1466556426-32664-7-git-send-email-keescook@chromium.org Signed-off-by: Ingo Molnar <mingo@kernel.org> |
|
Thomas Garnier | d899a7d146 |
x86/mm: Refactor KASLR entropy functions
Move the KASLR entropy functions into arch/x86/lib to be used in early kernel boot for KASLR memory randomization. Signed-off-by: Thomas Garnier <thgarnie@google.com> Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Alexander Kuleshov <kuleshovmail@gmail.com> Cc: Alexander Popov <alpopov@ptsecurity.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Cc: Baoquan He <bhe@redhat.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Borislav Petkov <bp@suse.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Dave Young <dyoung@redhat.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jan Beulich <JBeulich@suse.com> Cc: Joerg Roedel <jroedel@suse.de> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Lv Zheng <lv.zheng@intel.com> Cc: Mark Salter <msalter@redhat.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephen Smalley <sds@tycho.nsa.gov> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Toshi Kani <toshi.kani@hpe.com> Cc: Xiao Guangrong <guangrong.xiao@linux.intel.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: kernel-hardening@lists.openwall.com Cc: linux-doc@vger.kernel.org Link: http://lkml.kernel.org/r/1466556426-32664-2-git-send-email-keescook@chromium.org Signed-off-by: Ingo Molnar <mingo@kernel.org> |
|
Baoquan He | 6daa2ec0b3 |
x86/KASLR: Fix boot crash with certain memory configurations
Ye Xiaolong reported this boot crash: | | XZ-compressed data is corrupt | | -- System halted | Fix the bug in mem_avoid_overlap() of finding the earliest overlap. Reported-and-tested-by: Ye Xiaolong <xiaolong.ye@intel.com> Signed-off-by: Baoquan He <bhe@redhat.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org> |
|
Colin Ian King | f6d1747f89 |
x86/efi: Remove unused variable 'efi'
Remove unused variable 'efi', it is never used. This fixes the following clang build warning: arch/x86/boot/compressed/eboot.c:803:2: warning: Value stored to 'efi' is never read Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-efi@vger.kernel.org Link: http://lkml.kernel.org/r/1466839230-12781-4-git-send-email-matt@codeblueprint.co.uk Signed-off-by: Ingo Molnar <mingo@kernel.org> |
|
Yinghai Lu | e066cc4777 |
x86/KASLR: Allow randomization below the load address
Currently the kernel image physical address randomization's lower boundary is the original kernel load address. For bootloaders that load kernels into very high memory (e.g. kexec), this means randomization takes place in a very small window at the top of memory, ignoring the large region of physical memory below the load address. Since mem_avoid[] is already correctly tracking the regions that must be avoided, this patch changes the minimum address to whatever is less: 512M (to conservatively avoid unknown things in lower memory) or the load address. Now, for example, if the kernel is loaded at 8G, [512M, 8G) will be added to the list of possible physical memory positions. Signed-off-by: Yinghai Lu <yinghai@kernel.org> [ Rewrote the changelog, refactored the code to use min(). ] Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Baoquan He <bhe@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: H.J. Lu <hjl.tools@gmail.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1464216334-17200-6-git-send-email-keescook@chromium.org [ Edited the changelog some more, plus the code comment as well. ] Signed-off-by: Ingo Molnar <mingo@kernel.org> |
|
Kees Cook | ed9f007ee6 |
x86/KASLR: Extend kernel image physical address randomization to addresses larger than 4G
We want the physical address to be randomized anywhere between 16MB and the top of physical memory (up to 64TB). This patch exchanges the prior slots[] array for the new slot_areas[] array, and lifts the limitation of KERNEL_IMAGE_SIZE on the physical address offset for 64-bit. As before, process_e820_entry() walks memory and populates slot_areas[], splitting on any detected mem_avoid collisions. Finally, since the slots[] array and its associated functions are not needed any more, so they are removed. Based on earlier patches by Baoquan He. Originally-from: Baoquan He <bhe@redhat.com> Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Baoquan He <bhe@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: H.J. Lu <hjl.tools@gmail.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1464216334-17200-5-git-send-email-keescook@chromium.org Signed-off-by: Ingo Molnar <mingo@kernel.org> |
|
Baoquan He | 8391c73c96 |
x86/KASLR: Randomize virtual address separately
The current KASLR implementation randomizes the physical and virtual addresses of the kernel together (both are offset by the same amount). It calculates the delta of the physical address where vmlinux was linked to load and where it is finally loaded. If the delta is not equal to 0 (i.e. the kernel was relocated), relocation handling needs be done. On 64-bit, this patch randomizes both the physical address where kernel is decompressed and the virtual address where kernel text is mapped and will execute from. We now have two values being chosen, so the function arguments are reorganized to pass by pointer so they can be directly updated. Since relocation handling only depends on the virtual address, we must check the virtual delta, not the physical delta for processing kernel relocations. This also populates the page table for the new virtual address range. 32-bit does not support a separate virtual address, so it continues to use the physical offset for its virtual offset. Additionally updates the sanity checks done on the resulting kernel addresses since they are potentially separate now. [kees: rewrote changelog, limited virtual split to 64-bit only, update checks] [kees: fix CONFIG_RANDOMIZE_BASE=n boot failure] Signed-off-by: Baoquan He <bhe@redhat.com> Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: H.J. Lu <hjl.tools@gmail.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1464216334-17200-4-git-send-email-keescook@chromium.org Signed-off-by: Ingo Molnar <mingo@kernel.org> |
|
Kees Cook | 11fdf97a3c |
x86/KASLR: Clarify identity map interface
This extracts the call to prepare_level4() into a top-level function that the user of the pagetable.c interface must call to initialize the new page tables. For clarity and to match the "finalize" function, it has been renamed to initialize_identity_maps(). This function also gains the initialization of mapping_info so we don't have to do it each time in add_identity_map(). Additionally add copyright notice to the top, to make it clear that the bulk of the pagetable.c code was written by Yinghai, and that I just added bugs later. :) Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Baoquan He <bhe@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: H.J. Lu <hjl.tools@gmail.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1464216334-17200-3-git-send-email-keescook@chromium.org Signed-off-by: Ingo Molnar <mingo@kernel.org> |
|
Kees Cook | 98f7852537 |
x86/boot: Refuse to build with data relocations
The compressed kernel is built with -fPIC/-fPIE so that it can run in any location a bootloader happens to put it. However, since ELF relocation processing is not happening (and all the relocation information has already been stripped at link time), none of the code can use data relocations (e.g. static assignments of pointers). This is already noted in a warning comment at the top of misc.c, but this adds an explicit check for the condition during the linking stage to block any such bugs from appearing. If this was in place with the earlier bug in pagetable.c, the build would fail like this: ... CC arch/x86/boot/compressed/pagetable.o DATAREL arch/x86/boot/compressed/vmlinux error: arch/x86/boot/compressed/pagetable.o has data relocations! make[2]: *** [arch/x86/boot/compressed/vmlinux] Error 1 ... A clean build shows: ... CC arch/x86/boot/compressed/pagetable.o DATAREL arch/x86/boot/compressed/vmlinux LD arch/x86/boot/compressed/vmlinux ... Suggested-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Baoquan He <bhe@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: H.J. Lu <hjl.tools@gmail.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1464216334-17200-2-git-send-email-keescook@chromium.org Signed-off-by: Ingo Molnar <mingo@kernel.org> |
|
Kees Cook | 65fe935dd2 |
x86/KASLR, x86/power: Remove x86 hibernation restrictions
With the following fix: 70595b479ce1 ("x86/power/64: Fix crash whan the hibernation code passes control to the image kernel") ... there is no longer a problem with hibernation resuming a KASLR-booted kernel image, so remove the restriction. Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Baoquan He <bhe@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Len Brown <len.brown@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Linux PM list <linux-pm@vger.kernel.org> Cc: Logan Gunthorpe <logang@deltatee.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephen Smalley <sds@tycho.nsa.gov> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yinghai Lu <yinghai@kernel.org> Cc: linux-doc@vger.kernel.org Link: http://lkml.kernel.org/r/20160613221002.GA29719@www.outflux.net Signed-off-by: Ingo Molnar <mingo@kernel.org> |
|
Ingo Molnar | 50c0587eed |
Merge branch 'linus' into x86/asm, to pick up fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org> |
|
H. Peter Anvin | 66928b4eb9 |
x86, asm, boot: Use CC_SET()/CC_OUT() in arch/x86/boot/boot.h
Remove open-coded uses of set instructions to use CC_SET()/CC_OUT() in arch/x86/boot/boot.h. Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Link: http://lkml.kernel.org/r/1465414726-197858-10-git-send-email-hpa@linux.intel.com Reviewed-by: Andy Lutomirski <luto@kernel.org> Reviewed-by: Borislav Petkov <bp@suse.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> |
|
H. Peter Anvin | 117780eef7 |
x86, asm: use bool for bitops and other assembly outputs
The gcc people have confirmed that using "bool" when combined with inline assembly always is treated as a byte-sized operand that can be assumed to be 0 or 1, which is exactly what the SET instruction emits. Change the output types and intermediate variables of as many operations as practical to "bool". Signed-off-by: H. Peter Anvin <hpa@zytor.com> Link: http://lkml.kernel.org/r/1465414726-197858-3-git-send-email-hpa@linux.intel.com Reviewed-by: Andy Lutomirski <luto@kernel.org> Reviewed-by: Borislav Petkov <bp@suse.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> |
|
H. Peter Anvin | 9c77679cad |
x86, build: copy ldlinux.c32 to image.iso
For newer versions of Syslinux, we need ldlinux.c32 in addition to isolinux.bin to reside on the boot disk, so if the latter is found, copy it, too, to the isoimage tree. Signed-off-by: H. Peter Anvin <hpa@zytor.com> Cc: Linux Stable Tree <stable@vger.kernel.org> |
|
Linus Torvalds | 5b26fc8824 |
Merge branch 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild
Pull kbuild updates from Michal Marek: - new option CONFIG_TRIM_UNUSED_KSYMS which does a two-pass build and unexports symbols which are not used in the current config [Nicolas Pitre] - several kbuild rule cleanups [Masahiro Yamada] - warning option adjustments for gcov etc [Arnd Bergmann] - a few more small fixes * 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild: (31 commits) kbuild: move -Wunused-const-variable to W=1 warning level kbuild: fix if_change and friends to consider argument order kbuild: fix adjust_autoksyms.sh for modules that need only one symbol kbuild: fix ksym_dep_filter when multiple EXPORT_SYMBOL() on the same line gcov: disable -Wmaybe-uninitialized warning gcov: disable tree-loop-im to reduce stack usage gcov: disable for COMPILE_TEST Kbuild: disable 'maybe-uninitialized' warning for CONFIG_PROFILE_ALL_BRANCHES Kbuild: change CC_OPTIMIZE_FOR_SIZE definition kbuild: forbid kernel directory to contain spaces and colons kbuild: adjust ksym_dep_filter for some cmd_* renames kbuild: Fix dependencies for final vmlinux link kbuild: better abstract vmlinux sequential prerequisites kbuild: fix call to adjust_autoksyms.sh when output directory specified kbuild: Get rid of KBUILD_STR kbuild: rename cmd_as_s_S to cmd_cpp_s_S kbuild: rename cmd_cc_i_c to cmd_cpp_i_c kbuild: drop redundant "PHONY += FORCE" kbuild: delete unnecessary "@:" kbuild: mark help target as PHONY ... |
|
Linus Torvalds | 9a45f036af |
Merge branch 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 boot updates from Ingo Molnar: "The biggest changes in this cycle were: - prepare for more KASLR related changes, by restructuring, cleaning up and fixing the existing boot code. (Kees Cook, Baoquan He, Yinghai Lu) - simplifly/concentrate subarch handling code, eliminate paravirt_enabled() usage. (Luis R Rodriguez)" * 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (50 commits) x86/KASLR: Clarify purpose of each get_random_long() x86/KASLR: Add virtual address choosing function x86/KASLR: Return earliest overlap when avoiding regions x86/KASLR: Add 'struct slot_area' to manage random_addr slots x86/boot: Add missing file header comments x86/KASLR: Initialize mapping_info every time x86/boot: Comment what finalize_identity_maps() does x86/KASLR: Build identity mappings on demand x86/boot: Split out kernel_ident_mapping_init() x86/boot: Clean up indenting for asm/boot.h x86/KASLR: Improve comments around the mem_avoid[] logic x86/boot: Simplify pointer casting in choose_random_location() x86/KASLR: Consolidate mem_avoid[] entries x86/boot: Clean up pointer casting x86/boot: Warn on future overlapping memcpy() use x86/boot: Extract error reporting functions x86/boot: Correctly bounds-check relocations x86/KASLR: Clean up unused code from old 'run_size' and rename it to 'kernel_total_size' x86/boot: Fix "run_size" calculation x86/boot: Calculate decompression size during boot not build ... |
|
Kees Cook | d2d3462f9f |
x86/KASLR: Clarify purpose of each get_random_long()
KASLR will be calling get_random_long() twice, but the debug output won't distinguishing between them. This patch adds a report on when it is fetching the physical vs virtual address. With this, once the virtual offset is separate, the report changes from: KASLR using RDTSC... KASLR using RDTSC... into: Physical KASLR using RDTSC... Virtual KASLR using RDTSC... Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Baoquan He <bhe@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Borislav Petkov <bp@suse.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Young <dyoung@redhat.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: kernel-hardening@lists.openwall.com Cc: lasse.collin@tukaani.org Link: http://lkml.kernel.org/r/1462825332-10505-7-git-send-email-keescook@chromium.org Signed-off-by: Ingo Molnar <mingo@kernel.org> |
|
Baoquan He | 071a74930e |
x86/KASLR: Add virtual address choosing function
To support randomizing the kernel virtual address separately from the physical address, this patch adds find_random_virt_addr() to choose a slot anywhere between LOAD_PHYSICAL_ADDR and KERNEL_IMAGE_SIZE. Since this address is virtual, not physical, we can place the kernel anywhere in this region, as long as it is aligned and (in the case of kernel being larger than the slot size) placed with enough room to load the entire kernel image. For clarity and readability, find_random_addr() is renamed to find_random_phys_addr() and has "size" renamed to "image_size" to match find_random_virt_addr(). Signed-off-by: Baoquan He <bhe@redhat.com> [ Rewrote changelog, refactored slot calculation for readability. ] [ Renamed find_random_phys_addr() and size argument. ] Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Borislav Petkov <bp@suse.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Young <dyoung@redhat.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: kernel-hardening@lists.openwall.com Cc: lasse.collin@tukaani.org Link: http://lkml.kernel.org/r/1462825332-10505-6-git-send-email-keescook@chromium.org Signed-off-by: Ingo Molnar <mingo@kernel.org> |
|
Kees Cook | 06486d6c97 |
x86/KASLR: Return earliest overlap when avoiding regions
In preparation for being able to detect where to split up contiguous memory regions that overlap with memory regions to avoid, we need to pass back what the earliest overlapping region was. This modifies the overlap checker to return that information. Based on a separate mem_min_overlap() implementation by Baoquan He. Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Baoquan He <bhe@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Borislav Petkov <bp@suse.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Young <dyoung@redhat.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: kernel-hardening@lists.openwall.com Cc: lasse.collin@tukaani.org Link: http://lkml.kernel.org/r/1462825332-10505-5-git-send-email-keescook@chromium.org Signed-off-by: Ingo Molnar <mingo@kernel.org> |
|
Baoquan He | c401cf1524 |
x86/KASLR: Add 'struct slot_area' to manage random_addr slots
In order to support KASLR moving the kernel anywhere in physical memory (which could be up to 64TB), we need to handle counting the potential randomization locations in a more efficient manner. In the worst case with 64TB, there could be roughly 32 * 1024 * 1024 randomization slots if CONFIG_PHYSICAL_ALIGN is 0x1000000. Currently the starting address of candidate positions is stored into the slots[] array, one at a time. This method would cost too much memory and it's also very inefficient to get and save the slot information into the slot array one by one. This patch introduces 'struct slot_area' to manage each contiguous region of randomization slots. Each slot_area will contain the starting address and how many available slots are in this area. As with the original code, the slot_areas[] will avoid the mem_avoid[] regions. Since setup_data is a linked list, it could contain an unknown number of memory regions to be avoided, which could cause us to fragment the contiguous memory that the slot_area array is tracking. In normal operation this level of fragmentation will be extremely rare, but we choose a suitably large value (100) for the array. If setup_data forces the slot_area array to become highly fragmented and there are more slots available beyond the first 100 found, the rest will be ignored for KASLR selection. The function store_slot_info() is used to calculate the number of slots available in the passed-in memory region and stores it into slot_areas[] after adjusting for alignment and size requirements. Signed-off-by: Baoquan He <bhe@redhat.com> [ Rewrote changelog, squashed with new functions. ] Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Borislav Petkov <bp@suse.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Young <dyoung@redhat.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: kernel-hardening@lists.openwall.com Cc: lasse.collin@tukaani.org Link: http://lkml.kernel.org/r/1462825332-10505-4-git-send-email-keescook@chromium.org Signed-off-by: Ingo Molnar <mingo@kernel.org> |
|
Kees Cook | cb18ef0da2 |
x86/boot: Add missing file header comments
There were some files with missing header comments. Since they are included from both compressed and regular kernels, make note of that. Also corrects a typo in the mem_avoid comments. Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Baoquan He <bhe@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Borislav Petkov <bp@suse.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Young <dyoung@redhat.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: kernel-hardening@lists.openwall.com Cc: lasse.collin@tukaani.org Link: http://lkml.kernel.org/r/1462825332-10505-3-git-send-email-keescook@chromium.org Signed-off-by: Ingo Molnar <mingo@kernel.org> |
|
Kees Cook | 434a6c9f90 |
x86/KASLR: Initialize mapping_info every time
As it turns out, mapping_info DOES need to be initialized every time, because pgt_data address could be changed during kernel relocation. So it can not be build time assigned. Without this, page tables were not being corrected updated, which could cause reboots when a physical address beyond 2G was chosen. Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Baoquan He <bhe@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Borislav Petkov <bp@suse.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Young <dyoung@redhat.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: kernel-hardening@lists.openwall.com Cc: lasse.collin@tukaani.org Link: http://lkml.kernel.org/r/1462825332-10505-2-git-send-email-keescook@chromium.org Signed-off-by: Ingo Molnar <mingo@kernel.org> |
|
Borislav Petkov | 36a39ac967 |
x86/boot: Comment what finalize_identity_maps() does
So it is not really obvious that finalize_identity_maps() doesn't do any finalization but it *actually* writes CR3 with the ident PGD. Comment that at the call site. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: akpm@linux-foundation.org Cc: bhe@redhat.com Cc: dyoung@redhat.com Cc: jkosina@suse.cz Cc: linux-tip-commits@vger.kernel.org Cc: luto@kernel.org Cc: vgoyal@redhat.com Cc: yinghai@kernel.org Link: http://lkml.kernel.org/r/20160507100541.GA24613@pd.tnic Signed-off-by: Ingo Molnar <mingo@kernel.org> |
|
Kees Cook | 3a94707d7a |
x86/KASLR: Build identity mappings on demand
Currently KASLR only supports relocation in a small physical range (from 16M to 1G), due to using the initial kernel page table identity mapping. To support ranges above this, we need to have an identity mapping for the desired memory range before we can decompress (and later run) the kernel. 32-bit kernels already have the needed identity mapping. This patch adds identity mappings for the needed memory ranges on 64-bit kernels. This happens in two possible boot paths: If loaded via startup_32(), we need to set up the needed identity map. If loaded from a 64-bit bootloader, the bootloader will have already set up an identity mapping, and we'll start via the compressed kernel's startup_64(). In this case, the bootloader's page tables need to be avoided while selecting the new uncompressed kernel location. If not, the decompressor could overwrite them during decompression. To accomplish this, we could walk the pagetable and find every page that is used, and add them to mem_avoid, but this needs extra code and will require increasing the size of the mem_avoid array. Instead, we can create a new set of page tables for our own identity mapping instead. The pages for the new page table will come from the _pagetable section of the compressed kernel, which means they are already contained by in mem_avoid array. To do this, we reuse the code from the uncompressed kernel's identity mapping routines. The _pgtable will be shared by both the 32-bit and 64-bit paths to reduce init_size, as now the compressed kernel's _rodata to _end will contribute to init_size. To handle the possible mappings, we need to increase the existing page table buffer size: When booting via startup_64(), we need to cover the old VO, params, cmdline and uncompressed kernel. In an extreme case we could have them all beyond the 512G boundary, which needs (2+2)*4 pages with 2M mappings. And we'll need 2 for first 2M for VGA RAM. One more is needed for level4. This gets us to 19 pages total. When booting via startup_32(), KASLR could move the uncompressed kernel above 4G, so we need to create extra identity mappings, which should only need (2+2) pages at most when it is beyond the 512G boundary. So 19 pages is sufficient for this case as well. The resulting BOOT_*PGT_SIZE defines use the "_SIZE" suffix on their names to maintain logical consistency with the existing BOOT_HEAP_SIZE and BOOT_STACK_SIZE defines. This patch is based on earlier patches from Yinghai Lu and Baoquan He. Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Baoquan He <bhe@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Borislav Petkov <bp@suse.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Young <dyoung@redhat.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: kernel-hardening@lists.openwall.com Cc: lasse.collin@tukaani.org Link: http://lkml.kernel.org/r/1462572095-11754-4-git-send-email-keescook@chromium.org Signed-off-by: Ingo Molnar <mingo@kernel.org> |
|
Kees Cook | ed09acde44 |
x86/KASLR: Improve comments around the mem_avoid[] logic
This attempts to improve the comments that describe how the memory range used for decompression is avoided. Additionally uses an enum instead of raw numbers for the mem_avoid[] indexing. Suggested-by: Borislav Petkov <bp@alien8.de> Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Baoquan He <bhe@redhat.com> Cc: Borislav Petkov <bp@suse.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/20160506194459.GA16480@www.outflux.net Signed-off-by: Ingo Molnar <mingo@kernel.org> |
|
Borislav Petkov | 549f90db68 |
x86/boot: Simplify pointer casting in choose_random_location()
Pass them down as 'unsigned long' directly and get rid of more casting and assignments. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: akpm@linux-foundation.org Cc: bhe@redhat.com Cc: dyoung@redhat.com Cc: linux-tip-commits@vger.kernel.org Cc: luto@kernel.org Cc: vgoyal@redhat.com Cc: yinghai@kernel.org Link: http://lkml.kernel.org/r/20160506115015.GI24044@pd.tnic Signed-off-by: Ingo Molnar <mingo@kernel.org> |
|
Yinghai Lu | 9dc1969c24 |
x86/KASLR: Consolidate mem_avoid[] entries
The mem_avoid[] array is used to track positions that should be avoided (like the compressed kernel, decompression code, etc) when selecting a memory position for the randomly relocated kernel. Since ZO is now at the end of the decompression buffer and the decompression code (and its heap and stack) are at the front, we can safely consolidate the decompression entry, the heap entry, and the stack entry. The boot_params memory, however, could be elsewhere, so it should be explicitly included. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Baoquan He <bhe@redhat.com> [ Rwrote changelog, cleaned up code comments. ] Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Young <dyoung@redhat.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: kernel-hardening@lists.openwall.com Cc: lasse.collin@tukaani.org Link: http://lkml.kernel.org/r/1462486436-3707-3-git-send-email-keescook@chromium.org Signed-off-by: Ingo Molnar <mingo@kernel.org> |
|
Kees Cook | 2bc1cd39fa |
x86/boot: Clean up pointer casting
Currently extract_kernel() defines the input and output buffer pointers as "unsigned char *" since that's effectively what they are. It passes these to the decompressor routine and to the ELF parser, which both logically deal with buffer pointers too. There is some casting ("unsigned long") done to validate the numerical value of the pointers, but it is relatively limited. However, choose_random_location() operates almost exclusively on the numerical representation of these pointers, so it ended up carrying a lot of "unsigned long" casts. With the future physical/virtual split these casts were going to multiply, so this attempts to solve the problem by doing all the casting in choose_random_location()'s entry and return instead of through-out the code. Adjusts argument names to be more meaningful, and changes one us of "choice" to "output" to make the future physical/virtual split more clear (i.e. "choice" should be strictly a function return value and not used as an intermediate). Suggested-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Baoquan He <bhe@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Young <dyoung@redhat.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: kernel-hardening@lists.openwall.com Cc: lasse.collin@tukaani.org Link: http://lkml.kernel.org/r/1462486436-3707-2-git-send-email-keescook@chromium.org Signed-off-by: Ingo Molnar <mingo@kernel.org> |
|
Kees Cook | 00ec2c3703 |
x86/boot: Warn on future overlapping memcpy() use
If an overlapping memcpy() is ever attempted, we should at least report it, in case it might lead to problems, so it could be changed to a memmove() call instead. Suggested-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Baoquan He <bhe@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Borislav Petkov <bp@suse.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Lasse Collin <lasse.collin@tukaani.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: One Thousand Gnomes <gnomes@lxorguk.ukuu.org.uk> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1462229461-3370-3-git-send-email-keescook@chromium.org Signed-off-by: Ingo Molnar <mingo@kernel.org> |
|
Kees Cook | dc425a6e14 |
x86/boot: Extract error reporting functions
Currently to use warn(), a caller would need to include misc.h. However, this means they would get the (unavailable during compressed boot) gcc built-in memcpy family of functions. But since string.c is defining these memcpy functions for use by misc.c, we end up in a weird circular dependency. To break this loop, move the error reporting functions outside of misc.c with their own header so that they can be independently included by other sources. Since the screen-writing routines use memmove(), keep the low-level *_putstr() functions in misc.c. Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Baoquan He <bhe@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Borislav Petkov <bp@suse.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Lasse Collin <lasse.collin@tukaani.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: One Thousand Gnomes <gnomes@lxorguk.ukuu.org.uk> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1462229461-3370-2-git-send-email-keescook@chromium.org Signed-off-by: Ingo Molnar <mingo@kernel.org> |
|
Yinghai Lu | 4abf061bf8 |
x86/boot: Correctly bounds-check relocations
Relocation handling performs bounds checking on the resulting calculated addresses. The existing code uses output_len (VO size plus relocs size) as the max address. This is not right since the max_addr check should stop at the end of VO and exclude bss, brk, etc, which follows. The valid range should be VO [_text, __bss_start] in the loaded physical address space. This patch adds an export for __bss_start in voffset.h and uses it to set the correct limit for max_addr. Signed-off-by: Yinghai Lu <yinghai@kernel.org> [ Rewrote the changelog. ] Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Baoquan He <bhe@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Young <dyoung@redhat.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: lasse.collin@tukaani.org Link: http://lkml.kernel.org/r/1461888548-32439-7-git-send-email-keescook@chromium.org Signed-off-by: Ingo Molnar <mingo@kernel.org> |
|
Yinghai Lu | 4d2d542482 |
x86/KASLR: Clean up unused code from old 'run_size' and rename it to 'kernel_total_size'
Since 'run_size' is now calculated in misc.c, the old script and associated argument passing is no longer needed. This patch removes them, and renames 'run_size' to the more descriptive 'kernel_total_size'. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Baoquan He <bhe@redhat.com> [ Rewrote the changelog, renamed 'run_size' to 'kernel_total_size' ] Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Young <dyoung@redhat.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Junjie Mao <eternal.n08@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: lasse.collin@tukaani.org Link: http://lkml.kernel.org/r/1461888548-32439-6-git-send-email-keescook@chromium.org Signed-off-by: Ingo Molnar <mingo@kernel.org> |
|
Yinghai Lu | 67b6662559 |
x86/boot: Fix "run_size" calculation
Currently, the "run_size" variable holds the total kernel size
(size of code plus brk and bss) and is calculated via the shell script
arch/x86/tools/calc_run_size.sh. It gets the file offset and mem size
of the .bss and .brk sections from the vmlinux, and adds them as follows:
run_size = $(( $offsetA + $sizeA + $sizeB ))
However, this is not correct (it is too large). To illustrate, here's
a walk-through of the script's calculation, compared to the correct way
to find it.
First, offsetA is found as the starting address of the first .bss or
.brk section seen in the ELF file. The sizeA and sizeB values are the
respective section sizes.
[bhe@x1 linux]$ objdump -h vmlinux
vmlinux: file format elf64-x86-64
Sections:
Idx Name Size VMA LMA File off Algn
27 .bss 00170000 ffffffff81ec8000 0000000001ec8000 012c8000 2**12
ALLOC
28 .brk 00027000 ffffffff82038000 0000000002038000 012c8000 2**0
ALLOC
Here, offsetA is 0x012c8000, with sizeA at 0x00170000 and sizeB at
0x00027000. The resulting run_size is 0x145f000:
0x012c8000 + 0x00170000 + 0x00027000 = 0x145f000
However, if we instead examine the ELF LOAD program headers, we see a
different picture.
[bhe@x1 linux]$ readelf -l vmlinux
Elf file type is EXEC (Executable file)
Entry point 0x1000000
There are 5 program headers, starting at offset 64
Program Headers:
Type Offset VirtAddr PhysAddr
FileSiz MemSiz Flags Align
LOAD 0x0000000000200000 0xffffffff81000000 0x0000000001000000
0x0000000000b5e000 0x0000000000b5e000 R E 200000
LOAD 0x0000000000e00000 0xffffffff81c00000 0x0000000001c00000
0x0000000000145000 0x0000000000145000 RW 200000
LOAD 0x0000000001000000 0x0000000000000000 0x0000000001d45000
0x0000000000018158 0x0000000000018158 RW 200000
LOAD 0x000000000115e000 0xffffffff81d5e000 0x0000000001d5e000
0x000000000016a000 0x0000000000301000 RWE 200000
NOTE 0x000000000099bcac 0xffffffff8179bcac 0x000000000179bcac
0x00000000000001bc 0x00000000000001bc 4
Section to Segment mapping:
Segment Sections...
00 .text .notes __ex_table .rodata __bug_table .pci_fixup .tracedata
__ksymtab __ksymtab_gpl __ksymtab_strings __init_rodata __param
__modver
01 .data .vvar
02 .data..percpu
03 .init.text .init.data .x86_cpu_dev.init .parainstructions
.altinstructions .altinstr_replacement .iommu_table .apicdrivers
.exit.text .smp_locks .bss .brk
04 .notes
As mentioned, run_size needs to be the size of the running kernel
including .bss and .brk. We can see from the Section/Segment mapping
above that .bss and .brk are included in segment 03 (which corresponds
to the final LOAD program header). To find the run_size, we calculate
the end of the LOAD segment from its PhysAddr start (0x0000000001d5e000)
and its MemSiz (0x0000000000301000), minus the physical load address of
the kernel (the first LOAD segment's PhysAddr: 0x0000000001000000). The
resulting run_size is 0x105f000:
0x0000000001d5e000 + 0x0000000000301000 - 0x0000000001000000 = 0x105f000
So, from this we can see that the existing run_size calculation is
0x400000 too high. And, as it turns out, the correct run_size is
actually equal to VO_end - VO_text, which is certainly easier to calculate.
_end: 0xffffffff8205f000
_text:0xffffffff81000000
0xffffffff8205f000 - 0xffffffff81000000 = 0x105f000
As a result, run_size is a simple constant, so we don't need to pass it
around; we already have voffset.h for such things. We can share voffset.h
between misc.c and header.S instead of getting run_size in other ways.
This patch moves voffset.h creation code to boot/compressed/Makefile,
and switches misc.c to use the VO_end - VO_text calculation for run_size.
Dependence before:
boot/header.S ==> boot/voffset.h ==> vmlinux
boot/header.S ==> compressed/vmlinux ==> compressed/misc.c
Dependence after:
boot/header.S ==> compressed/vmlinux ==> compressed/misc.c ==> boot/voffset.h ==> vmlinux
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Baoquan He <bhe@redhat.com>
[ Rewrote the changelog. ]
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Junjie Mao <eternal.n08@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: lasse.collin@tukaani.org
Fixes:
|
|
Yinghai Lu | d607251ba9 |
x86/boot: Calculate decompression size during boot not build
Currently z_extract_offset is calculated in boot/compressed/mkpiggy.c. This doesn't work well because mkpiggy.c doesn't know the details of the decompressor in use. As a result, it can only make an estimation, which has risks: - output + output_len (VO) could be much bigger than input + input_len (ZO). In this case, the decompressed kernel plus relocs could overwrite the decompression code while it is running. - The head code of ZO could be bigger than z_extract_offset. In this case an overwrite could happen when the head code is running to move ZO to the end of buffer. Though currently the size of the head code is very small it's still a potential risk. Since there is no rule to limit the size of the head code of ZO, it runs the risk of suddenly becoming a (hard to find) bug. Instead, this moves the z_extract_offset calculation into header.S, and makes adjustments to be sure that the above two cases can never happen, and further corrects the comments describing the calculations. Since we have (in the previous patch) made ZO always be located against the end of decompression buffer, z_extract_offset is only used here to calculate an appropriate buffer size (INIT_SIZE), and is not longer used elsewhere. As such, it can be removed from voffset.h. Additionally clean up #if/#else #define to improve readability. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Baoquan He <bhe@redhat.com> [ Rewrote the changelog and comments. ] Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Young <dyoung@redhat.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: lasse.collin@tukaani.org Link: http://lkml.kernel.org/r/1461888548-32439-4-git-send-email-keescook@chromium.org Signed-off-by: Ingo Molnar <mingo@kernel.org> |
|
Yinghai Lu | 974f221c84 |
x86/boot: Move compressed kernel to the end of the decompression buffer
This change makes later calculations about where the kernel is located easier to reason about. To better understand this change, we must first clarify what 'VO' and 'ZO' are. These values were introduced in commits by hpa: |
|
Baoquan He | 6f9af75faa |
x86/KASLR: Handle kernel relocations above 2G correctly
When processing the relocation table, the offset used to calculate the relocation is an 'int'. This is sufficient for calculating the physical address of the relocs entry on 32-bit systems and on 64-bit systems when the relocation is under 2G. To handle relocations above 2G (seen in situations like kexec, netboot, etc), this offset needs to be calculated using a 'long' to avoid wrapping and miscalculating the relocation. Signed-off-by: Baoquan He <bhe@redhat.com> [ Rewrote the changelog. ] Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Young <dyoung@redhat.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: lasse.collin@tukaani.org Link: http://lkml.kernel.org/r/1461888548-32439-2-git-send-email-keescook@chromium.org Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> |