From cf8178f78645148dc38b28263b9d3f604148e3a8 Mon Sep 17 00:00:00 2001 From: Colin Ian King Date: Wed, 15 Mar 2017 17:10:10 +0000 Subject: [PATCH 01/11] x86/microcode/AMD: Remove redundant NULL check on mc mc is a pointer to the static u8 array amd_ucode_patch and therefore can never be null, so the check is redundant. Remove it. Detected by CoverityScan, CID#1372871 ("Logically Dead Code") Signed-off-by: Colin Ian King Cc: kernel-janitors@vger.kernel.org Cc: x86-ml Link: http://lkml.kernel.org/r/20170315171010.17536-1-colin.king@canonical.com Signed-off-by: Borislav Petkov Signed-off-by: Thomas Gleixner --- arch/x86/kernel/cpu/microcode/amd.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/arch/x86/kernel/cpu/microcode/amd.c b/arch/x86/kernel/cpu/microcode/amd.c index 7889ae492af0..1d38e53c2d5c 100644 --- a/arch/x86/kernel/cpu/microcode/amd.c +++ b/arch/x86/kernel/cpu/microcode/amd.c @@ -352,8 +352,6 @@ void reload_ucode_amd(void) u32 rev, dummy; mc = (struct microcode_amd *)amd_ucode_patch; - if (!mc) - return; rdmsr(MSR_AMD64_PATCH_LEVEL, rev, dummy); From 42fc6c6cb1662ba2fa727dd01c9473c63be4e3b6 Mon Sep 17 00:00:00 2001 From: Josh Poimboeuf Date: Thu, 4 May 2017 09:51:40 -0500 Subject: [PATCH 02/11] x86/asm: Don't use RBP as a temporary register in csum_partial_copy_generic() Andrey Konovalov reported the following warning while fuzzing the kernel with syzkaller: WARNING: kernel stack regs at ffff8800686869f8 in a.out:4933 has bad 'bp' value c3fc855a10167ec0 The unwinder dump revealed that RBP had a bad value when an interrupt occurred in csum_partial_copy_generic(). That function saves RBP on the stack and then overwrites it, using it as a scratch register. That's problematic because it breaks stack traces if an interrupt occurs in the middle of the function. Replace the usage of RBP with another callee-saved register (R15) so stack traces are no longer affected. Reported-by: Andrey Konovalov Tested-by: Andrey Konovalov Signed-off-by: Josh Poimboeuf Cc: Cong Wang Cc: David S . Miller Cc: Dmitry Vyukov Cc: Eric Dumazet Cc: Kostya Serebryany Cc: Linus Torvalds Cc: Marcelo Ricardo Leitner Cc: Neil Horman Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: Vlad Yasevich Cc: linux-sctp@vger.kernel.org Cc: netdev Cc: syzkaller Link: http://lkml.kernel.org/r/4b03a961efda5ec9bfe46b7b9c9ad72d1efad343.1493909486.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar --- arch/x86/lib/csum-copy_64.S | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/arch/x86/lib/csum-copy_64.S b/arch/x86/lib/csum-copy_64.S index 7e48807b2fa1..45a53dfe1859 100644 --- a/arch/x86/lib/csum-copy_64.S +++ b/arch/x86/lib/csum-copy_64.S @@ -55,7 +55,7 @@ ENTRY(csum_partial_copy_generic) movq %r12, 3*8(%rsp) movq %r14, 4*8(%rsp) movq %r13, 5*8(%rsp) - movq %rbp, 6*8(%rsp) + movq %r15, 6*8(%rsp) movq %r8, (%rsp) movq %r9, 1*8(%rsp) @@ -74,7 +74,7 @@ ENTRY(csum_partial_copy_generic) /* main loop. 
clear in 64 byte blocks */ /* r9: zero, r8: temp2, rbx: temp1, rax: sum, rcx: saved length */ /* r11: temp3, rdx: temp4, r12 loopcnt */ - /* r10: temp5, rbp: temp6, r14 temp7, r13 temp8 */ + /* r10: temp5, r15: temp6, r14 temp7, r13 temp8 */ .p2align 4 .Lloop: source @@ -89,7 +89,7 @@ ENTRY(csum_partial_copy_generic) source movq 32(%rdi), %r10 source - movq 40(%rdi), %rbp + movq 40(%rdi), %r15 source movq 48(%rdi), %r14 source @@ -103,7 +103,7 @@ ENTRY(csum_partial_copy_generic) adcq %r11, %rax adcq %rdx, %rax adcq %r10, %rax - adcq %rbp, %rax + adcq %r15, %rax adcq %r14, %rax adcq %r13, %rax @@ -121,7 +121,7 @@ ENTRY(csum_partial_copy_generic) dest movq %r10, 32(%rsi) dest - movq %rbp, 40(%rsi) + movq %r15, 40(%rsi) dest movq %r14, 48(%rsi) dest @@ -203,7 +203,7 @@ ENTRY(csum_partial_copy_generic) movq 3*8(%rsp), %r12 movq 4*8(%rsp), %r14 movq 5*8(%rsp), %r13 - movq 6*8(%rsp), %rbp + movq 6*8(%rsp), %r15 addq $7*8, %rsp ret From fc5f9d5f151c9fff21d3d1d2907b888a5aec3ff7 Mon Sep 17 00:00:00 2001 From: Baoquan He Date: Thu, 4 May 2017 10:25:47 +0800 Subject: [PATCH 03/11] x86/mm: Fix boot crash caused by incorrect loop count calculation in sync_global_pgds() Jeff Moyer reported that on his system with two memory regions 0~64G and 1T~1T+192G, and the kernel option "memmap=192G!1024G" added, enabling KASLR makes the system hang intermittently during boot, while booting with 'nokaslr' does not. The backtrace is: Oops: 0000 [#1] SMP RIP: memcpy_erms() [ .... ] Call Trace: pmem_rw_page() bdev_read_page() do_mpage_readpage() mpage_readpages() blkdev_readpages() __do_page_cache_readahead() force_page_cache_readahead() page_cache_sync_readahead() generic_file_read_iter() blkdev_read_iter() __vfs_read() vfs_read() SyS_read() entry_SYSCALL_64_fastpath() This crash happens because the for loop count calculation in sync_global_pgds() is not correct. When a mapping area crosses PGD entries, we should calculate the starting address of the region covered by the next PGD and use it as the next loop iteration's address, rather than simply adding PGDIR_SIZE. The old code works correctly only if the mapping area is an exact multiple of PGDIR_SIZE; otherwise the end region can be skipped and is then never synchronized from the kernel PGD init_mm.pgd to all other processes. In Jeff's system, the emulated pmem area [1024G, 1216G) is smaller than PGDIR_SIZE. 'nokaslr' works because PAGE_OFFSET is 1T aligned, so the area is mapped inside a single PGD entry. With KASLR enabled, the area can cross two PGD entries, and the second PGD entry is then not synced to all other processes. That is why we saw an empty PGD. Fix it. Reported-by: Jeff Moyer Signed-off-by: Baoquan He Cc: Andrew Morton Cc: Andy Lutomirski Cc: Borislav Petkov Cc: Brian Gerst Cc: Dan Williams Cc: Dave Hansen Cc: Dave Young Cc: Denys Vlasenko Cc: H. Peter Anvin Cc: Jinbum Park Cc: Josh Poimboeuf Cc: Kees Cook Cc: Kirill A.
Shutemov Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Garnier Cc: Thomas Gleixner Cc: Yasuaki Ishimatsu Cc: Yinghai Lu Link: http://lkml.kernel.org/r/1493864747-8506-1-git-send-email-bhe@redhat.com Signed-off-by: Ingo Molnar --- arch/x86/mm/init_64.c | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c index 745e5e183169..97fe88749e18 100644 --- a/arch/x86/mm/init_64.c +++ b/arch/x86/mm/init_64.c @@ -94,10 +94,10 @@ __setup("noexec32=", nonx32_setup); */ void sync_global_pgds(unsigned long start, unsigned long end) { - unsigned long address; + unsigned long addr; - for (address = start; address <= end; address += PGDIR_SIZE) { - pgd_t *pgd_ref = pgd_offset_k(address); + for (addr = start; addr <= end; addr = ALIGN(addr + 1, PGDIR_SIZE)) { + pgd_t *pgd_ref = pgd_offset_k(addr); const p4d_t *p4d_ref; struct page *page; @@ -106,7 +106,7 @@ void sync_global_pgds(unsigned long start, unsigned long end) * handle synchonization on p4d level. */ BUILD_BUG_ON(pgd_none(*pgd_ref)); - p4d_ref = p4d_offset(pgd_ref, address); + p4d_ref = p4d_offset(pgd_ref, addr); if (p4d_none(*p4d_ref)) continue; @@ -117,8 +117,8 @@ void sync_global_pgds(unsigned long start, unsigned long end) p4d_t *p4d; spinlock_t *pgt_lock; - pgd = (pgd_t *)page_address(page) + pgd_index(address); - p4d = p4d_offset(pgd, address); + pgd = (pgd_t *)page_address(page) + pgd_index(addr); + p4d = p4d_offset(pgd, addr); /* the pgt_lock only for Xen */ pgt_lock = &pgd_page_get_mm(page)->page_table_lock; spin_lock(pgt_lock); From 121843eb02a6e2fa30aefab64bfe183c97230c75 Mon Sep 17 00:00:00 2001 From: Matthias Kaehlcke Date: Mon, 1 May 2017 15:47:41 -0700 Subject: [PATCH 04/11] x86/mm/kaslr: Use the _ASM_MUL macro for multiplication to work around Clang incompatibility The constraint "rm" allows the compiler to put mix_const into memory. When the input operand is a memory location, MUL needs an operand-size suffix, since Clang can't infer the multiplication width from the operand. Add and use the _ASM_MUL macro, which determines the operand size and resolves to the MUL instruction with the corresponding suffix. This fixes the following error when building with clang: CC arch/x86/lib/kaslr.o /tmp/kaslr-dfe1ad.s: Assembler messages: /tmp/kaslr-dfe1ad.s:182: Error: no instruction mnemonic suffix given and no register operands; can't size instruction Signed-off-by: Matthias Kaehlcke Cc: Grant Grundler Cc: Greg Hackmann Cc: Kees Cook Cc: Linus Torvalds Cc: Michael Davidson Cc: Peter Zijlstra Cc: Thomas Gleixner Link: http://lkml.kernel.org/r/20170501224741.133938-1-mka@chromium.org Signed-off-by: Ingo Molnar --- arch/x86/include/asm/asm.h | 1 + arch/x86/lib/kaslr.c | 3 ++- 2 files changed, 3 insertions(+), 1 deletion(-) diff --git a/arch/x86/include/asm/asm.h b/arch/x86/include/asm/asm.h index 7acb51c49fec..7a9df3beb89b 100644 --- a/arch/x86/include/asm/asm.h +++ b/arch/x86/include/asm/asm.h @@ -32,6 +32,7 @@ #define _ASM_ADD __ASM_SIZE(add) #define _ASM_SUB __ASM_SIZE(sub) #define _ASM_XADD __ASM_SIZE(xadd) +#define _ASM_MUL __ASM_SIZE(mul) #define _ASM_AX __ASM_REG(ax) #define _ASM_BX __ASM_REG(bx) diff --git a/arch/x86/lib/kaslr.c b/arch/x86/lib/kaslr.c index 5761a4f19455..ab2d1d73e9e7 100644 --- a/arch/x86/lib/kaslr.c +++ b/arch/x86/lib/kaslr.c @@ -5,6 +5,7 @@ * kernel starts. This file is included in the compressed kernel and * normally linked in the regular.
*/ +#include <asm/asm.h> #include #include #include @@ -79,7 +80,7 @@ unsigned long kaslr_get_random_long(const char *purpose) } /* Circular multiply for better bit diffusion */ - asm("mul %3" + asm(_ASM_MUL "%3" : "=a" (random), "=d" (raw) : "a" (random), "rm" (mix_const)); random += raw; From 60854a12d281e2fa25662fa32ac8022bbff17432 Mon Sep 17 00:00:00 2001 From: Kees Cook Date: Fri, 5 May 2017 21:51:16 -0700 Subject: [PATCH 05/11] x86/boot: Declare error() as noreturn MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The compressed boot function error() is used to halt execution, but it wasn't marked with "noreturn". This fixes that in preparation for supporting kernel FORTIFY_SOURCE, which uses the noreturn annotation on panic() and calls error(). GCC would warn about a noreturn function calling a non-noreturn function: arch/x86/boot/compressed/misc.c: In function ‘fortify_panic’: arch/x86/boot/compressed/misc.c:416:1: warning: ‘noreturn’ function does return } ^ Signed-off-by: Kees Cook Cc: Daniel Micay Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: H. Peter Anvin Link: http://lkml.kernel.org/r/20170506045116.GA2879@beast Signed-off-by: Ingo Molnar --- arch/x86/boot/compressed/error.h | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/arch/x86/boot/compressed/error.h b/arch/x86/boot/compressed/error.h index 2e59dac07f9e..d732e608e3af 100644 --- a/arch/x86/boot/compressed/error.h +++ b/arch/x86/boot/compressed/error.h @@ -1,7 +1,9 @@ #ifndef BOOT_COMPRESSED_ERROR_H #define BOOT_COMPRESSED_ERROR_H +#include <linux/compiler.h> + void warn(char *m); -void error(char *m); +void error(char *m) __noreturn; #endif /* BOOT_COMPRESSED_ERROR_H */ From 66aad4fdf2bf0af29c7decb4433dc5ec6c7c5451 Mon Sep 17 00:00:00 2001 From: Xunlei Pang Date: Thu, 4 May 2017 09:42:50 +0800 Subject: [PATCH 06/11] x86/mm: Add support for gbpages to kernel_ident_mapping_init() Kernel identity mappings on x86-64 kernels are created in two ways: by the early x86 boot code, or by kernel_ident_mapping_init(). Native kernels (which is the dominant usecase) use the former, but the kexec and hibernation code use kernel_ident_mapping_init(). There's a subtle difference between these two ways identity mappings are created: the current kernel_ident_mapping_init() code always creates identity mappings using 2MB pages (PMD level), while the native kernel boot path also utilizes gbpages where available. This difference is suboptimal both for performance and for memory usage: kernel_ident_mapping_init() needs to allocate pages for the page tables when creating the new identity mappings. This patch adds 1GB page (PUD level) support to kernel_ident_mapping_init() to address these concerns. The primary advantage is better TLB coverage/performance, because we'd utilize 1GB TLBs instead of 2MB ones. It also saves paging structure allocations on machines with large amounts of memory when identity-mapping all of it: around 4MB/TB with 2MB pages versus only about 8KB/TB with 1GB pages. ( Note that this change alone does not activate gbpages in kexec, we are doing that in a separate patch. ) Signed-off-by: Xunlei Pang Cc: Andy Lutomirski Cc: Borislav Petkov Cc: Brian Gerst Cc: Dave Young Cc: Denys Vlasenko Cc: Eric Biederman Cc: H.
Peter Anvin Cc: Josh Poimboeuf Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: Yinghai Lu Cc: akpm@linux-foundation.org Cc: kexec@lists.infradead.org Link: http://lkml.kernel.org/r/1493862171-8799-1-git-send-email-xlpang@redhat.com Signed-off-by: Ingo Molnar --- arch/x86/boot/compressed/pagetable.c | 2 +- arch/x86/include/asm/init.h | 3 ++- arch/x86/kernel/machine_kexec_64.c | 2 +- arch/x86/mm/ident_map.c | 14 +++++++++++++- arch/x86/power/hibernate_64.c | 2 +- 5 files changed, 18 insertions(+), 5 deletions(-) diff --git a/arch/x86/boot/compressed/pagetable.c b/arch/x86/boot/compressed/pagetable.c index 56589d0a804b..1d78f1739087 100644 --- a/arch/x86/boot/compressed/pagetable.c +++ b/arch/x86/boot/compressed/pagetable.c @@ -70,7 +70,7 @@ static unsigned long level4p; * Due to relocation, pointers must be assigned at run time not build time. */ static struct x86_mapping_info mapping_info = { - .pmd_flag = __PAGE_KERNEL_LARGE_EXEC, + .page_flag = __PAGE_KERNEL_LARGE_EXEC, }; /* Locates and clears a region for a new top level page table. */ diff --git a/arch/x86/include/asm/init.h b/arch/x86/include/asm/init.h index 737da62bfeb0..474eb8c66fee 100644 --- a/arch/x86/include/asm/init.h +++ b/arch/x86/include/asm/init.h @@ -4,8 +4,9 @@ struct x86_mapping_info { void *(*alloc_pgt_page)(void *); /* allocate buf for page table */ void *context; /* context for alloc_pgt_page */ - unsigned long pmd_flag; /* page flag for PMD entry */ + unsigned long page_flag; /* page flag for PMD or PUD entry */ unsigned long offset; /* ident mapping offset */ + bool direct_gbpages; /* PUD level 1GB page support */ }; int kernel_ident_mapping_init(struct x86_mapping_info *info, pgd_t *pgd_page, diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c index 085c3b300d32..1d4f2b076545 100644 --- a/arch/x86/kernel/machine_kexec_64.c +++ b/arch/x86/kernel/machine_kexec_64.c @@ -113,7 +113,7 @@ static int init_pgtable(struct kimage *image, unsigned long start_pgtable) struct x86_mapping_info info = { .alloc_pgt_page = alloc_pgt_page, .context = image, - .pmd_flag = __PAGE_KERNEL_LARGE_EXEC, + .page_flag = __PAGE_KERNEL_LARGE_EXEC, }; unsigned long mstart, mend; pgd_t *level4p; diff --git a/arch/x86/mm/ident_map.c b/arch/x86/mm/ident_map.c index 04210a29dd60..adab1595f4bd 100644 --- a/arch/x86/mm/ident_map.c +++ b/arch/x86/mm/ident_map.c @@ -13,7 +13,7 @@ static void ident_pmd_init(struct x86_mapping_info *info, pmd_t *pmd_page, if (pmd_present(*pmd)) continue; - set_pmd(pmd, __pmd((addr - info->offset) | info->pmd_flag)); + set_pmd(pmd, __pmd((addr - info->offset) | info->page_flag)); } } @@ -30,6 +30,18 @@ static int ident_pud_init(struct x86_mapping_info *info, pud_t *pud_page, if (next > end) next = end; + if (info->direct_gbpages) { + pud_t pudval; + + if (pud_present(*pud)) + continue; + + addr &= PUD_MASK; + pudval = __pud((addr - info->offset) | info->page_flag); + set_pud(pud, pudval); + continue; + } + if (pud_present(*pud)) { pmd = pmd_offset(pud, 0); ident_pmd_init(info, pmd, addr, next); diff --git a/arch/x86/power/hibernate_64.c b/arch/x86/power/hibernate_64.c index 6a61194ffd58..a6e21fee22ea 100644 --- a/arch/x86/power/hibernate_64.c +++ b/arch/x86/power/hibernate_64.c @@ -104,7 +104,7 @@ static int set_up_temporary_mappings(void) { struct x86_mapping_info info = { .alloc_pgt_page = alloc_pgt_page, - .pmd_flag = __PAGE_KERNEL_LARGE_EXEC, + .page_flag = __PAGE_KERNEL_LARGE_EXEC, .offset = __PAGE_OFFSET, }; unsigned long mstart, mend; From 
8638100c52bb7782462b14aad102a4aaf0c7094c Mon Sep 17 00:00:00 2001 From: Xunlei Pang Date: Thu, 4 May 2017 09:42:51 +0800 Subject: [PATCH 07/11] x86/kexec/64: Use gbpages for identity mappings if available Kexec sets up all identity mappings before booting into the new kernel, and this causes considerable extra memory consumption for paging structures on modern machines with huge memory sizes. E.g. on a 32TB machine that is kdumping, it could waste around 128MB (around 4MB/TB) from the reserved memory after kexec sets up all the identity mappings using the current 2MB pages. Add to that the memory needed for the loaded kdump kernel, initramfs, etc., and the kexec syscall can fail with -ENOMEM. As a result, we had to enlarge the reserved memory via "crashkernel=X" to work around this problem. This causes some trouble for distributions that use policies to evaluate the proper "crashkernel=X" value for users. So enable gbpages for kexec mappings. Signed-off-by: Xunlei Pang Cc: Andy Lutomirski Cc: Borislav Petkov Cc: Brian Gerst Cc: Dave Young Cc: Denys Vlasenko Cc: Eric Biederman Cc: H. Peter Anvin Cc: Josh Poimboeuf Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: Yinghai Lu Cc: akpm@linux-foundation.org Cc: kexec@lists.infradead.org Link: http://lkml.kernel.org/r/1493862171-8799-2-git-send-email-xlpang@redhat.com Signed-off-by: Ingo Molnar --- arch/x86/kernel/machine_kexec_64.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c index 1d4f2b076545..c25d277d7d7e 100644 --- a/arch/x86/kernel/machine_kexec_64.c +++ b/arch/x86/kernel/machine_kexec_64.c @@ -122,6 +122,10 @@ static int init_pgtable(struct kimage *image, unsigned long start_pgtable) level4p = (pgd_t *)__va(start_pgtable); clear_page(level4p); + + if (direct_gbpages) + info.direct_gbpages = true; + for (i = 0; i < nr_pfn_mapped; i++) { mstart = pfn_mapped[i].start << PAGE_SHIFT; mend = pfn_mapped[i].end << PAGE_SHIFT; From 861ce4a3244c21b0af64f880d5bfe5e6e2fb9e4a Mon Sep 17 00:00:00 2001 From: Laura Abbott Date: Mon, 8 May 2017 14:23:16 -0700 Subject: [PATCH 08/11] x86/mm/32: Set the '__vmalloc_start_set' flag in initmem_init() '__vmalloc_start_set' currently only gets set in initmem_init() when !CONFIG_NEED_MULTIPLE_NODES. This breaks detection of vmalloc addresses by virt_addr_valid() when CONFIG_NEED_MULTIPLE_NODES=y, causing a kernel crash: [mm/usercopy] 517e1fbeb6: kernel BUG at arch/x86/mm/physaddr.c:78! Set '__vmalloc_start_set' appropriately for that case as well.
Reported-by: kbuild test robot Signed-off-by: Laura Abbott Reviewed-by: Kees Cook Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Fixes: dc16ecf7fd1f ("x86-32: use specific __vmalloc_start_set flag in __virt_addr_valid") Link: http://lkml.kernel.org/r/1494278596-30373-1-git-send-email-labbott@redhat.com Signed-off-by: Ingo Molnar --- arch/x86/mm/numa_32.c | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/x86/mm/numa_32.c b/arch/x86/mm/numa_32.c index 6b7ce6279133..aca6295350f3 100644 --- a/arch/x86/mm/numa_32.c +++ b/arch/x86/mm/numa_32.c @@ -100,5 +100,6 @@ void __init initmem_init(void) printk(KERN_DEBUG "High memory starts at vaddr %08lx\n", (ulong) pfn_to_kaddr(highstart_pfn)); + __vmalloc_start_set = true; setup_bootmem_allocator(); } From d2b6dc61a8dd3c429609b993778cb54e75a5c5f0 Mon Sep 17 00:00:00 2001 From: Andy Lutomirski Date: Mon, 8 May 2017 17:09:10 -0700 Subject: [PATCH 09/11] x86/boot/32: Fix UP boot on Quark and possibly other platforms This partially reverts commit: 23b2a4ddebdd17f ("x86/boot/32: Defer resyncing initial_page_table until per-cpu is set up") That commit had one definite bug and one potential bug. The definite bug is that setup_per_cpu_areas() uses a different generic implementation on UP kernels, so initial_page_table never got resynced. This was fine for access to percpu data (it's in the identity map on UP), but it breaks other users of initial_page_table. The potential bug is that helpers like efi_init() would be called before the tables were synced. Avoid both problems by just syncing the page tables in setup_arch() *and* setup_per_cpu_areas(). Reported-by: Jan Kiszka Signed-off-by: Andy Lutomirski Cc: Andy Shevchenko Cc: Ard Biesheuvel Cc: Boris Ostrovsky Cc: Borislav Petkov Cc: Brian Gerst Cc: Denys Vlasenko Cc: Josh Poimboeuf Cc: Juergen Gross Cc: Linus Torvalds Cc: Matt Fleming Cc: Peter Zijlstra Cc: Thomas Garnier Cc: Thomas Gleixner Cc: linux-efi@vger.kernel.org Signed-off-by: Ingo Molnar --- arch/x86/kernel/setup.c | 15 +++++++++++++++ arch/x86/kernel/setup_percpu.c | 10 +++++----- 2 files changed, 20 insertions(+), 5 deletions(-) diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c index 603a1669a2ec..0b4d3c686b1e 100644 --- a/arch/x86/kernel/setup.c +++ b/arch/x86/kernel/setup.c @@ -1225,6 +1225,21 @@ void __init setup_arch(char **cmdline_p) kasan_init(); +#ifdef CONFIG_X86_32 + /* sync back kernel address range */ + clone_pgd_range(initial_page_table + KERNEL_PGD_BOUNDARY, + swapper_pg_dir + KERNEL_PGD_BOUNDARY, + KERNEL_PGD_PTRS); + + /* + * sync back low identity map too. It is used for example + * in the 32-bit EFI stub. + */ + clone_pgd_range(initial_page_table, + swapper_pg_dir + KERNEL_PGD_BOUNDARY, + min(KERNEL_PGD_PTRS, KERNEL_PGD_BOUNDARY)); +#endif + tboot_probe(); map_vsyscall(); diff --git a/arch/x86/kernel/setup_percpu.c b/arch/x86/kernel/setup_percpu.c index bb1e8cc0bc84..10edd1e69a68 100644 --- a/arch/x86/kernel/setup_percpu.c +++ b/arch/x86/kernel/setup_percpu.c @@ -291,11 +291,11 @@ void __init setup_per_cpu_areas(void) #ifdef CONFIG_X86_32 /* - * Sync back kernel address range. We want to make sure that - * all kernel mappings, including percpu mappings, are available - * in the smpboot asm. We can't reliably pick up percpu - * mappings using vmalloc_fault(), because exception dispatch - * needs percpu data. + * Sync back kernel address range again. We already did this in + * setup_arch(), but percpu data also needs to be available in + * the smpboot asm.
We can't reliably pick up percpu mappings + using vmalloc_fault(), because exception dispatch needs + percpu data. */ clone_pgd_range(initial_page_table + KERNEL_PGD_BOUNDARY, swapper_pg_dir + KERNEL_PGD_BOUNDARY, From 4a1bec4605838fd7872ec41677585e241e256785 Mon Sep 17 00:00:00 2001 From: Nick Desaulniers Date: Mon, 8 May 2017 20:29:46 -0700 Subject: [PATCH 10/11] x86/build: Don't add -maccumulate-outgoing-args w/o compiler support Clang does not support this machine-dependent option. Older versions of GCC (pre-3.0) may not support this option, added in 2000, but it's unlikely they can still compile a working kernel. Signed-off-by: Nick Desaulniers Reviewed-by: Josh Poimboeuf Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Link: http://lkml.kernel.org/r/20170509032946.20444-1-nick.desaulniers@gmail.com Signed-off-by: Ingo Molnar --- arch/x86/Makefile | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/arch/x86/Makefile b/arch/x86/Makefile index 4430dd489620..5851411e60fb 100644 --- a/arch/x86/Makefile +++ b/arch/x86/Makefile @@ -179,7 +179,8 @@ ifdef CONFIG_JUMP_LABEL endif ifeq ($(ACCUMULATE_OUTGOING_ARGS), 1) - KBUILD_CFLAGS += -maccumulate-outgoing-args + # This compiler flag is not supported by Clang: + KBUILD_CFLAGS += $(call cc-option,-maccumulate-outgoing-args,) endif # Stackpointer is addressed different for 32 bit and 64 bit x86 From fb8fb46c56289b3f34b5d90a4ec65e9e4e4544a5 Mon Sep 17 00:00:00 2001 From: Xiaochen Shen Date: Wed, 3 May 2017 11:15:56 +0800 Subject: [PATCH 11/11] x86/intel_rdt: Fix a typo in Documentation Example 3 contains a typo: "C0" in "# echo C0 > p0/cpus" is wrong because it selects cores 6-7 instead of the intended cores 4-7. Correct this typo to avoid confusion. Signed-off-by: Xiaochen Shen Acked-by: Fenghua Yu Cc: vikas.shivappa@linux.intel.com Cc: tony.luck@intel.com Link: http://lkml.kernel.org/r/1493781356-24229-1-git-send-email-xiaochen.shen@intel.com Signed-off-by: Thomas Gleixner --- Documentation/x86/intel_rdt_ui.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Documentation/x86/intel_rdt_ui.txt b/Documentation/x86/intel_rdt_ui.txt index 0f6d8477b66c..c491a1b82de2 100644 --- a/Documentation/x86/intel_rdt_ui.txt +++ b/Documentation/x86/intel_rdt_ui.txt @@ -295,7 +295,7 @@ kernel and the tasks running there get 50% of the cache. They should also get 50% of memory bandwidth assuming that the cores 4-7 are SMT siblings and only the real time threads are scheduled on the cores 4-7. -# echo C0 > p0/cpus +# echo F0 > p0/cpus 4) Locking between applications
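For reference, the resctrl "cpus" file takes a hexadecimal CPU bitmask in which bit N selects CPU N, which is why "C0" (binary 11000000) names cores 6-7 while "F0" (binary 11110000) names the intended cores 4-7. The stand-alone user-space C sketch below is purely illustrative and not part of any patch above; it just decodes such a mask the same way to make the changelog's reasoning concrete:

  /* Illustrative only: decode a resctrl-style hex CPU mask into a CPU list. */
  #include <stdio.h>
  #include <stdlib.h>

  static void print_cpus(const char *mask_str)
  {
  	/* Bit N of the hex mask selects CPU N. */
  	unsigned long mask = strtoul(mask_str, NULL, 16);
  	int cpu;

  	printf("%s ->", mask_str);
  	for (cpu = 0; cpu < (int)(8 * sizeof(mask)); cpu++)
  		if (mask & (1UL << cpu))
  			printf(" %d", cpu);
  	printf("\n");
  }

  int main(void)
  {
  	print_cpus("C0");	/* the typo: selects cores 6 and 7 only */
  	print_cpus("F0");	/* the fix: selects the intended cores 4-7 */
  	return 0;
  }

Built with any C compiler, it prints "C0 -> 6 7" and "F0 -> 4 5 6 7", matching the rationale given in the changelog above.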