Commit Graph

16651 Commits

Author SHA1 Message Date
Fenghua Yu ec400ddeff x86/microcode_intel_early.c: Early update ucode on Intel's CPU
Implementation of early update ucode on Intel's CPU.

load_ucode_intel_bsp() scans ucode in initrd image file which is a cpio format
ucode followed by ordinary initrd image file. The binary ucode file is stored
in kernel/x86/microcode/GenuineIntel.bin in the cpio data. All ucode
patches with the same model as BSP are saved in memory. A matching ucode patch
is updated on BSP.

load_ucode_intel_ap() reads saved ucoded patches and updates ucode on AP.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: http://lkml.kernel.org/r/1356075872-3054-9-git-send-email-fenghua.yu@intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-31 13:19:18 -08:00
Fenghua Yu 086fc8f803 x86/tlbflush.h: Define __native_flush_tlb_global_irq_disabled()
This function is called in __native_flush_tlb_global() and after
apply_microcode_early().

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: http://lkml.kernel.org/r/1356075872-3054-8-git-send-email-fenghua.yu@intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-31 13:19:16 -08:00
Fenghua Yu e666dfa273 x86/microcode_intel_lib.c: Early update ucode on Intel's CPU
Define interfaces microcode_sanity_check() and get_matching_microcode(). They
are called both in early boot time and in microcode Intel driver.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: http://lkml.kernel.org/r/1356075872-3054-7-git-send-email-fenghua.yu@intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-31 13:19:14 -08:00
Fenghua Yu a8ebf6d1d6 x86/microcode_core_early.c: Define interfaces for early loading ucode
Define interfaces load_ucode_bsp() and load_ucode_ap() to load ucode on BSP and
AP in early boot time. These are generic interfaces. Internally they call
vendor specific implementations.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: http://lkml.kernel.org/r/1356075872-3054-6-git-send-email-fenghua.yu@intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-31 13:19:12 -08:00
Fenghua Yu e6ebf5deaa x86/common.c: load ucode in 64 bit or show loading ucode info in 32 bit on AP
In 64 bit, load ucode on AP in cpu_init().

In 32 bit, show ucode loading info on AP in cpu_init(). Microcode has been
loaded earlier before paging. Now it is safe to show the loading microcode
info on this AP.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: http://lkml.kernel.org/r/1356075872-3054-5-git-send-email-fenghua.yu@intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-31 13:19:06 -08:00
Fenghua Yu d288e1cf8e x86/common.c: Make have_cpuid_p() a global function
Remove static declaration in have_cpuid_p() to make it a global function. The
function will be called in early loading microcode.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: http://lkml.kernel.org/r/1356075872-3054-4-git-send-email-fenghua.yu@intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-31 13:18:58 -08:00
Fenghua Yu 9cd4d78e21 x86/microcode_intel.h: Define functions and macros for early loading ucode
Define some functions and macros that will be used in early loading ucode. Some
of them are moved from microcode_intel.c driver in order to be called in early
boot phase before module can be called.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: http://lkml.kernel.org/r/1356075872-3054-3-git-send-email-fenghua.yu@intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-31 13:18:50 -08:00
Yinghai Lu 8b78c21d72 x86, 64bit, mm: hibernate use generic mapping_init
We should set mappings only for usable memory ranges under max_pfn
Otherwise causes same problem that is fixed by

	x86, mm: Only direct map addresses that are marked as E820_RAM

Make it only map range in pfn_mapped array.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1359058816-7615-34-git-send-email-yinghai@kernel.org
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: linux-pm@vger.kernel.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-29 19:32:59 -08:00
Yinghai Lu 72212675d1 x86, 64bit, mm: Mark data/bss/brk to nx
HPA said, we should not have RW and +x set at the time.

for kernel layout:
[    0.000000] Kernel Layout:
[    0.000000]   .text: [0x01000000-0x021434f8]
[    0.000000] .rodata: [0x02200000-0x02a13fff]
[    0.000000]   .data: [0x02c00000-0x02dc763f]
[    0.000000]   .init: [0x02dc9000-0x0312cfff]
[    0.000000]    .bss: [0x0313b000-0x03dd6fff]
[    0.000000]    .brk: [0x03dd7000-0x03dfffff]

before the patch, we have
---[ High Kernel Mapping ]---
0xffffffff80000000-0xffffffff81000000          16M                           pmd
0xffffffff81000000-0xffffffff82200000          18M     ro         PSE GLB x  pmd
0xffffffff82200000-0xffffffff82c00000          10M     ro         PSE GLB NX pmd
0xffffffff82c00000-0xffffffff82dc9000        1828K     RW             GLB x  pte
0xffffffff82dc9000-0xffffffff82e00000         220K     RW             GLB NX pte
0xffffffff82e00000-0xffffffff83000000           2M     RW         PSE GLB NX pmd
0xffffffff83000000-0xffffffff8313a000        1256K     RW             GLB NX pte
0xffffffff8313a000-0xffffffff83200000         792K     RW             GLB x  pte
0xffffffff83200000-0xffffffff83e00000          12M     RW         PSE GLB x  pmd
0xffffffff83e00000-0xffffffffa0000000         450M                           pmd

after patch,, we get
---[ High Kernel Mapping ]---
0xffffffff80000000-0xffffffff81000000          16M                           pmd
0xffffffff81000000-0xffffffff82200000          18M     ro         PSE GLB x  pmd
0xffffffff82200000-0xffffffff82c00000          10M     ro         PSE GLB NX pmd
0xffffffff82c00000-0xffffffff82e00000           2M     RW             GLB NX pte
0xffffffff82e00000-0xffffffff83000000           2M     RW         PSE GLB NX pmd
0xffffffff83000000-0xffffffff83200000           2M     RW             GLB NX pte
0xffffffff83200000-0xffffffff83e00000          12M     RW         PSE GLB NX pmd
0xffffffff83e00000-0xffffffffa0000000         450M                           pmd

so data, bss, brk get NX ...

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1359058816-7615-33-git-send-email-yinghai@kernel.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-29 19:32:58 -08:00
Yinghai Lu 6c902b656c x86: Merge early kernel reserve for 32bit and 64bit
They are the same, and we could move them out from head32/64.c to setup.c.

We are using memblock, and it could handle overlapping properly, so
we don't need to reserve some at first to hold the location, and just
need to make sure we reserve them before we are using memblock to find
free mem to use.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1359058816-7615-32-git-send-email-yinghai@kernel.org
Cc: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-29 19:32:58 -08:00
Yinghai Lu 0212f91596 x86: Add Crash kernel low reservation
During kdump kernel's booting stage, it need to find low ram for
swiotlb buffer when system does not support intel iommu/dmar remapping.

kexed-tools is appending memmap=exactmap and range from /proc/iomem
with "Crash kernel", and that range is above 4G for 64bit after boot
protocol 2.12.

We need to add another range in /proc/iomem like "Crash kernel low",
so kexec-tools could find that info and append to kdump kernel
command line.

Try to reserve some under 4G if the normal "Crash kernel" is above 4G.

User could specify the size with crashkernel_low=XX[KMG].

-v2: fix warning that is found by Fengguang's test robot.
-v3: move out get_mem_size change to another patch, to solve compiling
     warning that is found by Borislav Petkov <bp@alien8.de>
-v4: user must specify crashkernel_low if system does not support
     intel or amd iommu.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1359058816-7615-31-git-send-email-yinghai@kernel.org
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Rob Landley <rob@landley.net>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-29 19:32:58 -08:00
Yinghai Lu 7d41a8a4a2 x86, kdump: Remove crashkernel range find limit for 64bit
Now kexeced kernel/ramdisk could be above 4g, so remove 896 limit for
64bit.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1359058816-7615-30-git-send-email-yinghai@kernel.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-29 19:32:58 -08:00
Yinghai Lu 595ad9af85 memblock: Add memblock_mem_size()
Use it to get mem size under the limit_pfn.
to replace local version in x86 reserved_initrd.

-v2: remove not needed cast that is pointed out by HPA.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1359058816-7615-29-git-send-email-yinghai@kernel.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-29 19:32:57 -08:00
Yinghai Lu d1af6d045f x86, boot: Not need to check setup_header version for setup_data
That is for bootloaders.

setup_data is in setup_header, and bootloader is copying that from bzImage.
So for old bootloader should keep that as 0 already.

old kexec-tools till now for elf image set setup_data to 0, so it is ok.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1359058816-7615-28-git-send-email-yinghai@kernel.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-29 19:32:57 -08:00
Yinghai Lu 8ee2f2dfdb x86, boot: Update comments about entries for 64bit image
Now 64bit entry is fixed on 0x200, can not be changed anymore.

Update the comments to reflect that.

Also put info about it in boot.txt

-v2: fix some grammar error

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1359058816-7615-27-git-send-email-yinghai@kernel.org
Cc: Rob Landley <rob@landley.net>
Cc: Matt Fleming <matt.fleming@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-29 19:32:57 -08:00
Yinghai Lu ee92d81502 x86, boot: Support loading bzImage, boot_params and ramdisk above 4G
xloadflags bit 1 indicates that we can load the kernel and all data
structures above 4G; it is set if kernel is relocatable and 64bit.

bootloader will check if xloadflags bit 1 is set to decide if
it could load ramdisk and kernel high above 4G.

bootloader will fill value to ext_ramdisk_image/size for high 32bits
when it load ramdisk above 4G.
kernel use get_ramdisk_image/size to use ext_ramdisk_image/size to get
right positon for ramdisk.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Rob Landley <rob@landley.net>
Cc: Matt Fleming <matt.fleming@intel.com>
Cc: Gokul Caushik <caushik1@gmail.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Joe Millenbach <jmillenbach@gmail.com>
Link: http://lkml.kernel.org/r/1359058816-7615-26-git-send-email-yinghai@kernel.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-29 19:32:33 -08:00
Yinghai Lu 0e691cf824 x86, kexec, 64bit: Only set ident mapping for ram.
We should set mappings only for usable memory ranges under max_pfn
Otherwise causes same problem that is fixed by

	x86, mm: Only direct map addresses that are marked as E820_RAM

This patch exposes pfn_mapped array, and only sets ident mapping for ranges
in that array.

This patch relies on new kernel_ident_mapping_init that could handle existing
pgd/pud between different calls.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1359058816-7615-25-git-send-email-yinghai@kernel.org
Cc: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-29 15:26:35 -08:00
Yinghai Lu 9ebdc79f7a x86, kexec: Replace ident_mapping_init and init_level4_page
Now ident_mapping_init is checking if pgd/pud is present for every 2M,
so several 2Ms are in same PUD, it will keep checking if pud is there
with same pud.

init_level4_page just does not check existing pgd/pud.

We could use generic mapping_init with different settings in info to
replace those two local grown version functions.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1359058816-7615-24-git-send-email-yinghai@kernel.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-29 15:26:29 -08:00
Yinghai Lu 084d128398 x86, kexec: Set ident mapping for kernel that is above max_pfn
When first kernel is booted with memmap= or mem=  to limit max_pfn.
kexec can load second kernel above that max_pfn.

We need to set ident mapping for whole image in this case instead of just
for first 2M.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1359058816-7615-23-git-send-email-yinghai@kernel.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-29 15:26:26 -08:00
Yinghai Lu 577af55d80 x86, kexec: Remove 1024G limitation for kexec buffer on 64bit
Now 64bit kernel supports more than 1T ram and kexec tools
could find buffer above 1T, remove that obsolete limitation.
and use MAXMEM instead.

Tested on system with more than 1024G ram.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1359058816-7615-22-git-send-email-yinghai@kernel.org
Cc: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-29 15:26:23 -08:00
Yinghai Lu d3c433bf9a x86, boot: Move lldt/ltr out of 64bit code section
commit 08da5a2ca

    x86_64: Early segment setup for VT

sets up LDT and TR into a valid state in order to speed up boot
decompression under VT.

Those code are put in code64, and it is using GDT that is only
loaded from code32 path.

That breaks booting with 64bit bootloader that does not go through
code32 path and jump to startup_64 directly, and it has different
GDT.

Move those lines into code32 after their GDT is loaded.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1359058816-7615-21-git-send-email-yinghai@kernel.org
Cc: Zachary Amsden <zamsden@gmail.com>
Cc: Matt Fleming <matt.fleming@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-29 15:26:19 -08:00
Yinghai Lu 187a8a73ce x86, boot: Move verify_cpu.S and no_longmode down
We need to move some code to 32bit section in following patch:

   x86, boot: Move lldt/ltr out of 64bit code section

but that will push startup_64 down from 0x200.

According to hpa, we can not change startup_64 position and that
is an ABI.

We could move function verify_cpu and no_longmode down, because
verify_cpu is used via function call and no_longmode will not
return, then we don't need to add extra code for jumping back.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1359058816-7615-20-git-send-email-yinghai@kernel.org
Cc: Matt Fleming <matt.fleming@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-29 15:26:15 -08:00
Yinghai Lu 3db07e70f0 x86, boot: Pass cmd_line_ptr with unsigned long instead
boot/compressed/misc.c is used for bzImage in 64bit and 32bit, and
cmd_line_ptr could point to buffer that is above 4g, cmd_line_ptr
should be 64bit otherwise high 32bit will be capped out.

So need to change data type to unsigned long, that will be 64bit get
correct address of command line buffer.

And it is still ok with 32bit bzImage, because unsigned long on 32bit kernel
is still 32bit.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1359058816-7615-19-git-send-email-yinghai@kernel.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-29 15:26:09 -08:00
Yinghai Lu 16a4baa642 x86, boot: Move checking of cmd_line_ptr out of common path
cmdline.c::__cmdline_find_option... are shared between 16-bit setup code
and 32/64 bit decompressor code.

for 32/64 only path via kexec, we should not check if ptr is less 1M.
as those cmdline could be put above 1M, or even 4G.

Move out accessible checking out of __cmdline_find_option()
So decompressor in misc.c can parse cmdline correctly.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1359058816-7615-18-git-send-email-yinghai@kernel.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-29 15:26:01 -08:00
Yinghai Lu f1da834cd9 x86, boot: Add get_cmd_line_ptr()
Add an accessor function for the command line address.
Later we will add support for holding a 64-bit address via ext_cmd_line_ptr.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1359058816-7615-17-git-send-email-yinghai@kernel.org
Cc: Gokul Caushik <caushik1@gmail.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Joe Millenbach <jmillenbach@gmail.com>
Cc: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-29 15:25:45 -08:00
Yinghai Lu a8a51a88d5 x86: Add get_ramdisk_image/size()
There are several places to find ramdisk information early for reserving
and relocating.

Use accessor functions to make code more readable and consistent.

Later will add ext_ramdisk_image/size in those functions to support
loading ramdisk above 4g.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1359058816-7615-16-git-send-email-yinghai@kernel.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-29 15:21:05 -08:00
Yinghai Lu 1b8c78be01 x86: Merge early_reserve_initrd for 32bit and 64bit
They are the same, could move them out from head32/64.c to setup.c.

We are using memblock, and it could handle overlapping properly, so
we don't need to reserve some at first to hold the location, and just
need to make sure we reserve them before we are using memblock to find
free mem to use.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1359058816-7615-15-git-send-email-yinghai@kernel.org
Reviewed-by: Pekka Enberg <penberg@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-29 15:20:41 -08:00
Yinghai Lu 100542306f x86, 64bit: Don't set max_pfn_mapped wrong value early on native path
We are not having max_pfn_mapped set correctly until init_memory_mapping.
So don't print its initial value for 64bit

Also need to use KERNEL_IMAGE_SIZE directly for highmap cleanup.

-v2: update comments about max_pfn_mapped according to Stefano Stabellini.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1359058816-7615-14-git-send-email-yinghai@kernel.org
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-29 15:20:16 -08:00
Yinghai Lu 6b9c75aca6 x86, 64bit: #PF handler set page to cover only 2M per #PF
We only map a single 2 MiB page per #PF, even though we should be able
to do this a full gigabyte at a time with no additional memory cost.
This is a workaround for a broken AMD reference BIOS (and its
derivatives in shipping system) which maps a large chunk of memory as
WB in the MTRR system but will #MC if the processor wanders off and
tries to prefetch that memory, which can happen any time the memory is
mapped in the TLB.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1359058816-7615-13-git-send-email-yinghai@kernel.org
Cc: Alexander Duyck <alexander.h.duyck@intel.com>
[ hpa: rewrote the patch description ]
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-29 15:20:13 -08:00
H. Peter Anvin 8170e6bed4 x86, 64bit: Use a #PF handler to materialize early mappings on demand
Linear mode (CR0.PG = 0) is mutually exclusive with 64-bit mode; all
64-bit code has to use page tables.  This makes it awkward before we
have first set up properly all-covering page tables to access objects
that are outside the static kernel range.

So far we have dealt with that simply by mapping a fixed amount of
low memory, but that fails in at least two upcoming use cases:

1. We will support load and run kernel, struct boot_params, ramdisk,
   command line, etc. above the 4 GiB mark.
2. need to access ramdisk early to get microcode to update that as
   early possible.

We could use early_iomap to access them too, but it will make code to
messy and hard to be unified with 32 bit.

Hence, set up a #PF table and use a fixed number of buffers to set up
page tables on demand.  If the buffers fill up then we simply flush
them and start over.  These buffers are all in __initdata, so it does
not increase RAM usage at runtime.

Thus, with the help of the #PF handler, we can set the final kernel
mapping from blank, and switch to init_level4_pgt later.

During the switchover in head_64.S, before #PF handler is available,
we use three pages to handle kernel crossing 1G, 512G boundaries with
sharing page by playing games with page aliasing: the same page is
mapped twice in the higher-level tables with appropriate wraparound.
The kernel region itself will be properly mapped; other mappings may
be spurious.

early_make_pgtable is using kernel high mapping address to access pages
to set page table.

-v4: Add phys_base offset to make kexec happy, and add
	init_mapping_kernel()   - Yinghai
-v5: fix compiling with xen, and add back ident level3 and level2 for xen
     also move back init_level4_pgt from BSS to DATA again.
     because we have to clear it anyway.  - Yinghai
-v6: switch to init_level4_pgt in init_mem_mapping. - Yinghai
-v7: remove not needed clear_page for init_level4_page
     it is with fill 512,8,0 already in head_64.S  - Yinghai
-v8: we need to keep that handler alive until init_mem_mapping and don't
     let early_trap_init to trash that early #PF handler.
     So split early_trap_pf_init out and move it down. - Yinghai
-v9: switchover only cover kernel space instead of 1G so could avoid
     touch possible mem holes. - Yinghai
-v11: change far jmp back to far return to initial_code, that is needed
     to fix failure that is reported by Konrad on AMD systems.  - Yinghai

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1359058816-7615-12-git-send-email-yinghai@kernel.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-29 15:20:06 -08:00
Yinghai Lu 4f7b92263a x86, realmode: Separate real_mode reserve and setup
After we switch to use #PF handler help to set page table, init_level4_pgt
will only have entries set after init_mem_mapping().
We need to move copying init_level4_pgt to trampoline_pgd after that.

So split reserve and setup, and move the setup after init_mem_mapping()

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1359058816-7615-11-git-send-email-yinghai@kernel.org
Cc: Jarkko Sakkinen <jarkko.sakkinen@intel.com>
Acked-by: Jarkko Sakkinen <jarkko.sakkinen@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-29 15:13:24 -08:00
Yinghai Lu 9735e91e9c x86, 64bit, realmode: Use init_level4_pgt to set trampoline_pgd directly
with #PF handler way to set early page table, level3_ident will go away with
64bit native path.

So just use entries in init_level4_pgt to set them in trampoline_pgd.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1359058816-7615-10-git-send-email-yinghai@kernel.org
Cc: Jarkko Sakkinen <jarkko.sakkinen@intel.com>
Acked-by: Jarkko Sakkinen <jarkko.sakkinen@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-29 15:12:30 -08:00
Yinghai Lu fa2bbce985 x86, 64bit: Copy struct boot_params early
We want to support struct boot_params (formerly known as the
zero-page, or real-mode data) above the 4 GiB mark.  We will have #PF
handler to set page table for not accessible ram early, but want to
limit it before x86_64_start_reservations to limit the code change to
native path only.

Also we will need the ramdisk info in struct boot_params to access the microcode
blob in ramdisk in x86_64_start_kernel, so copy struct boot_params early makes
it accessing ramdisk info simple.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1359058816-7615-9-git-send-email-yinghai@kernel.org
Cc: Alexander Duyck <alexander.h.duyck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-29 15:12:26 -08:00
Yinghai Lu aece27851d x86, 64bit, mm: Add generic kernel/ident mapping helper
It is simple version for kernel_physical_mapping_init.
it will work to build one page table that will be used later.

Use mapping_info to control
        1. alloc_pg_page method
        2. if PMD is EXEC,
        3. if pgd is with kernel low mapping or ident mapping.

Will use to replace some local versions in kexec, hibernation and etc.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1359058816-7615-8-git-send-email-yinghai@kernel.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-29 15:12:25 -08:00
Yinghai Lu 231b3642a3 x86, realmode: Set real_mode permissions early
Trampoline code is executed by APs with kernel low mapping on 64bit.
We need to set trampoline code to EXEC early before we boot APs.

Found the problem after switching to #PF handler set page table,
and we do not set initial kernel low mapping with EXEC anymore in
arch/x86/kernel/head_64.S.

Change to use early_initcall instead that will make sure trampoline
will have EXEC set.

-v2: Merge two comments according to Borislav Petkov <bp@alien8.de>

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1359058816-7615-7-git-send-email-yinghai@kernel.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-29 15:12:25 -08:00
Yinghai Lu c2bdee594e x86, 64bit, mm: Make pgd next calculation consistent with pud/pmd
Just like the way we calculate next for pud and pmd, aka round down and
add size.

Also, do not do boundary-checking with 'next', and just pass 'end' down
to phys_pud_init() instead. Because the loop in phys_pud_init() stops at
PTRS_PER_PUD and thus can handle a possibly bigger 'end' properly.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1359058816-7615-6-git-send-email-yinghai@kernel.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-29 15:12:24 -08:00
Yinghai Lu b422a30917 x86: Factor out e820_add_kernel_range()
Separate out the reservation of the kernel static memory areas into a
separate function.

Also add support for case when memmap=xxM$yyM is used without exactmap.
Need to remove reserved range at first before we add E820_RAM
range, otherwise added E820_RAM range will be ignored.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1359058816-7615-5-git-send-email-yinghai@kernel.org
Cc: Jacob Shin <jacob.shin@amd.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-29 15:12:24 -08:00
Yinghai Lu c9b3234a6a x86, mm: Fix page table early allocation offset checking
During debugging loading kernel above 4G, found that one page is not used
in pre-allocated BRK area for early page allocation.
pgt_buf_top is address that can not be used, so should check if that new
end is above that top, otherwise last page will not be used.

Fix that checking and also add print out for allocation from pre-allocated
BRK area to catch possible bugs later.

But after we get back that page for pgt, it tiggers one bug in pgt allocation
with xen: We need to avoid to use page as pgt to map range that is
overlapping with that pgt page.

Add checking about overlapping, when it happens, use memblock allocation
instead.  That fixes crash on Xen PV guest with 2G that Stefan found.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1359058816-7615-2-git-send-email-yinghai@kernel.org
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Tested-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-29 15:12:23 -08:00
H. Peter Anvin de65d816aa Merge remote-tracking branch 'origin/x86/boot' into x86/mm2
Coming patches to x86/mm2 require the changes and advanced baseline in
x86/boot.

Resolved Conflicts:
	arch/x86/kernel/setup.c
	mm/nobootmem.c

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-29 15:10:15 -08:00
H. Peter Anvin 5dcd14ecd4 x86, boot: Sanitize boot_params if not zeroed on creation
Use the new sentinel field to detect bootloaders which fail to follow
protocol and don't initialize fields in struct boot_params that they
do not explicitly initialize to zero.

Based on an original patch and research by Yinghai Lu.
Changed by hpa to be invoked both in the decompression path and in the
kernel proper; the latter for the case where a bootloader takes over
decompression.

Originally-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1359058816-7615-26-git-send-email-yinghai@kernel.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-29 01:22:17 -08:00
H. Peter Anvin 09c205afde x86, boot: Define the 2.12 bzImage boot protocol
Define the 2.12 bzImage boot protocol: add xloadflags and additional
fields to allow the command line, initramfs and struct boot_params to
live above the 4 GiB mark.

The xloadflags now communicates if this is a 64-bit kernel with the
legacy 64-bit entry point and which of the EFI handover entry points
are supported.

Avoid adding new read flags to loadflags because of claimed
bootloaders testing the whole byte for == 1 to determine bzImageness
at least until the issue can be researched further.

This is based on patches by Yinghai Lu and David Woodhouse.

Originally-by: Yinghai Lu <yinghai@kernel.org>
Originally-by: David Woodhouse <dwmw2@infradead.org>
Acked-by: Yinghai Lu <yinghai@kernel.org>
Acked-by: David Woodhouse <dwmw2@infradead.org>
Acked-by: Matt Fleming <matt.fleming@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Link: http://lkml.kernel.org/r/1359058816-7615-26-git-send-email-yinghai@kernel.org
Cc: Rob Landley <rob@landley.net>
Cc: Gokul Caushik <caushik1@gmail.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Joe Millenbach <jmillenbach@gmail.com>
2013-01-27 15:56:37 -08:00
Cong Ding 65315d4889 x86/boot: Fix minor fd leakage in tools/relocs.c
The opened file should be closed.

Signed-off-by: Cong Ding <dinggnu@gmail.com>
Cc: Kusanagi Kouichi <slash@ac.auone-net.jp>
Cc: Jarkko Sakkinen <jarkko.sakkinen@intel.com>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Matt Fleming <matt.fleming@intel.com>
Link: http://lkml.kernel.org/r/1358183628-27784-1-git-send-email-dinggnu@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-01-27 10:24:28 -08:00
Linus Torvalds ed06ef318a perf/urgent fixes:
. revert 20b279 - require exclude_guest to use PEBS - kernel side,
   now older binaries will continue working for things like cycles:pp
   without needing to pass extra modifiers, from David Ahern.
 
 . Fix building from 'make perf-*-src-pkg' tarballs, broken by UAPI, from
   Sebastian Andrzej Siewior
 
 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.14 (GNU/Linux)
 
 iQIcBAABAgAGBQJQ7yQhAAoJENZQFvNTUqpAjj4P/RR+8WeXpV02yzndhry+Yjav
 L/WQH0CkRRmyUA5akTcpgFrohJCEOi+auHl7ivmDX4XWFavcAkX3H1Yz1FytCOkb
 Cbb9Lv1rGdlTno8fTVUn8mNyTG64AlWrAw3ICixWw6q6I/k6SO7EkKigCxPhmY+2
 BE2EvkZmOY/PEUXgM6HtUdifORatX48p1toS7p3CDQ31cxBN5OVNZUXa1FakJpyH
 7R+1imKLsjuyi/G7Bt061LyWQkOh7L/ITWN+5Rx4RsUwRRT3vm1H9nlqUBsPS0PW
 qfkktkCmn/cFpKbfBipuglnt16jHPMfI/pghKvzx8n2uJMNEGXbfFDJefpzcdih9
 wIRgB6a5bvA8VF6Xpcn0I5JhqLAcnWTer07JgjZevjqYCdZStpbJjvE5131JjTLw
 Dnm7UshE+VFBcA3iXNX64p/X7WDJSk+SIDsJDuNe57dktFVLw76Ibb55XG18Ex7e
 c9QcIEhD1P19VzOniDZQZNEJhqnu5Vjle/eG+JRVRCm/BgoQFyJuD3EooKPN7hHR
 Op4oqf5RhDf7XNH0+Y4rOdRMZRiumdfcEl6kdcQGPJPycxpD7xCJNzBLWK/BvQgT
 Kl0AEkRC0KE2c5LyFttW+g1Byu1rctlMz2TVJDTskTm0XGOQ9mTzsQBP5rgBVt+b
 hQsMMWNVSI+jfm8bTJwx
 =LTjZ
 -----END PGP SIGNATURE-----

Merge tag 'perf-urgent-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux

Pull perf/urgent fixes from Arnaldo Carvalho de Melo:

 . revert 20b279 - require exclude_guest to use PEBS - kernel side, now
   older binaries will continue working for things like cycles:pp
   without needing to pass extra modifiers, from David Ahern.

 . Fix building from 'make perf-*-src-pkg' tarballs, broken by UAPI,
   from Sebastian Andrzej Siewior

[ Pulling directly, Ingo would normally pull but has been unresponsive ]

* tag 'perf-urgent-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux:
  perf tools: Fix building from 'make perf-*-src-pkg' tarballs
  perf x86: revert 20b279 - require exclude_guest to use PEBS - kernel side
2013-01-22 14:32:07 -08:00
Oleg Nesterov 9899d11f65 ptrace: ensure arch_ptrace/ptrace_request can never race with SIGKILL
putreg() assumes that the tracee is not running and pt_regs_access() can
safely play with its stack.  However a killed tracee can return from
ptrace_stop() to the low-level asm code and do RESTORE_REST, this means
that debugger can actually read/modify the kernel stack until the tracee
does SAVE_REST again.

set_task_blockstep() can race with SIGKILL too and in some sense this
race is even worse, the very fact the tracee can be woken up breaks the
logic.

As Linus suggested we can clear TASK_WAKEKILL around the arch_ptrace()
call, this ensures that nobody can ever wakeup the tracee while the
debugger looks at it.  Not only this fixes the mentioned problems, we
can do some cleanups/simplifications in arch_ptrace() paths.

Probably ptrace_unfreeze_traced() needs more callers, for example it
makes sense to make the tracee killable for oom-killer before
access_process_vm().

While at it, add the comment into may_ptrace_stop() to explain why
ptrace_stop() still can't rely on SIGKILL and signal_pending_state().

Reported-by: Salman Qazi <sqazi@google.com>
Reported-by: Suleiman Souhlal <suleiman@google.com>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-01-22 10:08:00 -08:00
Linus Torvalds 5c69bed266 Fixes:
- CVE-2013-0190/XSA-40 (or stack corruption for 32-bit PV kernels)
  - Fix racy vma access spotted by Al Viro
  - Fix mmap batch ioctl potentially resulting in large O(n) page allcations.
  - Fix vcpu online/offline BUG:scheduling while atomic..
  - Fix unbound buffer scanning for more than 32 vCPUs.
  - Fix grant table being incorrectly initialized
  - Fix incorrect check in pciback
  - Allow privcmd in backend domains.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.13 (GNU/Linux)
 
 iQEcBAABAgAGBQJQ+L7qAAoJEFjIrFwIi8fJLNIH/jUsneraEggWeh0L4GGWZvWL
 cNCf0zjQt/pi1Q5drbleW2/6Wv6s6N1QA9pGRsJ+rrliC73HVTqIWFh0TjpwmCVy
 hZal7jDXOuFVIR7GbGEPn004T6mkEnYDb/O2fyojwMVg0NQYwtMYJfTBkKdjKnmV
 z6sWpQPVqO3/nZ17k2DipYRldbeiqS6LLOiUWd72b2W8bV4ySY5iVPVsqFusSEr6
 PNyW33RPs5H0jEPR1uJlLD+l/uIbENykpEPeAS2uHGlch129+xHH5h79dwYJTbw6
 x5nAOveO9VNJscUoqhpE7YbySzJmrUwxnBerZ6YTW6WCknYXrx4uiVAlfWem7uY=
 =26Sq
 -----END PGP SIGNATURE-----

Merge tag 'stable/for-linus-3.8-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen

Pull Xen fixes from Konrad Rzeszutek Wilk:
 - CVE-2013-0190/XSA-40 (or stack corruption for 32-bit PV kernels)
 - Fix racy vma access spotted by Al Viro
 - Fix mmap batch ioctl potentially resulting in large O(n) page allcations.
 - Fix vcpu online/offline BUG:scheduling while atomic..
 - Fix unbound buffer scanning for more than 32 vCPUs.
 - Fix grant table being incorrectly initialized
 - Fix incorrect check in pciback
 - Allow privcmd in backend domains.

Fix up whitespace conflict due to ugly merge resolution in Xen tree in
arch/arm/xen/enlighten.c

* tag 'stable/for-linus-3.8-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
  xen: Fix stack corruption in xen_failsafe_callback for 32bit PVOPS guests.
  Revert "xen/smp: Fix CPU online/offline bug triggering a BUG: scheduling while atomic."
  xen/gntdev: remove erronous use of copy_to_user
  xen/gntdev: correctly unmap unlinked maps in mmu notifier
  xen/gntdev: fix unsafe vma access
  xen/privcmd: Fix mmap batch ioctl.
  Xen: properly bound buffer access when parsing cpu/*/availability
  xen/grant-table: correctly initialize grant table version 1
  x86/xen : Fix the wrong check in pciback
  xen/privcmd: Relax access control in privcmd_ioctl_mmap
2013-01-18 12:02:52 -08:00
Andrew Cooper 9174adbee4 xen: Fix stack corruption in xen_failsafe_callback for 32bit PVOPS guests.
This fixes CVE-2013-0190 / XSA-40

There has been an error on the xen_failsafe_callback path for failed
iret, which causes the stack pointer to be wrong when entering the
iret_exc error path.  This can result in the kernel crashing.

In the classic kernel case, the relevant code looked a little like:

        popl %eax      # Error code from hypervisor
        jz 5f
        addl $16,%esp
        jmp iret_exc   # Hypervisor said iret fault
5:      addl $16,%esp
                       # Hypervisor said segment selector fault

Here, there are two identical addls on either option of a branch which
appears to have been optimised by hoisting it above the jz, and
converting it to an lea, which leaves the flags register unaffected.

In the PVOPS case, the code looks like:

        popl_cfi %eax         # Error from the hypervisor
        lea 16(%esp),%esp     # Add $16 before choosing fault path
        CFI_ADJUST_CFA_OFFSET -16
        jz 5f
        addl $16,%esp         # Incorrectly adjust %esp again
        jmp iret_exc

It is possible unprivileged userspace applications to cause this
behaviour, for example by loading an LDT code selector, then changing
the code selector to be not-present.  At this point, there is a race
condition where it is possible for the hypervisor to return back to
userspace from an interrupt, fault on its own iret, and inject a
failsafe_callback into the kernel.

This bug has been present since the introduction of Xen PVOPS support
in commit 5ead97c84 (xen: Core Xen implementation), in 2.6.23.

Signed-off-by: Frediano Ziglio <frediano.ziglio@citrix.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: stable@vger.kernel.org
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2013-01-16 16:17:42 -05:00
Linus Torvalds 2409c873be Merge branch 'x86/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Peter Anvin:
 "This is mainly a workaround for a bug in Sandy Bridge graphics which
  causes corruption of certain memory pages."

* 'x86/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/Sandy Bridge: Sandy Bridge workaround depends on CONFIG_PCI
  x86/Sandy Bridge: mark arrays in __init functions as __initconst
  x86/Sandy Bridge: reserve pages when integrated graphics is present
  x86, efi: correct precedence of operators in setup_efi_pci
2013-01-16 09:11:50 -08:00
Konrad Rzeszutek Wilk d55bf532d7 Revert "xen/smp: Fix CPU online/offline bug triggering a BUG: scheduling while atomic."
This reverts commit 41bd956de3.

The fix is incorrect and not appropiate for the latest kernels.
In fact it _causes_ the BUG: scheduling while atomic while
doing vCPU hotplug.

Suggested-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2013-01-15 22:41:27 -05:00
Konrad Rzeszutek Wilk 7bcc1ec077 Linux 3.7
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.19 (GNU/Linux)
 
 iQEcBAABAgAGBQJQxqj1AAoJEHm+PkMAQRiG9MQH/j21UwP2QGpdpXbWAnFMjtlv
 uE/yCFhPoqR1QjjE6oRlO6MHFA41xGDbr5RQki9Ik2AfSYiastt4ZWYvtSJKVTCr
 O0Lj+Cdt/2qBkGiARHqVEBZ4S/l/cw4/EHPb5StFyu3ggnPPQhoPIP7oAmRn0+mh
 NNb5CEcJOLqIaJSteqMP71Q899ncbLayBnimYCaC2f6r00beqNXIqxSHipcPlUsf
 ehNxqCX+5z5Q788EL33EL8GpBcy4Ueevu6nvnuVI8qIEnBnrBVngsiaQ4Hti+2eK
 A//4DYoF2N1wLjQv7hFeiwMURQ16OlxXoc/Z66sv2QQRwPxOIQlxdhWuey4KebA=
 =7LYr
 -----END PGP SIGNATURE-----

Merge tag 'v3.7' into stable/for-linus-3.8

Linux 3.7

* tag 'v3.7': (833 commits)
  Linux 3.7
  Input: matrix-keymap - provide proper module license
  Revert "revert "Revert "mm: remove __GFP_NO_KSWAPD""" and associated damage
  ipv4: ip_check_defrag must not modify skb before unsharing
  Revert "mm: avoid waking kswapd for THP allocations when compaction is deferred or contended"
  inet_diag: validate port comparison byte code to prevent unsafe reads
  inet_diag: avoid unsafe and nonsensical prefix matches in inet_diag_bc_run()
  inet_diag: validate byte code to prevent oops in inet_diag_bc_run()
  inet_diag: fix oops for IPv4 AF_INET6 TCP SYN-RECV state
  mm: vmscan: fix inappropriate zone congestion clearing
  vfs: fix O_DIRECT read past end of block device
  net: gro: fix possible panic in skb_gro_receive()
  tcp: bug fix Fast Open client retransmission
  tmpfs: fix shared mempolicy leak
  mm: vmscan: do not keep kswapd looping forever due to individual uncompactable zones
  mm: compaction: validate pfn range passed to isolate_freepages_block
  mmc: sh-mmcif: avoid oops on spurious interrupts (second try)
  Revert misapplied "mmc: sh-mmcif: avoid oops on spurious interrupts"
  mmc: sdhci-s3c: fix missing clock for gpio card-detect
  lib/Makefile: Fix oid_registry build dependency
  ...

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

Conflicts:
	arch/arm/xen/enlighten.c
	drivers/xen/Makefile

[We need to have the v3.7 base as the 'for-3.8' was based off v3.7-rc3
and there are some patches in v3.7-rc6 that we to have in our branch]
2013-01-15 15:58:25 -05:00
H. Peter Anvin e43b3cec71 x86/Sandy Bridge: Sandy Bridge workaround depends on CONFIG_PCI
early_pci_allowed() and read_pci_config_16() are only available if
CONFIG_PCI is defined.

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
2013-01-13 20:58:57 -08:00