linux

Commit Graph

Author	SHA1	Message	Date
Paul Gortmaker	ff24b07abb	s390: mm: Audit and remove any unnecessary uses of module.h Historically a lot of these existed because we did not have a distinction between what was modular code and what was providing support to modules via EXPORT_SYMBOL and friends. That changed when we forked out support for the latter into the export.h file. This means we should be able to reduce the usage of module.h in code that is obj-y Makefile or bool Kconfig. The advantage in doing so is that module.h itself sources about 15 other headers; adding significantly to what we feed cpp, and it can obscure what headers we are effectively using. Since module.h was the source for init.h (for __init) and for export.h (for EXPORT_SYMBOL) we consider each change instance for the presence of either and replace as needed. An instance where module_param was used without moduleparam.h was also fixed, as well as an implict use of asm/elf.h header. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2017-02-17 07:40:35 +01:00
Paul Gortmaker	3994a52b54	s390: kernel: Audit and remove any unnecessary uses of module.h Historically a lot of these existed because we did not have a distinction between what was modular code and what was providing support to modules via EXPORT_SYMBOL and friends. That changed when we forked out support for the latter into the export.h file. This means we should be able to reduce the usage of module.h in code that is obj-y Makefile or bool Kconfig. The advantage in doing so is that module.h itself sources about 15 other headers; adding significantly to what we feed cpp, and it can obscure what headers we are effectively using. Since module.h was the source for init.h (for __init) and for export.h (for EXPORT_SYMBOL) we consider each change instance for the presence of either and replace as needed. Build testing revealed some implicit header usage that was fixed up accordingly. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2017-02-17 07:40:31 +01:00
Michael Holzheu	a4a81d8eeb	s390/kdump: Use "LINUX" ELF note name instead of "CORE" In binutils/libbfd (bfd/elf.c) it is enforced that all s390 specific ELF notes like e.g. NT_S390_PREFIX or NT_S390_CTRS have "LINUX" specified as note name. Otherwise the notes are ignored. For /proc/vmcore we currently use "CORE" for these notes. Up to now this has not been a real problem because the dump analysis tool "crash" does not check the note name. But it will break all programs that use libbfd for processing ELF notes. So fix this and use "LINUX" for all s390 specific notes to comply with libbfd. Cc: stable@vger.kernel.org # v4.4+ Reported-by: Philipp Rudo <prudo@linux.vnet.ibm.com> Reviewed-by: Philipp Rudo <prudo@linux.vnet.ibm.com> Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2017-02-17 07:40:25 +01:00
Martin Schwidefsky	57d7f939e7	s390: add no-execute support Bit 0x100 of a page table, segment table of region table entry can be used to disallow code execution for the virtual addresses associated with the entry. There is one tricky bit, the system call to return from a signal is part of the signal frame written to the user stack. With a non-executable stack this would stop working. To avoid breaking things the protection fault handler checks the opcode that caused the fault for 0x0a77 (sys_sigreturn) and 0x0aad (sys_rt_sigreturn) and injects a system call. This is preferable to the alternative solution with a stub function in the vdso because it works for vdso=off and statically linked binaries as well. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2017-02-08 14:13:25 +01:00
Martin Schwidefsky	2583b848ca	s390: report new vector facilities Add hardware capability bits and feature tags to /proc/cpuinfo for the "Vector Packed Decimal Facility" (tag "vxd" / hwcap bit 2^12) and the "Vector Enhancements Facility 1" (tag "vxe" / hwcap bit 2^13). Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2017-02-08 14:13:24 +01:00
Heiko Carstens	4920e3cf77	s390: use correct input data address for setup_randomness The current implementation of setup_randomness uses the stack address and therefore the pointer to the SYSIB 3.2.2 block as input data address. Furthermore the length of the input data is the number of virtual-machine description blocks which is typically one. This means that typically a single zero byte is fed to add_device_randomness. Fix both of these and use the address of the first virtual machine description block as input data address and also use the correct length. Fixes: `bcfcbb6bae` ("s390: add system information as device randomness") Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2017-02-08 14:13:23 +01:00
Heiko Carstens	02407baaeb	s390/sclp: don't add new lines to each printed string The early vt220 sclp printk code added an extra new line to each printed multi-line text. If used for the early sclp console this will lead to numerous extra new lines. Therefore get rid of this semantic and require that each to be printed string contains a line feed character if a new line is wanted. Reviewed-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2017-02-08 14:13:20 +01:00
Heiko Carstens	d5ab7a34f9	s390/sclp: make early sclp code readable This patch - unifies the old sclp early code and the sclp early printk code, so they can use common functions - makes sure all sclp early functions and variables have the same "sclp_early" prefix - converts the sclp early printk code into readable code by using existing data structures instead of hard coded magic arrays - splits the early sclp code into two files: sclp_early.c and sclp_early_core.c. The core file contains everything that is required by the kernel decompressor and may not call functions not contained within the core file. Otherwise the result would be a link error. - changes interrupt handling to be completely synchronous. The old early sclp code had a small window which allowed to receive several interrupts instead of exactly the single expected interrupt. This did hide a subtle potential bug, which is fixed with this large rework. - contains a couple of small cleanups. Reviewed-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2017-02-08 14:13:19 +01:00
Heiko Carstens	9090f3feb3	s390/sclp: move early printk code to drivers Move the early sclp printk code to the drivers folder where also the rest of the sclp code can be found. This way it is possible to use the sclp private header files for further cleanups. Reviewed-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2017-02-08 14:13:17 +01:00
Martin Schwidefsky	ea417aa8a3	s390/debug: make debug event time stamps relative to the boot TOD clock The debug features currently uses absolute TOD time stamps for the debug events. Given that the TOD clock can jump forward and backward due to STP sync checks the order of debug events can get obfuscated. Replace the absolute TOD time stamps with a delta to the IPL time stamp. On a STP sync check the TOD clock correction is added to the IPL time stamp as well to make the deltas unaffected by STP sync check. The readout of the debug feature entries will convert the deltas back to absolute time stamps based on the Unix epoch. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2017-02-07 07:27:13 +01:00
Martin Schwidefsky	8676caa4fb	s390/topology: correct allocation of topology information The data stored by the STSI instruction can be up to a page in size but the memblock_virt_alloc allocation for tl_info only specifies 16 bytes. The memory after the short allocation is overwritten every time arch_update_cpu_topology is called. Fixes: `8c91058022` "s390/numa: establish cpu to node mapping early" Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2017-02-07 07:27:12 +01:00
Heiko Carstens	da8fd820f3	s390: make setup_randomness work Commit `bcfcbb6bae` ("s390: add system information as device randomness") intended to add some virtual machine specific information to the randomness pool. Unfortunately it uses the page allocator before it is ready to use. In result the page allocator always returns NULL and the setup_randomness function never adds anything to the randomness pool. To fix this use memblock_alloc and memblock_free instead. Fixes: `bcfcbb6bae` ("s390: add system information as device randomness") Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2017-02-07 07:27:10 +01:00
Martin Schwidefsky	34525e1f7e	s390: store breaking event address only for program checks The principles of operations specifies that the breaking event address is stored to the address 0x110 in the prefix page only for program checks. The last branch in user space is lost as soon as a branch in kernel space is executed after e.g. an svc. This makes it impossible to accurately maintain the breaking event address for a user space process. Simplify the code, just copy the current breaking event address from 0x110 to the task structure for program checks from user space. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2017-01-31 10:46:53 +01:00
Harald Freudenberger	d34b1acb78	s390/prng: Adjust generation of entropy to produce real 256 bits. The generate_entropy function used a sha256 for compacting together 256 bits of entropy into 32 bytes hash. However, it is questionable if a sha256 can really be used here, as potential collisions may reduce the max entropy fitting into a 32 byte hash value. So this batch introduces the use of sha512 instead and the required buffer adjustments for the calling functions. Further more the working buffer for the generate_entropy function has been widened from one page to two pages. So now 1024 stckf invocations are used to gather 256 bits of entropy. This has been done to be on the save side if the jitters of stckf values isn't as good as supposed. Signed-off-by: Harald Freudenberger <freude@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2017-01-31 10:46:10 +01:00
Harald Freudenberger	a4f2779ecf	s390/crypto: Extend key length check for AES-XTS in fips mode. In fips mode only xts keys with 128 bit or 125 bit are allowed. This fix extends the xts_aes_set_key function to check for these valid key lengths in fips mode. Signed-off-by: Harald Freudenberger <freude@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2017-01-31 10:46:06 +01:00
Matthew Rosato	f3d3584faf	s390/crypto: Check des3_ede keys for uniqueness in fips mode Triple-DES implementations will soon be required to check for uniqueness of keys with fips mode enabled. Add checks to ensure none of the 3 keys match. Signed-off-by: Matthew Rosato <mjrosato@linux.vnet.ibm.com> Signed-off-by: Harald Freudenberger <freude@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2017-01-31 10:46:02 +01:00
Daniel Borkmann	9437964885	s390/bpf: remove redundant check for non-null image After we already allocated the jit.prg_buf image via bpf_jit_binary_alloc() and filled it out with instructions, jit.prg_buf cannot be NULL anymore. Thus, remove the unnecessary check. Tested on s390x with test_bpf module. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Michael Holzheu <holzheu@linux.vnet.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2017-01-16 07:27:55 +01:00
Heiko Carstens	89175cf766	s390: provide sclp based boot console Use the early sclp code to provide a boot console. This boot console is available if the kernel parameter "earlyprintk" has been specified, just like it works for other architectures that also provide an early boot console. This makes debugging of early problems much easier, since now we finally have working console output even before memory detection is running. The boot console will be automatically disabled as soon as another console will be registered. Reviewed-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2017-01-16 07:27:55 +01:00
Heiko Carstens	f031974859	s390/sclp: always stay within bounds of the early sccb Make sure the _sclp_print_lm function stays within bounds of the early sccb, even if the passed string is very long. If the string is too long, the remaining characters will be dropped. Suggested-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com> Reviewed-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2017-01-16 07:27:55 +01:00
Heiko Carstens	742dc5773c	s390/sclp: make early sclp irq handler more robust Make the early sclp interrupt handler more robust: - disable all interrupt sub classes except for the service signal subclass - extend ctlreg0 union so it is easily possible to set the service signal subclass mask bit without using a magic number - disable lowcore protection before writing to it - make sure that all write accesses are done before the original content of control register 0 is restored, which could enable lowcore protection Reviewed-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2017-01-16 07:27:55 +01:00
Heiko Carstens	68cc795d19	s390/topology: make "topology=off" parameter work The "topology=off" kernel parameter is supposed to prevent the kernel to use hardware topology information to generate scheduling domains etc. For an unknown reason I implemented this in a very odd way back then: instead of simply clearing the MACHINE_HAS_TOPOLOGY flag within the lowcore I added a second variable which indicated that topology information should not be used. This is more than suboptimal since it partially doesn't work. For the fake NUMA case topology information is still considered and scheduling domains will be created based on this. To fix this and to simplify the code get rid of the extra variable and implement the "topology=off" case like it is done for other features. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2017-01-16 07:27:54 +01:00
Heiko Carstens	970ba6ac6a	s390: use false/true when using bool Yet another trivial patch to reduce the noise that coccinelle generates. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2017-01-16 07:27:54 +01:00
Heiko Carstens	0b92515916	s390: remove couple of unneeded semicolons Remove a couple of unneeded semicolons. This is just to reduce the noise that the coccinelle static code checker generates. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2017-01-16 07:27:54 +01:00
Heiko Carstens	496e59cc48	s390/topology: reduce number of printks Merge the seven printks within topology_init_early to a single one. With an early boot console this avoids printing six lines each containing only a single character. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2017-01-16 07:27:54 +01:00
Heiko Carstens	00de54c803	s390/mem_detect: fix memory type of first block Fix a long-standing but currently irrelevant bug: the memory detection code performs a tprot instruction on address zero to figure out if the first memory chunk is readable or writable. Due to low address protection the result is "read-only". If the memory detection code would actually care, it would have to ignore the first memory increment, but it adds the memory increment to writable memory anyway. If memblock debugging is enabled this leads to an extra rather surprising call which registers memory. To avoid this get rid of the first misleading tprot call and simply assume that the first memory increment is writable. Otherwise we wouldn't have reached the memory detection code anyway. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2017-01-16 07:27:54 +01:00
Heiko Carstens	a2ce2a9568	s390/mem_detect: add debugging output The s390 specific memory detection code does not call memblock_add, which would generate debug output if memblock=debug is specified on the kernel command line. Instead it directly calls memblock_add_range, which doesn't generate any debug output. To have a chance to debug early memblock related bugs add an s390 specific memblock_dbg call and a (missing) memblock_dump_all call. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2017-01-16 07:27:53 +01:00
Heiko Carstens	7be5e359a7	s390/setup: call memblock_reserve only for size > 0 reserve_initrd currently calls memblock_reserve even if the to be reserved size is zero. Even though the memblock core code can handle this correctly, it still yields confusing debug messages if memblock debugging is enabled. Therefore make sure to not call memblock_reserve with a size of zero. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2017-01-16 07:27:53 +01:00
Sebastian Ott	5064cd3506	s390/pci: use proper endianness annotations Add proper annotation to the bar definition and use casts within the bus accessors. Also change the sequence in the accessors to do the shifts in the native byte order. No functional change. Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2017-01-16 07:27:53 +01:00
Heiko Carstens	90b3baa232	s390: proper type casts for csum_partial invocations Keep sparse and other static code checkers from emitting warnings like: arch/s390/kernel/ipl.c:1549:14: warning: incorrect type in assignment (different base types) arch/s390/kernel/ipl.c:1549:14: expected unsigned int [unsigned] csum arch/s390/kernel/ipl.c:1549:14: got restricted __wsum All usages in s390 code are ok. Therefore add proper casts. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2017-01-16 07:27:53 +01:00
Heiko Carstens	551f413434	s390/lib: improve memmove, memset and memcpy Improve the memmove implementation to save one instruction and use better label names. Also use better label names for the memset and memcpy implementations so everything looks consistent. Suggested-by: Jens Remus <jremus@linux.vnet.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2017-01-16 07:27:51 +01:00
Heiko Carstens	e3850ecfc1	s390/cpumf: get rid of variable length array The stcctm5 inline assembly uses a variable length array to specify the memory that is written to. According to the gcc manual this trick only works if the length is known at compile time. This is not the the case for the stccm5 inline assembly. Therefore simply use a full memory clobber. As requested by Martin also move the output Q constraint operand to the input operands list, since all we want is that the compiler generates an instruction that may use the displacement field: in other words we only need the address of *val. That the inline assembly actually writes to an array starting at val is taken care of with the memory clobber. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2017-01-16 07:27:51 +01:00
Heiko Carstens	1d9995771f	s390: update defconfigs Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>	2017-01-16 07:27:48 +01:00
Heiko Carstens	e991c24d68	s390/ctl_reg: make __ctl_load a full memory barrier We have quite a lot of code that depends on the order of the __ctl_load inline assemby and subsequent memory accesses, like e.g. disabling lowcore protection and the writing to lowcore. Since the __ctl_load macro does not have memory barrier semantics, nor any other dependencies the compiler is, theoretically, free to shuffle code around. Or in other words: storing to lowcore could happen before lowcore protection is disabled. In order to avoid this class of potential bugs simply add a full memory barrier to the __ctl_load macro. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2017-01-16 07:27:48 +01:00
Linus Torvalds	83346fbc07	Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "Misc fixes: - unwinder fixes - AMD CPU topology enumeration fixes - microcode loader fixes - x86 embedded platform fixes - fix for a bootup crash that may trigger when clearcpuid= is used with invalid values" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mpx: Use compatible types in comparison to fix sparse error x86/tsc: Add the Intel Denverton Processor to native_calibrate_tsc() x86/entry: Fix the end of the stack for newly forked tasks x86/unwind: Include __schedule() in stack traces x86/unwind: Disable KASAN checks for non-current tasks x86/unwind: Silence warnings for non-current tasks x86/microcode/intel: Use correct buffer size for saving microcode data x86/microcode/intel: Fix allocation size of struct ucode_patch x86/microcode/intel: Add a helper which gives the microcode revision x86/microcode: Use native CPUID to tickle out microcode revision x86/CPU: Add native CPUID variants returning a single datum x86/boot: Add missing declaration of string functions x86/CPU/AMD: Fix Bulldozer topology x86/platform/intel-mid: Rename 'spidev' to 'mrfld_spidev' x86/cpu: Fix typo in the comment for Anniedale x86/cpu: Fix bootup crashes by sanitizing the argument of the 'clearcpuid=' command-line option	2017-01-15 12:03:11 -08:00
Linus Torvalds	79078c53ba	Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "Misc race fixes uncovered by fuzzing efforts, a Sparse fix, two PMU driver fixes, plus miscellanous tooling fixes" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86: Reject non sampling events with precise_ip perf/x86/intel: Account interrupts for PEBS errors perf/core: Fix concurrent sys_perf_event_open() vs. 'move_group' race perf/core: Fix sys_perf_event_open() vs. hotplug perf/x86/intel: Use ULL constant to prevent undefined shift behaviour perf/x86/intel/uncore: Fix hardcoded socket 0 assumption in the Haswell init code perf/x86: Set pmu->module in Intel PMU modules perf probe: Fix to probe on gcc generated symbols for offline kernel perf probe: Fix --funcs to show correct symbols for offline module perf symbols: Robustify reading of build-id from sysfs perf tools: Install tools/lib/traceevent plugins with install-bin tools lib traceevent: Fix prev/next_prio for deadline tasks perf record: Fix --switch-output documentation and comment perf record: Make __record_options static tools lib subcmd: Add OPT_STRING_OPTARG_SET option perf probe: Fix to get correct modname from elf header samples/bpf trace_output_user: Remove duplicate sys/ioctl.h include samples/bpf sock_example: Avoid getting ethhdr from two includes perf sched timehist: Show total scheduling time	2017-01-15 11:37:43 -08:00
Linus Torvalds	255e6140fa	Merge branch 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull EFI fixes from Ingo Molnar: "A number of regression fixes: - Fix a boot hang on machines that have somewhat unusual memory map entries of phys_addr=0x0 num_pages=0, which broke due to a recent commit. This commit got cherry-picked from the v4.11 queue because the bug is affecting real machines. - Fix a boot hang also reported by KASAN, caused by incorrect init ordering introduced by a recent optimization. - Fix a recent robustification fix to allocate_new_fdt_and_exit_boot() that introduced an invalid assumption. Neither bugs were seen in the wild AFAIK" * 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: efi/x86: Prune invalid memory map entries and fix boot regression x86/efi: Don't allocate memmap through memblock after mm_init() efi/libstub/arm*: Pass latest memory map to the kernel	2017-01-15 10:54:39 -08:00
Peter Jones	0100a3e67a	efi/x86: Prune invalid memory map entries and fix boot regression Some machines, such as the Lenovo ThinkPad W541 with firmware GNET80WW (2.28), include memory map entries with phys_addr=0x0 and num_pages=0. These machines fail to boot after the following commit, commit `8e80632fb2` ("efi/esrt: Use efi_mem_reserve() and avoid a kmalloc()") Fix this by removing such bogus entries from the memory map. Furthermore, currently the log output for this case (with efi=debug) looks like: [ 0.000000] efi: mem45: [Reserved \| \| \| \| \| \| \| \| \| \| \| \| ] range=[0x0000000000000000-0xffffffffffffffff] (0MB) This is clearly wrong, and also not as informative as it could be. This patch changes it so that if we find obviously invalid memory map entries, we print an error and skip those entries. It also detects the display of the address range calculation overflow, so the new output is: [ 0.000000] efi: [Firmware Bug]: Invalid EFI memory map entries: [ 0.000000] efi: mem45: [Reserved \| \| \| \| \| \| \| \| \| \| \| \| ] range=[0x0000000000000000-0x0000000000000000] (invalid) It also detects memory map sizes that would overflow the physical address, for example phys_addr=0xfffffffffffff000 and num_pages=0x0200000000000001, and prints: [ 0.000000] efi: [Firmware Bug]: Invalid EFI memory map entries: [ 0.000000] efi: mem45: [Reserved \| \| \| \| \| \| \| \| \| \| \| \| ] range=[phys_addr=0xfffffffffffff000-0x20ffffffffffffffff] (invalid) It then removes these entries from the memory map. Signed-off-by: Peter Jones <pjones@redhat.com> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> [ardb: refactor for clarity with no functional changes, avoid PAGE_SHIFT] Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk> [Matt: Include bugzilla info in commit log] Cc: <stable@vger.kernel.org> # v4.9+ Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://bugzilla.kernel.org/show_bug.cgi?id=191121 Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-01-14 16:48:53 +01:00
Jiri Olsa	18e7a45af9	perf/x86: Reject non sampling events with precise_ip As Peter suggested [1] rejecting non sampling PEBS events, because they dont make any sense and could cause bugs in the NMI handler [2]. [1] http://lkml.kernel.org/r/20170103094059.GC3093@worktop [2] http://lkml.kernel.org/r/1482931866-6018-3-git-send-email-jolsa@kernel.org Signed-off-by: Jiri Olsa <jolsa@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vince@deater.net> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: http://lkml.kernel.org/r/20170103142454.GA26251@krava Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-01-14 11:06:50 +01:00
Jiri Olsa	475113d937	perf/x86/intel: Account interrupts for PEBS errors It's possible to set up PEBS events to get only errors and not any data, like on SNB-X (model 45) and IVB-EP (model 62) via 2 perf commands running simultaneously: taskset -c 1 ./perf record -c 4 -e branches:pp -j any -C 10 This leads to a soft lock up, because the error path of the intel_pmu_drain_pebs_nhm() does not account event->hw.interrupt for error PEBS interrupts, so in case you're getting ONLY errors you don't have a way to stop the event when it's over the max_samples_per_tick limit: NMI watchdog: BUG: soft lockup - CPU#22 stuck for 22s! [perf_fuzzer:5816] ... RIP: 0010:[<ffffffff81159232>] [<ffffffff81159232>] smp_call_function_single+0xe2/0x140 ... Call Trace: ? trace_hardirqs_on_caller+0xf5/0x1b0 ? perf_cgroup_attach+0x70/0x70 perf_install_in_context+0x199/0x1b0 ? ctx_resched+0x90/0x90 SYSC_perf_event_open+0x641/0xf90 SyS_perf_event_open+0x9/0x10 do_syscall_64+0x6c/0x1f0 entry_SYSCALL64_slow_path+0x25/0x25 Add perf_event_account_interrupt() which does the interrupt and frequency checks and call it from intel_pmu_drain_pebs_nhm()'s error path. We keep the pending_kill and pending_wakeup logic only in the __perf_event_overflow() path, because they make sense only if there's any data to deliver. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vince@deater.net> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: http://lkml.kernel.org/r/1482931866-6018-2-git-send-email-jolsa@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-01-14 11:06:49 +01:00
Tobias Klauser	4538286257	x86/mpx: Use compatible types in comparison to fix sparse error info->si_addr is of type void __user *, so it should be compared against something from the same address space. This fixes the following sparse error: arch/x86/mm/mpx.c:296:27: error: incompatible types in comparison expression (different address spaces) Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-01-14 09:32:06 +01:00
Len Brown	695085b4bc	x86/tsc: Add the Intel Denverton Processor to native_calibrate_tsc() The Intel Denverton microserver uses a 25 MHz TSC crystal, so we can derive its exact [] TSC frequency using CPUID and some arithmetic, eg.: TSC: 1800 MHz (25000000 Hz 216 / 3 / 1000000) [*] 'exact' is only as good as the crystal, which should be +/- 20ppm Signed-off-by: Len Brown <len.brown@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/306899f94804aece6d8fa8b4223ede3b48dbb59c.1484287748.git.len.brown@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-01-14 09:30:37 +01:00
Linus Torvalds	406732c932	* fix for module unload vs. deferred jump labels (note: there might be other buggy modules!) * two NULL pointer dereferences from syzkaller * CVE from syzkaller, very serious on 4.10-rc, "just" kernel memory leak on releases * CVE from security@kernel.org, somewhat serious on AMD, less so on Intel -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQEcBAABAgAGBQJYd7l5AAoJEL/70l94x66DLWYH/0GUg+lK9J/gj0kwqi6BwsOP Rrs5Y7XvyNLsy/piBrrHDHvRa+DfAkrU8nepwgygX/yuGmSDV/zmdIb8XA/dvKht MN285NFlVjTyznYlU/LH3etx11CHLMNclishiFHQbcnohtvhOe+fvN6RVNdfeRxm d9iBPOum15ikc1xDl2z8Op+ZXVjMxkgLkzIXFcDBpJf4BvUx0X+ZHZXIKdizVhgU ZMD2ds/MutMB8X1A52qp6kQvT7xE4rp87M0So4qDMTbAto5G4ZmMaWC5MlK2Oxe/ o+3qnx4vVz4H6uYzg1N4diHiC+buhgtXCLwwkcUOKKUVqJRP9e0Bh7kw8JA52XU= =C+tM -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull KVM fixes from Paolo Bonzini: - fix for module unload vs deferred jump labels (note: there might be other buggy modules!) - two NULL pointer dereferences from syzkaller - also syzkaller: fix emulation of fxsave/fxrstor/sgdt/sidt, problem made worse during this merge window, "just" kernel memory leak on releases - fix emulation of "mov ss" - somewhat serious on AMD, less so on Intel * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: x86: fix emulation of "MOV SS, null selector" KVM: x86: fix NULL deref in vcpu_scan_ioapic KVM: eventfd: fix NULL deref irqbypass consumer KVM: x86: Introduce segmented_write_std KVM: x86: flush pending lapic jump label updates on module unload jump_labels: API for flushing deferred jump label updates	2017-01-13 17:06:24 -08:00
Linus Torvalds	a65c92597d	- Fix huge_ptep_set_access_flags() to return "changed" when any of the ptes in the contiguous range is changed, not just the last one - Fix the adr_l assembly macro to work in modules under KASLR -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJYeRmYAAoJEGvWsS0AyF7x6tYQAJd0Rtb88kwalDYb/kMcutBU 3xyjvb8mIEKtnMOP1wS4o3YdqD6ke9OMCUm2EAwhAxgkfzwklsDOOOUlWsDijif2 X3TzYWoKVgoje3oFODXOHMZNLqU6lBmuVN6G4ZdVPsTfvntTLE4cn9q828OgLdtB L1H+cRkHMhO9w4a0VxZFsNWtSDs4UugGLUp/cNLA4gXFj4atw8+bgX9o7BsmCb1d x+rd3LDWJb+a1YFKhKJkLQO+uQKk3n7d1WQ0DrQeDBgPs4uzMx422WpfmoW+j/dq MV/6C8ZYtQczS4BKp8k9apFHq3SC0bZcPLhtXqf/NZZCCLvDKS0iPflDAArYmIHo mOnmYhw+SeGc0llp9+tDaReco71HAqzdlpYnhGEePDEc0ZXBBr4/xqAwQoY4tgWa uZLSGZuiGqCFovzLb+LMLEtQlFyu48w+Y4Ct6r0M9gmRmU6d8msoEvXkA2IB/q8z JGFdFkJ1ZD8MtabRqUzYhuqn7WD+aC5eA3uqImnPjcrqNaYaiSy8Wif6vO+7asz5 1YWyEaLuL9rITllunTQuK0crgZGjplwhGKYASz/w82AZebBeTl84adK/x7jrJgbn BPxQRHg4LqoX7i6tU3KWc/ulbE8EzOeJabCcKN8HnkPvt2akgKh/nlH3NQVLpG0l c/ffN90w3+fK7pNQKYnu =aUnr -----END PGP SIGNATURE----- Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fixes from Catalin Marinas: - Fix huge_ptep_set_access_flags() to return "changed" when any of the ptes in the contiguous range is changed, not just the last one - Fix the adr_l assembly macro to work in modules under KASLR * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64: assembler: make adr_l work in modules under KASLR arm64: hugetlb: fix the wrong return value for huge_ptep_set_access_flags	2017-01-13 17:00:42 -08:00
Ard Biesheuvel	41c066f2c4	arm64: assembler: make adr_l work in modules under KASLR When CONFIG_RANDOMIZE_MODULE_REGION_FULL=y, the offset between loaded modules and the core kernel may exceed 4 GB, putting symbols exported by the core kernel out of the reach of the ordinary adrp/add instruction pairs used to generate relative symbol references. So make the adr_l macro emit a movz/movk sequence instead when executing in module context. While at it, remove the pointless special case for the stack pointer. Acked-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2017-01-12 18:10:52 +00:00
Paolo Bonzini	33ab91103b	KVM: x86: fix emulation of "MOV SS, null selector" This is CVE-2017-2583. On Intel this causes a failed vmentry because SS's type is neither 3 nor 7 (even though the manual says this check is only done for usable SS, and the dmesg splat says that SS is unusable!). On AMD it's worse: svm.c is confused and sets CPL to 0 in the vmcb. The fix fabricates a data segment descriptor when SS is set to a null selector, so that CPL and SS.DPL are set correctly in the VMCS/vmcb. Furthermore, only allow setting SS to a NULL selector if SS.RPL < 3; this in turn ensures CPL < 3 because RPL must be equal to CPL. Thanks to Andy Lutomirski and Willy Tarreau for help in analyzing the bug and deciphering the manuals. Reported-by: Xiaohan Zhang <zhangxiaohan1@huawei.com> Fixes: `79d5b4c3cd` Cc: stable@nongnu.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2017-01-12 15:17:13 +01:00
Wanpeng Li	546d87e5c9	KVM: x86: fix NULL deref in vcpu_scan_ioapic Reported by syzkaller: BUG: unable to handle kernel NULL pointer dereference at 00000000000001b0 IP: _raw_spin_lock+0xc/0x30 PGD 3e28eb067 PUD 3f0ac6067 PMD 0 Oops: 0002 [#1] SMP CPU: 0 PID: 2431 Comm: test Tainted: G OE 4.10.0-rc1+ #3 Call Trace: ? kvm_ioapic_scan_entry+0x3e/0x110 [kvm] kvm_arch_vcpu_ioctl_run+0x10a8/0x15f0 [kvm] ? pick_next_task_fair+0xe1/0x4e0 ? kvm_arch_vcpu_load+0xea/0x260 [kvm] kvm_vcpu_ioctl+0x33a/0x600 [kvm] ? hrtimer_try_to_cancel+0x29/0x130 ? do_nanosleep+0x97/0xf0 do_vfs_ioctl+0xa1/0x5d0 ? __hrtimer_init+0x90/0x90 ? do_nanosleep+0x5b/0xf0 SyS_ioctl+0x79/0x90 do_syscall_64+0x6e/0x180 entry_SYSCALL64_slow_path+0x25/0x25 RIP: _raw_spin_lock+0xc/0x30 RSP: ffffa43688973cc0 The syzkaller folks reported a NULL pointer dereference due to ENABLE_CAP succeeding even without an irqchip. The Hyper-V synthetic interrupt controller is activated, resulting in a wrong request to rescan the ioapic and a NULL pointer dereference. #include <sys/ioctl.h> #include <sys/mman.h> #include <sys/types.h> #include <linux/kvm.h> #include <pthread.h> #include <stddef.h> #include <stdint.h> #include <stdlib.h> #include <string.h> #include <unistd.h> #ifndef KVM_CAP_HYPERV_SYNIC #define KVM_CAP_HYPERV_SYNIC 123 #endif void* thr(void* arg) { struct kvm_enable_cap cap; cap.flags = 0; cap.cap = KVM_CAP_HYPERV_SYNIC; ioctl((long)arg, KVM_ENABLE_CAP, &cap); return 0; } int main() { void host_mem = mmap(0, 0x1000, PROT_READ\|PROT_WRITE, MAP_PRIVATE\|MAP_ANONYMOUS, -1, 0); int kvmfd = open("/dev/kvm", 0); int vmfd = ioctl(kvmfd, KVM_CREATE_VM, 0); struct kvm_userspace_memory_region memreg; memreg.slot = 0; memreg.flags = 0; memreg.guest_phys_addr = 0; memreg.memory_size = 0x1000; memreg.userspace_addr = (unsigned long)host_mem; host_mem[0] = 0xf4; ioctl(vmfd, KVM_SET_USER_MEMORY_REGION, &memreg); int cpufd = ioctl(vmfd, KVM_CREATE_VCPU, 0); struct kvm_sregs sregs; ioctl(cpufd, KVM_GET_SREGS, &sregs); sregs.cr0 = 0; sregs.cr4 = 0; sregs.efer = 0; sregs.cs.selector = 0; sregs.cs.base = 0; ioctl(cpufd, KVM_SET_SREGS, &sregs); struct kvm_regs regs = { .rflags = 2 }; ioctl(cpufd, KVM_SET_REGS, &regs); ioctl(vmfd, KVM_CREATE_IRQCHIP, 0); pthread_t th; pthread_create(&th, 0, thr, (void)(long)cpufd); usleep(rand() % 10000); ioctl(cpufd, KVM_RUN, 0); pthread_join(th, 0); return 0; } This patch fixes it by failing ENABLE_CAP if without an irqchip. Reported-by: Dmitry Vyukov <dvyukov@google.com> Fixes: `5c919412fe` (kvm/x86: Hyper-V synthetic interrupt controller) Cc: stable@vger.kernel.org # 4.5+ Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2017-01-12 14:52:52 +01:00
Steve Rutherford	129a72a0d3	KVM: x86: Introduce segmented_write_std Introduces segemented_write_std. Switches from emulated reads/writes to standard read/writes in fxsave, fxrstor, sgdt, and sidt. This fixes CVE-2017-2584, a longstanding kernel memory leak. Since commit `283c95d0e3` ("KVM: x86: emulate FXSAVE and FXRSTOR", 2016-11-09), which is luckily not yet in any final release, this would also be an exploitable kernel memory write! Reported-by: Dmitry Vyukov <dvyukov@google.com> Cc: stable@vger.kernel.org Fixes: `96051572c8` Fixes: `283c95d0e3` Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Steve Rutherford <srutherford@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2017-01-12 14:34:58 +01:00
David Matlack	cef84c302f	KVM: x86: flush pending lapic jump label updates on module unload KVM's lapic emulation uses static_key_deferred (apic_{hw,sw}_disabled). These are implemented with delayed_work structs which can still be pending when the KVM module is unloaded. We've seen this cause kernel panics when the kvm_intel module is quickly reloaded. Use the new static_key_deferred_flush() API to flush pending updates on module unload. Signed-off-by: David Matlack <dmatlack@google.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2017-01-12 14:33:17 +01:00
Josh Poimboeuf	ff3f7e2475	x86/entry: Fix the end of the stack for newly forked tasks When unwinding a task, the end of the stack is always at the same offset right below the saved pt_regs, regardless of which syscall was used to enter the kernel. That convention allows the unwinder to verify that a stack is sane. However, newly forked tasks don't always follow that convention, as reported by the following unwinder warning seen by Dave Jones: WARNING: kernel stack frame pointer at ffffc90001443f30 in kworker/u8:8:30468 has bad value (null) The warning was due to the following call chain: (ftrace handler) call_usermodehelper_exec_async+0x5/0x140 ret_from_fork+0x22/0x30 The problem is that ret_from_fork() doesn't create a stack frame before calling other functions. Fix that by carefully using the frame pointer macros. In addition to conforming to the end of stack convention, this also makes related stack traces more sensible by making it clear to the user that ret_from_fork() was involved. Reported-by: Dave Jones <davej@codemonkey.org.uk> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Miroslav Benes <mbenes@suse.cz> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/8854cdaab980e9700a81e9ebf0d4238e4bbb68ef.1483978430.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-01-12 09:28:29 +01:00
Josh Poimboeuf	2c96b2fe9c	x86/unwind: Include __schedule() in stack traces In the following commit: `0100301bfd` ("sched/x86: Rewrite the switch_to() code") ... the layout of the 'inactive_task_frame' struct was designed to have a frame pointer header embedded in it, so that the unwinder could use the 'bp' and 'ret_addr' fields to report __schedule() on the stack (or ret_from_fork() for newly forked tasks which haven't actually run yet). Finish the job by changing get_frame_pointer() to return a pointer to inactive_task_frame's 'bp' field rather than 'bp' itself. This allows the unwinder to start one frame higher on the stack, so that it properly reports __schedule(). Reported-by: Miroslav Benes <mbenes@suse.cz> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Jones <davej@codemonkey.org.uk> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/598e9f7505ed0aba86e8b9590aa528c6c7ae8dcd.1483978430.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-01-12 09:28:28 +01:00

1 2 3 4 5 ...

129918 Commits