Commit Graph

31886 Commits

Author SHA1 Message Date
Kirill A. Shutemov 82434d23f3 x86/boot/compressed/64: Explain paging_prepare()'s return value
paging_prepare() returns a two-quadword structure which lands
into RDX:RAX:

  - Address of the trampoline is returned in RAX.
  - Non zero RDX means trampoline needs to enable 5-level paging.

Document that explicitly.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: dave.hansen@linux.intel.com
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Kyle D Pelton <kyle.d.pelton@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wei Huang <wei@redhat.com>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190206154756.matwldebbxkmlnae@black.fi.intel.com
2019-02-06 19:08:34 +01:00
Kirill A. Shutemov 45b13b424f x86/boot/compressed/64: Do not corrupt EDX on EFER.LME=1 setting
RDMSR in the trampoline code overwrites EDX but that register is used
to indicate whether 5-level paging has to be enabled and if clobbered,
leads to failure to boot on a 5-level paging machine.

Preserve EDX on the stack while we are dealing with EFER.

Fixes: b677dfae5a ("x86/boot/compressed/64: Set EFER.LME=1 in 32-bit trampoline before returning to long mode")
Reported-by: Kyle D Pelton <kyle.d.pelton@intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: dave.hansen@linux.intel.com
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wei Huang <wei@redhat.com>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190206115253.1907-1-kirill.shutemov@linux.intel.com
2019-02-06 18:56:18 +01:00
Linus Walleij 01dc79cd6f
regulator: fixed/gpio: Pull inversion/OD into gpiolib
This pushes the handling of inversion semantics and open drain
settings to the GPIO descriptor and gpiolib. All affected board
files are also augmented.

This is especially nice since we don't have to have any
confusing flags passed around to the left and right littering
the fixed and GPIO regulator drivers and the regulator core.
It is all just very straight-forward: the core asks the GPIO
line to be asserted or deasserted and gpiolib deals with the
rest depending on how the platform is configured: if the line
is active low, it deals with that, if the line is open drain,
it deals with that too.

Cc: Alexander Shiyan <shc_work@mail.ru> # i.MX boards user
Cc: Haojian Zhuang <haojian.zhuang@gmail.com> # MMP2 maintainer
Cc: Aaro Koskinen <aaro.koskinen@iki.fi> # OMAP1 maintainer
Cc: Tony Lindgren <tony@atomide.com> # OMAP1,2,3 maintainer
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> # EM-X270 maintainer
Cc: Robert Jarzmik <robert.jarzmik@free.fr> # EZX maintainer
Cc: Philipp Zabel <philipp.zabel@gmail.com> # Magician maintainer
Cc: Petr Cvek <petr.cvek@tul.cz> # Magician
Cc: Robert Jarzmik <robert.jarzmik@free.fr> # PXA
Cc: Paul Parsons <lost.distance@yahoo.com> # hx4700
Cc: Daniel Mack <zonque@gmail.com> # Raumfeld maintainer
Cc: Marc Zyngier <marc.zyngier@arm.com> # Zeus maintainer
Cc: Geert Uytterhoeven <geert+renesas@glider.be> # SuperH pinctrl/GPIO maintainer
Cc: Russell King <rmk+kernel@armlinux.org.uk> # SA1100
Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
Tested-by: Janusz Krzysztofik <jmkrzyszt@gmail.com> #OMAP1 Amstrad Delta
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
2019-02-06 15:58:29 +00:00
Kairui Song ccec81e425 x86/kexec: Fill in acpi_rsdp_addr from the first kernel
When efi=noruntime or efi=oldmap is used on the kernel command line, EFI
services won't be available in the second kernel, therefore the second
kernel will not be able to get the ACPI RSDP address from firmware by
calling EFI services and so it won't boot.

Commit

  e6e094e053 ("x86/acpi, x86/boot: Take RSDP address from boot params if available")

added an acpi_rsdp_addr field to boot_params which stores the RSDP
address for other kernel users.

Recently, after

  3a63f70bf4 ("x86/boot: Early parse RSDP and save it in boot_params")

the acpi_rsdp_addr will always be filled with a valid RSDP address.

So fill in that value into the second kernel's boot_params thus ensuring
that the second kernel receives the RSDP value from the first kernel.

 [ bp: massage commit message. ]

Signed-off-by: Kairui Song <kasong@redhat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Chao Fan <fanc.fnst@cn.fujitsu.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: kexec@lists.infradead.org
Cc: Philipp Rudo <prudo@linux.vnet.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86-ml <x86@kernel.org>
Cc: Yannik Sembritzki <yannik@sembritzki.me>
Link: https://lkml.kernel.org/r/20190204173852.4863-1-kasong@redhat.com
2019-02-06 15:29:03 +01:00
Mathieu Poirier 840018668c perf/aux: Make perf_event accessible to setup_aux()
When pmu::setup_aux() is called the coresight PMU needs to know which
sink to use for the session by looking up the information in the
event's attr::config2 field.

As such simply replace the cpu information by the complete perf_event
structure and change all affected customers.

Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Reviewed-by: Suzuki Poulouse <suzuki.poulose@arm.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-s390@vger.kernel.org
Link: http://lkml.kernel.org/r/20190131184714.20388-2-mathieu.poirier@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-02-06 10:00:39 -03:00
Borislav Petkov 82df8261c6 x86/boot: Fix randconfig build error due to MEMORY_HOTREMOVE
When building randconfigs, one of the failures is:

  ld: arch/x86/boot/compressed/kaslr.o: in function `choose_random_location':
  kaslr.c:(.text+0xbf7): undefined reference to `count_immovable_mem_regions'
  ld: kaslr.c:(.text+0xcbe): undefined reference to `immovable_mem'
  make[2]: *** [arch/x86/boot/compressed/vmlinux] Error 1

because CONFIG_ACPI is not enabled in this particular .config but
CONFIG_MEMORY_HOTREMOVE is and count_immovable_mem_regions() is
unresolvable because it is defined in compressed/acpi.c which is the
compilation unit that depends on CONFIG_ACPI.

Add CONFIG_ACPI to the explicit dependencies for MEMORY_HOTREMOVE.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Chao Fan <fanc.fnst@cn.fujitsu.com>
Cc: x86@kernel.org
Link: https://lkml.kernel.org/r/20190205131033.9564-1-bp@alien8.de
2019-02-06 11:48:42 +01:00
Borislav Petkov 82f9ed3a93 x86/boot: Fix cmdline_find_option() prototype visibility
ac09c5f43c ("x86/boot: Build the command line parsing code unconditionally")

enabled building the command line parsing code unconditionally but it
forgot to remove the respective ifdeffery around the prototypes in the
misc.h header, leading to

  arch/x86/boot/compressed/acpi.c: In function ‘get_acpi_rsdp’:
  arch/x86/boot/compressed/acpi.c:37:8: warning: implicit declaration of function \
	  ‘cmdline_find_option’ [-Wimplicit-function-declaration]
    ret = cmdline_find_option("acpi_rsdp", val, MAX_ADDR_LEN);
          ^~~~~~~~~~~~~~~~~~~

for configs where neither CONFIG_EARLY_PRINTK nor CONFIG_RANDOMIZE_BASE
was defined.

Drop the ifdeffery in the header too.

Fixes: ac09c5f43c ("x86/boot: Build the command line parsing code unconditionally")
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Chao Fan <fanc.fnst@cn.fujitsu.com>
Cc: x86@kernel.org
Link: https://lkml.kernel.org/r/5c51daf0.83pQEkvDZILqoSYW%lkp@intel.com
Link: https://lkml.kernel.org/r/20190205131352.GA27396@zn.tnic
2019-02-06 11:41:21 +01:00
Reinette Chatre 8ad382dd11 x86/resctrl: Remove duplicate MSR_MISC_FEATURE_CONTROL definition
The definition of MSR_MISC_FEATURE_CONTROL was first introduced in

  98af74599e ("x86 msr_index.h: Define MSR_MISC_FEATURE_CONTROL")

and present in Linux since v4.11.

The Cache Pseudo-Locking code added this duplicate definition in more
recent

  f2a177292b ("x86/intel_rdt: Discover supported platforms via prefetch disable bits"),

available since v4.19.

Remove the duplicate definition from the resctrl subsystem and let that
code obtain the needed definition from the core architecture msr-index.h
instead.

Fixes: f2a177292b ("x86/intel_rdt: Discover supported platforms via prefetch disable bits")
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: gavin.hindman@intel.com
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: jithu.joseph@intel.com
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/ff6b95d9b6ef6f4ac96267f130719ba1af09614b.1549312475.git.reinette.chatre@intel.com
2019-02-05 07:21:56 +01:00
Kairui Song 278311e417 kexec, KEYS: Make use of platform keyring for signature verify
This patch allows the kexec_file_load syscall to verify the PE signed
kernel image signature based on the preboot keys stored in the .platform
keyring, as fall back, if the signature verification failed due to not
finding the public key in the secondary or builtin keyrings.

This commit adds a VERIFY_USE_PLATFORM_KEYRING similar to previous
VERIFY_USE_SECONDARY_KEYRING indicating that verify_pkcs7_signature
should verify the signature using platform keyring.  Also, decrease
the error message log level when verification failed with -ENOKEY,
so that if called tried multiple time with different keyring it
won't generate extra noises.

Signed-off-by: Kairui Song <kasong@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Acked-by: Dave Young <dyoung@redhat.com> (for kexec_file_load part)
[zohar@linux.ibm.com: tweaked the first paragraph of the patch description,
 and fixed checkpatch warning.]
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
2019-02-04 17:34:07 -05:00
Elena Reshetova 47b8f3ab9c refcount_t: Add ACQUIRE ordering on success for dec(sub)_and_test() variants
This adds an smp_acquire__after_ctrl_dep() barrier on successful
decrease of refcounter value from 1 to 0 for refcount_dec(sub)_and_test
variants and therefore gives stronger memory ordering guarantees than
prior versions of these functions.

Co-developed-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Andrea Parri <andrea.parri@amarulasolutions.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Cc: dvyukov@google.com
Cc: keescook@chromium.org
Cc: stern@rowland.harvard.edu
Link: https://lkml.kernel.org/r/1548847131-27854-2-git-send-email-elena.reshetova@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-02-04 09:03:31 +01:00
Ingo Molnar 98cb621081 Merge branch 'perf/urgent' into perf/core, to pick up fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-02-04 08:45:42 +01:00
Peter Zijlstra 602cae04c4 perf/x86/intel: Delay memory deallocation until x86_pmu_dead_cpu()
intel_pmu_cpu_prepare() allocated memory for ->shared_regs among other
members of struct cpu_hw_events. This memory is released in
intel_pmu_cpu_dying() which is wrong. The counterpart of the
intel_pmu_cpu_prepare() callback is x86_pmu_dead_cpu().

Otherwise if the CPU fails on the UP path between CPUHP_PERF_X86_PREPARE
and CPUHP_AP_PERF_X86_STARTING then it won't release the memory but
allocate new memory on the next attempt to online the CPU (leaking the
old memory).
Also, if the CPU down path fails between CPUHP_AP_PERF_X86_STARTING and
CPUHP_PERF_X86_PREPARE then the CPU will go back online but never
allocate the memory that was released in x86_pmu_dying_cpu().

Make the memory allocation/free symmetrical in regard to the CPU hotplug
notifier by moving the deallocation to intel_pmu_cpu_dead().

This started in commit:

   a7e3ed1e47 ("perf: Add support for supplementary event registers").

In principle the bug was introduced in v2.6.39 (!), but it will almost
certainly not backport cleanly across the big CPU hotplug rewrite between v4.7-v4.15...

[ bigeasy: Added patch description. ]
[ mingo: Added backporting guidance. ]

Reported-by: He Zhe <zhe.he@windriver.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> # With developer hat on
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> # With maintainer hat on
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: acme@kernel.org
Cc: bp@alien8.de
Cc: hpa@zytor.com
Cc: jolsa@kernel.org
Cc: kan.liang@linux.intel.com
Cc: namhyung@kernel.org
Cc: <stable@vger.kernel.org>
Fixes: a7e3ed1e47 ("perf: Add support for supplementary event registers").
Link: https://lkml.kernel.org/r/20181219165350.6s3jvyxbibpvlhtq@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-02-04 08:44:51 +01:00
Kan Liang 9e63a7894f perf/x86/intel/uncore: Add Node ID mask
Some PCI uncore PMUs cannot be registered on an 8-socket system (HPE
Superdome Flex).

To understand which Socket the PCI uncore PMUs belongs to, perf retrieves
the local Node ID of the uncore device from CPUNODEID(0xC0) of the PCI
configuration space, and the mapping between Socket ID and Node ID from
GIDNIDMAP(0xD4). The Socket ID can be calculated accordingly.

The local Node ID is only available at bit 2:0, but current code doesn't
mask it. If a BIOS doesn't clear the rest of the bits, an incorrect Node ID
will be fetched.

Filter the Node ID by adding a mask.

Reported-by: Song Liu <songliubraving@fb.com>
Tested-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: <stable@vger.kernel.org> # v3.7+
Fixes: 7c94ee2e09 ("perf/x86: Add Intel Nehalem and Sandy Bridge-EP uncore support")
Link: https://lkml.kernel.org/r/1548600794-33162-1-git-send-email-kan.liang@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-02-04 08:44:43 +01:00
Ard Biesheuvel 69c1f396f2 efi/x86: Convert x86 EFI earlyprintk into generic earlycon implementation
Move the x86 EFI earlyprintk implementation to a shared location under
drivers/firmware and tweak it slightly so we can expose it as an earlycon
implementation (which is generic) rather than earlyprintk (which is only
implemented for a few architectures)

This also involves switching to write-combine mappings by default (which
is required on ARM since device mappings lack memory semantics, and so
memcpy/memset may not be used on them), and adding support for shared
memory framebuffers on cache coherent non-x86 systems (which do not
tolerate mismatched attributes).

Note that 32-bit ARM does not populate its struct screen_info early
enough for earlycon=efifb to work, so it is disabled there.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Reviewed-by: Alexander Graf <agraf@suse.de>
Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
Cc: Bjorn Andersson <bjorn.andersson@linaro.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Heinrich Schuchardt <xypron.glpk@gmx.de>
Cc: Jeffrey Hugo <jhugo@codeaurora.org>
Cc: Lee Jones <lee.jones@linaro.org>
Cc: Leif Lindholm <leif.lindholm@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Peter Jones <pjones@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/20190202094119.13230-10-ard.biesheuvel@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-02-04 08:27:30 +01:00
Ard Biesheuvel ce9084ba0d x86: Make ARCH_USE_MEMREMAP_PROT a generic Kconfig symbol
Turn ARCH_USE_MEMREMAP_PROT into a generic Kconfig symbol, and fix the
dependency expression to reflect that AMD_MEM_ENCRYPT depends on it,
instead of the other way around. This will permit ARCH_USE_MEMREMAP_PROT
to be selected by other architectures.

Note that the encryption related early memremap routines in
arch/x86/mm/ioremap.c cannot be built for 32-bit x86 without triggering
the following warning:

     arch/x86//mm/ioremap.c: In function 'early_memremap_encrypted':
  >> arch/x86/include/asm/pgtable_types.h:193:27: warning: conversion from
                     'long long unsigned int' to 'long unsigned int' changes
                     value from '9223372036854776163' to '355' [-Woverflow]
      #define __PAGE_KERNEL_ENC (__PAGE_KERNEL | _PAGE_ENC)
                                ^~~~~~~~~~~~~~~~~~~~~~~~~~~
     arch/x86//mm/ioremap.c:713:46: note: in expansion of macro '__PAGE_KERNEL_ENC'
       return early_memremap_prot(phys_addr, size, __PAGE_KERNEL_ENC);

which essentially means they are 64-bit only anyway. However, we cannot
make them dependent on CONFIG_ARCH_HAS_MEM_ENCRYPT, since that is always
defined, even for i386 (and changing that results in a slew of build errors)

So instead, build those routines only if CONFIG_AMD_MEM_ENCRYPT is
defined.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
Cc: Alexander Graf <agraf@suse.de>
Cc: Bjorn Andersson <bjorn.andersson@linaro.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Heinrich Schuchardt <xypron.glpk@gmx.de>
Cc: Jeffrey Hugo <jhugo@codeaurora.org>
Cc: Lee Jones <lee.jones@linaro.org>
Cc: Leif Lindholm <leif.lindholm@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Peter Jones <pjones@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/20190202094119.13230-9-ard.biesheuvel@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-02-04 08:27:29 +01:00
Sai Praneeth Prakhya 8fe55212aa x86/efi: Mark can_free_region() as an __init function
can_free_region() is called only once during boot, by
efi_reserve_boot_services().

Hence, mark it as an __init function.

Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
Cc: Alexander Graf <agraf@suse.de>
Cc: Bjorn Andersson <bjorn.andersson@linaro.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Heinrich Schuchardt <xypron.glpk@gmx.de>
Cc: Jeffrey Hugo <jhugo@codeaurora.org>
Cc: Lee Jones <lee.jones@linaro.org>
Cc: Leif Lindholm <leif.lindholm@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Peter Jones <pjones@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/20190202094119.13230-2-ard.biesheuvel@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-02-04 08:19:22 +01:00
Deepa Dinamani 2edfd8e061 arch: Use asm-generic/socket.h when possible
Many architectures maintain an arch specific copy of the
file even though there are no differences with the asm-generic
one. Allow these architectures to use the generic one instead.

Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Acked-by: Max Filippov <jcmvbkbc@gmail.com>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Cc: chris@zankel.net
Cc: fenghua.yu@intel.com
Cc: tglx@linutronix.de
Cc: schwidefsky@de.ibm.com
Cc: linux-ia64@vger.kernel.org
Cc: linux-xtensa@linux-xtensa.org
Cc: linux-s390@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-03 11:17:30 -08:00
Linus Torvalds 24b888d8d5 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
 "A few updates for x86:

   - Fix an unintended sign extension issue in the fault handling code

   - Rename the new resource control config switch so it's less
     confusing

   - Avoid setting up EFI info in kexec when the EFI runtime is
     disabled.

   - Fix the microcode version check in the AMD microcode loader so it
     only loads higher version numbers and never downgrades

   - Set EFER.LME in the 32bit trampoline before returning to long mode
     to handle older AMD/KVM behaviour properly.

   - Add Darren and Andy as x86/platform reviewers"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/resctrl: Avoid confusion over the new X86_RESCTRL config
  x86/kexec: Don't setup EFI info if EFI runtime is not enabled
  x86/microcode/amd: Don't falsely trick the late loading mechanism
  MAINTAINERS: Add Andy and Darren as arch/x86/platform/ reviewers
  x86/fault: Fix sign-extend unintended sign extension
  x86/boot/compressed/64: Set EFER.LME=1 in 32-bit trampoline before returning to long mode
  x86/cpu: Add Atom Tremont (Jacobsville)
2019-02-03 09:08:12 -08:00
Linus Torvalds cc6810e36b Merge branch 'smp-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull cpu hotplug fixes from Thomas Gleixner:
 "Two fixes for the cpu hotplug machinery:

   - Replace the overly clever 'SMT disabled by BIOS' detection logic as
     it breaks KVM scenarios and prevents speculation control updates
     when the Hyperthreads are brought online late after boot.

   - Remove a redundant invocation of the speculation control update
     function"

* 'smp-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  cpu/hotplug: Fix "SMT disabled by BIOS" detection for KVM
  x86/speculation: Remove redundant arch_smt_update() invocation
2019-02-03 09:02:03 -08:00
Tony Luck d28af26faa x86/MCE: Initialize mce.bank in the case of a fatal error in mce_no_way_out()
Internal injection testing crashed with a console log that said:

  mce: [Hardware Error]: CPU 7: Machine Check Exception: f Bank 0: bd80000000100134

This caused a lot of head scratching because the MCACOD (bits 15:0) of
that status is a signature from an L1 data cache error. But Linux says
that it found it in "Bank 0", which on this model CPU only reports L1
instruction cache errors.

The answer was that Linux doesn't initialize "m->bank" in the case that
it finds a fatal error in the mce_no_way_out() pre-scan of banks. If
this was a local machine check, then this partially initialized struct
mce is being passed to mce_panic().

Fix is simple: just initialize m->bank in the case of a fatal error.

Fixes: 40c36e2741 ("x86/mce: Fix incorrect "Machine check from unknown source" message")
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: x86-ml <x86@kernel.org>
Cc: stable@vger.kernel.org # v4.18 Note pre-v5.0 arch/x86/kernel/cpu/mce/core.c was called arch/x86/kernel/cpu/mcheck/mce.c
Link: https://lkml.kernel.org/r/20190201003341.10638-1-tony.luck@intel.com
2019-02-03 13:24:24 +01:00
Yazen Ghannam 8a5dd2cd2f x86/MCE/AMD, EDAC/mce_amd: Add new error descriptions for some SMCA bank types
Some SMCA bank types on future systems will report new error types even
though the bank type is not treated as a new version. These new error
types will reported by bits that are reserved in past systems.

Add the new error descriptions to the lists in edac_mce_amd.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Shirish S <Shirish.S@amd.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190201225534.8177-4-Yazen.Ghannam@amd.com
2019-02-03 13:05:16 +01:00
Yazen Ghannam 3ad7e748c1 x86/MCE/AMD, EDAC/mce_amd: Add new McaTypes for CS, PSP, and SMU units
The existing CS, PSP, and SMU SMCA bank types will see new versions (as
indicated by their McaTypes) in future SMCA systems.

Add the new (HWID, MCATYPE) tuples for these new versions. Reuse the
same names as the older versions, since they are logically the same to
the user. SMCA systems won't mix and match IP blocks with different
McaType versions in the same system, so there isn't a need to
distinguish them. The MCA_IPID register is saved when logging an MCA
error, and that can be used to triage the error.

Also, add the new error descriptions to edac_mce_amd. Some error types
(positions in the list) are overloaded compared to the previous
McaTypes. Therefore, just create new lists of the error descriptions to
keep things simple even if some of the error descriptions are the same
between versions.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Pu Wen <puwen@hygon.cn>
Cc: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Cc: Shirish S <Shirish.S@amd.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190201225534.8177-3-Yazen.Ghannam@amd.com
2019-02-03 13:01:57 +01:00
Yazen Ghannam cbfa447edd x86/MCE/AMD, EDAC/mce_amd: Add new MP5, NBIO, and PCIE SMCA bank types
Add the (HWID, MCATYPE) tuples and names for the new MP5, NBIO, and
PCIE SMCA bank types.

Also, add their respective error descriptions to the MCE decoding module
edac_mce_amd.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Pu Wen <puwen@hygon.cn>
Cc: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Cc: Shirish S <Shirish.S@amd.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190201225534.8177-2-Yazen.Ghannam@amd.com
2019-02-03 13:01:44 +01:00
Johannes Weiner e6d429313e x86/resctrl: Avoid confusion over the new X86_RESCTRL config
"Resource Control" is a very broad term for this CPU feature, and a term
that is also associated with containers, cgroups etc. This can easily
cause confusion.

Make the user prompt more specific. Match the config symbol name.

 [ bp: In the future, the corresponding ARM arch-specific code will be
   under ARM_CPU_RESCTRL and the arch-agnostic bits will be carved out
   under the CPU_RESCTRL umbrella symbol. ]

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Babu Moger <Babu.Moger@amd.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Morse <james.morse@arm.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: linux-doc@vger.kernel.org
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Pu Wen <puwen@hygon.cn>
Cc: Reinette Chatre <reinette.chatre@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190130195621.GA30653@cmpxchg.org
2019-02-02 10:34:52 +01:00
Qian Cai a8e911d135 x86_64: increase stack size for KASAN_EXTRA
If the kernel is configured with KASAN_EXTRA, the stack size is
increasted significantly because this option sets "-fstack-reuse" to
"none" in GCC [1].  As a result, it triggers stack overrun quite often
with 32k stack size compiled using GCC 8.  For example, this reproducer

  https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/syscalls/madvise/madvise06.c

triggers a "corrupted stack end detected inside scheduler" very reliably
with CONFIG_SCHED_STACK_END_CHECK enabled.

There are just too many functions that could have a large stack with
KASAN_EXTRA due to large local variables that have been called over and
over again without being able to reuse the stacks.  Some noticiable ones
are

  size
  7648 shrink_page_list
  3584 xfs_rmap_convert
  3312 migrate_page_move_mapping
  3312 dev_ethtool
  3200 migrate_misplaced_transhuge_page
  3168 copy_process

There are other 49 functions are over 2k in size while compiling kernel
with "-Wframe-larger-than=" even with a related minimal config on this
machine.  Hence, it is too much work to change Makefiles for each object
to compile without "-fsanitize-address-use-after-scope" individually.

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81715#c23

Although there is a patch in GCC 9 to help the situation, GCC 9 probably
won't be released in a few months and then it probably take another
6-month to 1-year for all major distros to include it as a default.
Hence, the stack usage with KASAN_EXTRA can be revisited again in 2020
when GCC 9 is everywhere.  Until then, this patch will help users avoid
stack overrun.

This has already been fixed for arm64 for the same reason via
6e8830674e ("arm64: kasan: Increase stack size for KASAN_EXTRA").

Link: http://lkml.kernel.org/r/20190109215209.2903-1-cai@lca.pw
Signed-off-by: Qian Cai <cai@lca.pw>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-02-01 15:46:23 -08:00
Kairui Song 2aa958c99c x86/kexec: Don't setup EFI info if EFI runtime is not enabled
Kexec-ing a kernel with "efi=noruntime" on the first kernel's command
line causes the following null pointer dereference:

  BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
  #PF error: [normal kernel read fault]
  Call Trace:
   efi_runtime_map_copy+0x28/0x30
   bzImage64_load+0x688/0x872
   arch_kexec_kernel_image_load+0x6d/0x70
   kimage_file_alloc_init+0x13e/0x220
   __x64_sys_kexec_file_load+0x144/0x290
   do_syscall_64+0x55/0x1a0
   entry_SYSCALL_64_after_hwframe+0x44/0xa9

Just skip the EFI info setup if EFI runtime services are not enabled.

 [ bp: Massage commit message. ]

Suggested-by: Dave Young <dyoung@redhat.com>
Signed-off-by: Kairui Song <kasong@redhat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Dave Young <dyoung@redhat.com>
Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: bhe@redhat.com
Cc: David Howells <dhowells@redhat.com>
Cc: erik.schmauss@intel.com
Cc: fanc.fnst@cn.fujitsu.com
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: kexec@lists.infradead.org
Cc: lenb@kernel.org
Cc: linux-acpi@vger.kernel.org
Cc: Philipp Rudo <prudo@linux.vnet.ibm.com>
Cc: rafael.j.wysocki@intel.com
Cc: robert.moore@intel.com
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86-ml <x86@kernel.org>
Cc: Yannik Sembritzki <yannik@sembritzki.me>
Link: https://lkml.kernel.org/r/20190118111310.29589-2-kasong@redhat.com
2019-02-01 18:18:54 +01:00
Linus Torvalds c228d294f2 x86: explicitly align IO accesses in memcpy_{to,from}io
In commit 170d13ca3a ("x86: re-introduce non-generic memcpy_{to,from}io")
I made our copy from IO space use a separate copy routine rather than
rely on the generic memcpy.  I did that because our generic memory copy
isn't actually well-defined when it comes to internal access ordering or
alignment, and will in fact depend on various CPUID flags.

In particular, the default memcpy() for a modern Intel CPU will
generally be just a "rep movsb", which works reasonably well for
medium-sized memory copies of regular RAM, since the CPU will turn it
into fairly optimized microcode.

However, for non-cached memory and IO, "rep movs" ends up being
horrendously slow and will just do the architectural "one byte at a
time" accesses implied by the movsb.

At the other end of the spectrum, if you _don't_ end up using the "rep
movsb" code, you'd likely fall back to the software copy, which does
overlapping accesses for the tail, and may copy things backwards.
Again, for regular memory that's fine, for IO memory not so much.

The thinking was that clearly nobody really cared (because things
worked), but some people had seen horrible performance due to the byte
accesses, so let's just revert back to our long ago version that dod
"rep movsl" for the bulk of the copy, and then fixed up the potentially
last few bytes of the tail with "movsw/b".

Interestingly (and perhaps not entirely surprisingly), while that was
our original memory copy implementation, and had been used before for
IO, in the meantime many new users of memcpy_*io() had come about.  And
while the access patterns for the memory copy weren't well-defined (so
arguably _any_ access pattern should work), in practice the "rep movsb"
case had been very common for the last several years.

In particular Jarkko Sakkinen reported that the memcpy_*io() change
resuled in weird errors from his Geminilake NUC TPM module.

And it turns out that the TPM TCG accesses according to spec require
that the accesses be

 (a) done strictly sequentially

 (b) be naturally aligned

otherwise the TPM chip will abort the PCI transaction.

And, in fact, the tpm_crb.c driver did this:

	memcpy_fromio(buf, priv->rsp, 6);
	...
	memcpy_fromio(&buf[6], &priv->rsp[6], expected - 6);

which really should never have worked in the first place, but back
before commit 170d13ca3a it *happened* to work, because the
memcpy_fromio() would be expanded to a regular memcpy, and

 (a) gcc would expand the first memcpy in-line, and turn it into a
     4-byte and a 2-byte read, and they happened to be in the right
     order, and the alignment was right.

 (b) gcc would call "memcpy()" for the second one, and the machines that
     had this TPM chip also apparently ended up always having ERMS
     ("Enhanced REP MOVSB/STOSB instructions"), so we'd use the "rep
     movbs" for that copy.

In other words, basically by pure luck, the code happened to use the
right access sizes in the (two different!) memcpy() implementations to
make it all work.

But after commit 170d13ca3a, both of the memcpy_fromio() calls
resulted in a call to the routine with the consistent memory accesses,
and in both cases it started out transferring with 4-byte accesses.
Which worked for the first copy, but resulted in the second copy doing a
32-bit read at an address that was only 2-byte aligned.

Jarkko is actually fixing the fragile code in the TPM driver, but since
this is an excellent example of why we absolutely must not use a generic
memcpy for IO accesses, _and_ an IO-specific one really should strive to
align the IO accesses, let's do exactly that.

Side note: Jarkko also noted that the driver had been used on ARM
platforms, and had worked.  That was because on 32-bit ARM, memcpy_*io()
ends up always doing byte accesses, and on 64-bit ARM it first does byte
accesses to align to 8-byte boundaries, and then does 8-byte accesses
for the bulk.

So ARM actually worked by design, and the x86 case worked by pure luck.

We *might* want to make x86-64 do the 8-byte case too.  That should be a
pretty straightforward extension, but let's do one thing at a time.  And
generally MMIO accesses aren't really all that performance-critical, as
shown by the fact that for a long time we just did them a byte at a
time, and very few people ever noticed.

Reported-and-tested-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Tested-by: Jerry Snitselaar <jsnitsel@redhat.com>
Cc: David Laight <David.Laight@aculab.com>
Fixes: 170d13ca3a ("x86: re-introduce non-generic memcpy_{to,from}io")
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-02-01 09:07:48 -08:00
Chao Fan 690eaa5320 x86/boot/KASLR: Limit KASLR to extract the kernel in immovable memory only
KASLR may randomly choose a range which is located in movable memory
regions. As a result, this will break memory hotplug and make the
movable memory chosen by KASLR immovable.

Therefore, limit KASLR to choose memory regions in the immovable range
after consulting the SRAT table.

 [ bp:
    - Rewrite commit message.
    - Trim comments.
 ]

Signed-off-by: Chao Fan <fanc.fnst@cn.fujitsu.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: caoj.fnst@cn.fujitsu.com
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: indou.takao@jp.fujitsu.com
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: kasong@redhat.com
Cc: Kees Cook <keescook@chromium.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: msys.mizuma@gmail.com
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190123110850.12433-8-fanc.fnst@cn.fujitsu.com
2019-02-01 11:52:55 +01:00
Chao Fan 02a3e3cdb7 x86/boot: Parse SRAT table and count immovable memory regions
Parse SRAT for the immovable memory regions and use that information to
control which offset KASLR selects so that it doesn't overlap with any
movable region.

 [ bp:
   - Move struct mem_vector where it is visible so that it builds.
   - Correct comments.
   - Rewrite commit message.
   ]

Signed-off-by: Chao Fan <fanc.fnst@cn.fujitsu.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: <caoj.fnst@cn.fujitsu.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: <indou.takao@jp.fujitsu.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: <kasong@redhat.com>
Cc: <keescook@chromium.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: <msys.mizuma@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190123110850.12433-7-fanc.fnst@cn.fujitsu.com
2019-02-01 11:52:55 +01:00
Chao Fan 3a63f70bf4 x86/boot: Early parse RSDP and save it in boot_params
The RSDP is needed by KASLR so parse it early and save it in
boot_params.acpi_rsdp_addr, before KASLR setup runs.

RSDP is needed by other kernel facilities so have the parsing code
built-in instead of a long "depends on" line in Kconfig.

 [ bp:
    - Trim commit message and comments
    - Add CONFIG_ACPI dependency in the Makefile
    - Move ->acpi_rsdp_addr assignment with the rest of boot_params massaging in extract_kernel().
 ]

Signed-off-by: Chao Fan <fanc.fnst@cn.fujitsu.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: bhe@redhat.com
Cc: Cao jin <caoj.fnst@cn.fujitsu.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: indou.takao@jp.fujitsu.com
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: kasong@redhat.com
Cc: Kees Cook <keescook@chromium.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: msys.mizuma@gmail.com
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190123110850.12433-6-fanc.fnst@cn.fujitsu.com
2019-02-01 11:52:55 +01:00
Chao Fan 93a209aaaa x86/boot: Search for RSDP in memory
Scan memory (EBDA) for the RSDP and verify RSDP by signature and
checksum.

 [ bp:
   - Trim commit message.
   - Simplify bios_get_rsdp_addr() and cleanup mad casting.
 ]

Signed-off-by: Chao Fan <fanc.fnst@cn.fujitsu.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: bhe@redhat.com
Cc: caoj.fnst@cn.fujitsu.com
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: indou.takao@jp.fujitsu.com
Cc: Ingo Molnar <mingo@redhat.com>
Cc: kasong@redhat.com
Cc: Kees Cook <keescook@chromium.org>
Cc: msys.mizuma@gmail.com
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190123110850.12433-5-fanc.fnst@cn.fujitsu.com
2019-02-01 11:52:55 +01:00
Chao Fan 33f0df8d84 x86/boot: Search for RSDP in the EFI tables
The immovable memory ranges information in the SRAT table is necessary
to fix the issue of KASLR not paying attention to movable memory regions
when selecting the offset. Therefore, SRAT needs to be parsed.

Depending on the boot: KEXEC/EFI/BIOS, the methods to compute RSDP are
different. When booting from EFI, the EFI table points to the RSDP. So
iterate over the EFI system tables in order to find the RSDP.

 [ bp:
   - Heavily massage commit message
   - Trim comments
   - Move the CONFIG_ACPI ifdeffery into the Makefile.
 ]

Signed-off-by: Chao Fan <fanc.fnst@cn.fujitsu.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: bhe@redhat.com
Cc: caoj.fnst@cn.fujitsu.com
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: indou.takao@jp.fujitsu.com
Cc: Ingo Molnar <mingo@redhat.com>
Cc: kasong@redhat.com
Cc: Kees Cook <keescook@chromium.org>
Cc: msys.mizuma@gmail.com
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190123110850.12433-4-fanc.fnst@cn.fujitsu.com
2019-02-01 11:52:54 +01:00
Chao Fan 3c98e71b42 x86/boot: Add "acpi_rsdp=" early parsing
KASLR may randomly choose offsets which are located in movable memory
regions resulting in the movable memory becoming immovable.

The ACPI SRAT (System/Static Resource Affinity Table) describes memory
ranges including ranges of memory provided by hot-added memory devices.
In order to access SRAT, one needs the Root System Description Pointer
(RSDP) with which to find the Root/Extended System Description Table
(R/XSDT) which then contains the system description tables of which SRAT
is one of.

In case the RSDP address has been passed on the command line (kexec-ing
a second kernel) parse it from there.

 [ bp: Rewrite the commit message and cleanup the code. ]

Signed-off-by: Chao Fan <fanc.fnst@cn.fujitsu.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: bhe@redhat.com
Cc: caoj.fnst@cn.fujitsu.com
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: indou.takao@jp.fujitsu.com
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: kasong@redhat.com
Cc: Kees Cook <keescook@chromium.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: msys.mizuma@gmail.com
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190123110850.12433-3-fanc.fnst@cn.fujitsu.com
2019-02-01 11:52:54 +01:00
Chao Fan de50ce20cd x86/boot: Copy kstrtoull() to boot/string.c
Copy kstrtoull() and the other necessary functions from lib/kstrtox.c
to boot/string.c so that code in boot/ can use kstrtoull() and the old
simple_strtoull() can gradually be phased out.

Using div_u64() from math64.h directly will cause the dividend to be
handled as a 64-bit value and cause the infamous __divdi3 linker error
due to gcc trying to use its library function for the 64-bit division.

Therefore, separate the dividend into an upper and lower part.

 [ bp: Rewrite commit message. ]

Signed-off-by: Chao Fan <fanc.fnst@cn.fujitsu.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: bhe@redhat.com
Cc: caoj.fnst@cn.fujitsu.com
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: indou.takao@jp.fujitsu.com
Cc: Ingo Molnar <mingo@redhat.com>
Cc: kasong@redhat.com
Cc: Kees Cook <keescook@chromium.org>
Cc: msys.mizuma@gmail.com
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190123110850.12433-2-fanc.fnst@cn.fujitsu.com
2019-02-01 11:52:54 +01:00
Borislav Petkov ac09c5f43c x86/boot: Build the command line parsing code unconditionally
Just drop the three-item ifdeffery and build it in unconditionally.
Early cmdline parsing is needed more often than not.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: bhe@redhat.com
Cc: hpa@zytor.com
Cc: indou.takao@jp.fujitsu.com
Cc: kasong@redhat.com
Cc: keescook@chromium.org
Cc: mingo@redhat.com
Cc: msys.mizuma@gmail.com
Cc: tglx@linutronix.de
Cc: x86@kernel.org
Link: https://lkml.kernel.org/r/20190130112238.GB18383@zn.tnic
2019-02-01 11:51:01 +01:00
Thomas Lendacky 912139cfbf x86/microcode/amd: Don't falsely trick the late loading mechanism
The load_microcode_amd() function searches for microcode patches and
attempts to apply a microcode patch if it is of different level than the
currently installed level.

While the processor won't actually load a level that is less than
what is already installed, the logic wrongly returns UCODE_NEW thus
signaling to its caller reload_store() that a late loading should be
attempted.

If the file-system contains an older microcode revision than what is
currently running, such a late microcode reload can result in these
misleading messages:

  x86/CPU: CPU features have changed after loading microcode, but might not take effect.
  x86/CPU: Please consider either early loading through initrd/built-in or a potential BIOS update.

These messages were issued on a system where SME/SEV are not
enabled by the BIOS (MSR C001_0010[23] = 0b) because during boot,
early_detect_mem_encrypt() is called and cleared the SME and SEV
features in this case.

However, after the wrong late load attempt, get_cpu_cap() is called and
reloads the SME and SEV feature bits, resulting in the messages.

Update the microcode level check to not attempt microcode loading if the
current level is greater than(!) and not only equal to the current patch
level.

 [ bp: massage commit message. ]

Fixes: 2613f36ed9 ("x86/microcode: Attempt late loading only when new microcode is present")
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/154894518427.9406.8246222496874202773.stgit@tlendack-t1.amdoffice.net
2019-01-31 16:54:32 +01:00
Josh Poimboeuf b284909aba cpu/hotplug: Fix "SMT disabled by BIOS" detection for KVM
With the following commit:

  73d5e2b472 ("cpu/hotplug: detect SMT disabled by BIOS")

... the hotplug code attempted to detect when SMT was disabled by BIOS,
in which case it reported SMT as permanently disabled.  However, that
code broke a virt hotplug scenario, where the guest is booted with only
primary CPU threads, and a sibling is brought online later.

The problem is that there doesn't seem to be a way to reliably
distinguish between the HW "SMT disabled by BIOS" case and the virt
"sibling not yet brought online" case.  So the above-mentioned commit
was a bit misguided, as it permanently disabled SMT for both cases,
preventing future virt sibling hotplugs.

Going back and reviewing the original problems which were attempted to
be solved by that commit, when SMT was disabled in BIOS:

  1) /sys/devices/system/cpu/smt/control showed "on" instead of
     "notsupported"; and

  2) vmx_vm_init() was incorrectly showing the L1TF_MSG_SMT warning.

I'd propose that we instead consider #1 above to not actually be a
problem.  Because, at least in the virt case, it's possible that SMT
wasn't disabled by BIOS and a sibling thread could be brought online
later.  So it makes sense to just always default the smt control to "on"
to allow for that possibility (assuming cpuid indicates that the CPU
supports SMT).

The real problem is #2, which has a simple fix: change vmx_vm_init() to
query the actual current SMT state -- i.e., whether any siblings are
currently online -- instead of looking at the SMT "control" sysfs value.

So fix it by:

  a) reverting the original "fix" and its followup fix:

     73d5e2b472 ("cpu/hotplug: detect SMT disabled by BIOS")
     bc2d8d262c ("cpu/hotplug: Fix SMT supported evaluation")

     and

  b) changing vmx_vm_init() to query the actual current SMT state --
     instead of the sysfs control value -- to determine whether the L1TF
     warning is needed.  This also requires the 'sched_smt_present'
     variable to exported, instead of 'cpu_smt_control'.

Fixes: 73d5e2b472 ("cpu/hotplug: detect SMT disabled by BIOS")
Reported-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Joe Mario <jmario@redhat.com>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: kvm@vger.kernel.org
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/e3a85d585da28cc333ecbc1e78ee9216e6da9396.1548794349.git.jpoimboe@redhat.com
2019-01-30 19:27:00 +01:00
Jiri Slaby 5a064d398f x86/asm/suspend: Drop ENTRY from local data
ENTRY is intended for functions and shall be paired with ENDPROC. ENTRY
also aligns symbols which creates unnecessary holes between data.

So drop ENTRY from saved_eip in wakeup_32 and many saved_* in wakeup_64,
as these symbols are local only.

One could've used SYM_DATA_LOCAL for these symbols, but it was
discouraged earlier:

  https://lkml.kernel.org/r/20170427124310.GC23352@amd

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Pavel Machek <pavel@ucw.cz>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Len Brown <len.brown@intel.com>
Cc: linux-arch@vger.kernel.org
Cc: linux-pm@vger.kernel.org
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190130124711.12463-3-jslaby@suse.cz
2019-01-30 16:07:10 +01:00
Borislav Petkov fab940755d x86/hw_breakpoints, kprobes: Remove kprobes ifdeffery
Remove the ifdeffery in the breakpoint parsing arch_build_bp_info() by
adding a within_kprobe_blacklist() stub for the !CONFIG_KPROBES case.

It is returning true when kprobes are not enabled to mean that any
address is within the kprobes blacklist on such kernels and thus not
allow kernel breakpoints on non-kprobes kernels.

Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "Naveen N. Rao" <naveen.n.rao@linux.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20190127131237.4557-1-bp@alien8.de
2019-01-30 11:52:21 +01:00
David S. Miller eaf2a47f40 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2019-01-29 21:18:54 -08:00
Waiman Long 71368af902 x86/speculation: Add PR_SPEC_DISABLE_NOEXEC
With the default SPEC_STORE_BYPASS_SECCOMP/SPEC_STORE_BYPASS_PRCTL mode,
the TIF_SSBD bit will be inherited when a new task is fork'ed or cloned.
It will also remain when a new program is execve'ed.

Only certain class of applications (like Java) that can run on behalf of
multiple users on a single thread will require disabling speculative store
bypass for security purposes. Those applications will call prctl(2) at
startup time to disable SSB. They won't rely on the fact the SSB might have
been disabled. Other applications that don't need SSBD will just move on
without checking if SSBD has been turned on or not.

The fact that the TIF_SSBD is inherited across execve(2) boundary will
cause performance of applications that don't need SSBD but their
predecessors have SSBD on to be unwittingly impacted especially if they
write to memory a lot.

To remedy this problem, a new PR_SPEC_DISABLE_NOEXEC argument for the
PR_SET_SPECULATION_CTRL option of prctl(2) is added to allow applications
to specify that the SSBD feature bit on the task structure should be
cleared whenever a new program is being execve'ed.

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: linux-doc@vger.kernel.org
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: KarimAllah Ahmed <karahmed@amazon.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Link: https://lkml.kernel.org/r/1547676096-3281-1-git-send-email-longman@redhat.com
2019-01-29 22:11:49 +01:00
Cao jin 0a278662f5 x86/boot: Save several bytes in decompressor
gdt64 represents the content of GDTR under x86-64, which actually needs
10 bytes only, ".long" & ".word" is superfluous.

Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: <bp@alien8.de>
Cc: <hpa@zytor.com>
Link: https://lkml.kernel.org/r/20190123100014.23721-1-caoj.fnst@cn.fujitsu.com
2019-01-29 22:09:12 +01:00
Pingfan Liu 439fbdf6a2 x86/trap: Remove useless declaration
There is no early_trap_pf_init() implementation, hence remove this useless
declaration.

Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Link: https://lkml.kernel.org/r/1546591579-23502-1-git-send-email-kernelfans@gmail.com
2019-01-29 22:09:12 +01:00
Colin Ian King 5ccd35287e x86/fault: Fix sign-extend unintended sign extension
show_ldttss() shifts desc.base2 by 24 bit, but base2 is 8 bits of a
bitfield in a u16.

Due to the really great idea of integer promotion in C99 base2 is promoted
to an int, because that's the standard defined behaviour when all values
which can be represented by base2 fit into an int.

Now if bit 7 is set in desc.base2 the result of the shift left by 24 makes
the resulting integer negative and the following conversion to unsigned
long legitmately sign extends first causing the upper bits 32 bits to be
set in the result.

Fix this by casting desc.base2 to unsigned long before the shift.

Detected by CoverityScan, CID#1475635 ("Unintended sign extension")

[ tglx: Reworded the changelog a bit as I actually had to lookup
  	the standard (again) to decode the original one. ]

Fixes: a1a371c468 ("x86/fault: Decode page fault OOPSes better")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: kernel-janitors@vger.kernel.org
Link: https://lkml.kernel.org/r/20181222191116.21831-1-colin.king@canonical.com
2019-01-29 21:58:59 +01:00
Wei Huang b677dfae5a x86/boot/compressed/64: Set EFER.LME=1 in 32-bit trampoline before returning to long mode
In some old AMD KVM implementation, guest's EFER.LME bit is cleared by KVM
when the hypervsior detects that the guest sets CR0.PG to 0. This causes
the guest OS to reboot when it tries to return from 32-bit trampoline code
because the CPU is in incorrect state: CR4.PAE=1, CR0.PG=1, CS.L=1, but
EFER.LME=0.  As a precaution, set EFER.LME=1 as part of long mode
activation procedure. This extra step won't cause any harm when Linux is
booted on a bare-metal machine.

Signed-off-by: Wei Huang <wei@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: bp@alien8.de
Cc: hpa@zytor.com
Link: https://lkml.kernel.org/r/20190104054411.12489-1-wei@redhat.com
2019-01-29 21:58:59 +01:00
Shaokun Zhang 691b9ab6c9 x86/mm/tlb: Remove unused cpu variable
The "cpu" local variable became unused after

  a2055abe9c ("x86/mm: Pass flush_tlb_info to flush_tlb_others() etc").

Remove it.

Signed-off-by: Shaokun Zhang <zhangshaokun@hisilicon.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/1548747417-33551-1-git-send-email-zhangshaokun@hisilicon.com
2019-01-29 18:32:30 +01:00
Kan Liang 00ae831dfe x86/cpu: Add Atom Tremont (Jacobsville)
Add the Atom Tremont model number to the Intel family list.

[ Tony: Also update comment at head of file to say "_X" suffix is
  also used for microserver parts. ]

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Aristeu Rozanski <aris@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: Mauro Carvalho Chehab <mchehab@s-opensource.com>
Cc: Megha Dey <megha.dey@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Cc: Rajneesh Bhardwaj <rajneesh.bhardwaj@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190125195902.17109-4-tony.luck@intel.com
2019-01-29 16:37:35 +01:00
Gustavo A. R. Silva 2b0fc3742b x86/events: Mark expected switch-case fall-throughs
In preparation to enable -Wimplicit-fallthrough by default, mark
switch-case statements where fall-through is intentional, explicitly in
order to fix a couple of -Wimplicit-fallthrough warnings.

Warning level 3 was used: -Wimplicit-fallthrough=3.

 [ bp: Massasge and trim commit message. ]

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jacek Tomaka <jacek.tomaka@poczta.fm>
Cc: Jia Zhang <qianyue.zj@alibaba-inc.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190125184917.GA7289@embeddedor
2019-01-29 16:23:47 +01:00
David S. Miller ec7146db15 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:

====================
pull-request: bpf-next 2019-01-29

The following pull-request contains BPF updates for your *net-next* tree.

The main changes are:

1) Teach verifier dead code removal, this also allows for optimizing /
   removing conditional branches around dead code and to shrink the
   resulting image. Code store constrained architectures like nfp would
   have hard time doing this at JIT level, from Jakub.

2) Add JMP32 instructions to BPF ISA in order to allow for optimizing
   code generation for 32-bit sub-registers. Evaluation shows that this
   can result in code reduction of ~5-20% compared to 64 bit-only code
   generation. Also add implementation for most JITs, from Jiong.

3) Add support for __int128 types in BTF which is also needed for
   vmlinux's BTF conversion to work, from Yonghong.

4) Add a new command to bpftool in order to dump a list of BPF-related
   parameters from the system or for a specific network device e.g. in
   terms of available prog/map types or helper functions, from Quentin.

5) Add AF_XDP sock_diag interface for querying sockets from user
   space which provides information about the RX/TX/fill/completion
   rings, umem, memory usage etc, from Björn.

6) Add skb context access for skb_shared_info->gso_segs field, from Eric.

7) Add support for testing flow dissector BPF programs by extending
   existing BPF_PROG_TEST_RUN infrastructure, from Stanislav.

8) Split BPF kselftest's test_verifier into various subgroups of tests
   in order better deal with merge conflicts in this area, from Jakub.

9) Add support for queue/stack manipulations in bpftool, from Stanislav.

10) Document BTF, from Yonghong.

11) Dump supported ELF section names in libbpf on program load
    failure, from Taeung.

12) Silence a false positive compiler warning in verifier's BTF
    handling, from Peter.

13) Fix help string in bpftool's feature probing, from Prashant.

14) Remove duplicate includes in BPF kselftests, from Yue.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-28 19:38:33 -08:00
Brajeswar Ghosh fc5014cc55 x86/asm-prototypes: Remove duplicate include <asm/page.h>
Remove asm/page.h which is included more than once.

Signed-off-by: Brajeswar Ghosh <brajeswar.linux@gmail.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: jrdr.linux@gmail.com
Cc: sabyasachi.linux@gmail.com
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20190128161847.GA2318@hp-pavilion-15-notebook-pc-brajeswar
2019-01-28 18:32:34 +01:00
Masahiro Yamada afa974b771 kbuild: add real-prereqs shorthand for $(filter-out FORCE,$^)
In Kbuild, if_changed and friends must have FORCE as a prerequisite.

Hence, $(filter-out FORCE,$^) or $(filter-out $(PHONY),$^) is a common
idiom to get the names of all the prerequisites except phony targets.

Add real-prereqs as a shorthand.

Note:
We cannot replace $(filter %.o,$^) in cmd_link_multi-m because $^ may
include auto-generated dependencies from the .*.cmd file when a single
object module is changed into a multi object module. Refer to commit
69ea912fda ("kbuild: remove unneeded link_multi_deps"). I added some
comment to avoid accidental breakage.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: Rob Herring <robh@kernel.org>
2019-01-28 09:11:17 +09:00
Linus Torvalds 8a5f06056a Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
 "A set of fixes for x86:

   - Fix the swapped outb() parameters in the KASLR code

   - Fix the PKEY handling at fork which missed to preserve the pkey
     state for the child. Comes with a test case to validate that.

   - Fix the entry stack handling for XEN PV to respect that XEN PV
     systems enter the function already on the current thread stack and
     not on the trampoline.

   - Fix kexec load failure caused by using a stale value when the
     kexec_buf structure is reused for subsequent allocations.

   - Fix a bogus sizeof() in the memory encryption code

   - Enforce PCI dependency for the Intel Low Power Subsystem

   - Enforce PCI_LOCKLESS_CONFIG when PCI is enabled"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/Kconfig: Select PCI_LOCKLESS_CONFIG if PCI is enabled
  x86/entry/64/compat: Fix stack switching for XEN PV
  x86/kexec: Fix a kexec_file_load() failure
  x86/mm/mem_encrypt: Fix erroneous sizeof()
  x86/selftests/pkeys: Fork() to check for state being preserved
  x86/pkeys: Properly copy pkey state at fork()
  x86/kaslr: Fix incorrect i8254 outb() parameters
  x86/intel/lpss: Make PCI dependency explicit
2019-01-27 12:02:00 -08:00
Linus Torvalds 351e1aa6cb Merge branch 'x86-timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 timer fixes from Thomas Gleixner:
 "Two commits which were missed to be sent during the merge window.

   - The TSC calibration fix turns out to be more urgent as recent
     Skylake-X systems seem to have massive trouble with calibration
     disturbance. This should go back into stable for that reason and it
     the risk of breakage is rather low.

   - Drop an unused define"

* 'x86-timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/hpet: Remove unused FSEC_PER_NSEC define
  x86/tsc: Make calibration refinement more robust
2019-01-27 11:57:46 -08:00
Linus Torvalds 1fc7f56db7 Quite a few fixes for x86: nested virtualization save/restore, AMD nested virtualization
and virtual APIC, 32-bit fixes, an important fix to restore operation on older
 processors, and a bunch of hyper-v bugfixes.  Several are marked stable.
 
 There are also fixes for GCC warnings and for a GCC/objtool interaction.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJcTBqRAAoJEL/70l94x66DF8wH/0LKWQd4ay54OgpfdPRcUqXa
 yW10qqnO5OZfpI19UU2rfv2QXrdSOnDafgN6xiQ2Dz7m3JsZB7gBDsCCYKzeUgCz
 YB0UVcX1ZS+gr3igDDHIw3lWPBUqDIzKmEJO++9nAbDi4gOmWaPQ8vWrfORWAZcl
 yx2nCjeljjbO65UdRTdr3TkUNbpFlJ2NEUrzzco8OgChNB9QoxLTSJHrZxeZ7dNn
 J/ZDAaBwRxXN/aKH0A3+pwUFrP5nGuronT6nGo1048WWrlQzdMp7qh8fPtTBvWJ4
 uqUrrYc7jY/EhfZ4k/aAUGkAdt4IZI1KyHjhqtmB9zf+hezphUJv66QYQGVTFts=
 =yUth
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM fixes from Paolo Bonzini:
 "Quite a few fixes for x86: nested virtualization save/restore, AMD
  nested virtualization and virtual APIC, 32-bit fixes, an important fix
  to restore operation on older processors, and a bunch of hyper-v
  bugfixes. Several are marked stable.

  There are also fixes for GCC warnings and for a GCC/objtool interaction"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: x86: Mark expected switch fall-throughs
  KVM: x86: fix TRACE_INCLUDE_PATH and remove -I. header search paths
  KVM: selftests: check returned evmcs version range
  x86/kvm/hyper-v: nested_enable_evmcs() sets vmcs_version incorrectly
  KVM: VMX: Move vmx_vcpu_run()'s VM-Enter asm blob to a helper function
  kvm: selftests: Fix region overlap check in kvm_util
  kvm: vmx: fix some -Wmissing-prototypes warnings
  KVM: nSVM: clear events pending from svm_complete_interrupts() when exiting to L1
  svm: Fix AVIC incomplete IPI emulation
  svm: Add warning message for AVIC IPI invalid target
  KVM: x86: WARN_ONCE if sending a PV IPI returns a fatal error
  KVM: x86: Fix PV IPIs for 32-bit KVM host
  x86/kvm/hyper-v: recommend using eVMCS only when it is enabled
  x86/kvm/hyper-v: don't recommend doing reset via synthetic MSR
  kvm: x86/vmx: Use kzalloc for cached_vmcs12
  KVM: VMX: Use the correct field var when clearing VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL
  KVM: x86: Fix single-step debugging
  x86/kvm/hyper-v: don't announce GUEST IDLE MSR support
2019-01-27 09:21:00 -08:00
Jiong Wang 69f827eb6e x32: bpf: implement jitting of JMP32
This patch implements code-gen for new JMP32 instructions on x32.
Also fixed several reverse xmas tree coding style issues as I am there.

Cc: Wang YanQing <udknight@gmail.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-01-26 13:33:02 -08:00
Jiong Wang 3f5d6525f2 x86_64: bpf: implement jitting of JMP32
This patch implements code-gen for new JMP32 instructions on x86_64.

Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-01-26 13:33:01 -08:00
Gustavo A. R. Silva 6fcebf1302 x86/kernel: Mark expected switch-case fall-throughs
In preparation to enable -Wimplicit-fallthrough by default, mark
switch-case statements where fall-through is intentional, explicitly in
order to fix a couple of -Wimplicit-fallthrough warnings.

Warning level 3 was used: -Wimplicit-fallthrough=3.

 [ bp: Massasge and trim commit message. ]

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Christophe Leroy <christophe.leroy@c-s.fr>
Cc: Daniel Thompson <daniel.thompson@linaro.org>
Cc: David Wang <davidwang@zhaoxin.com>
Cc: Douglas Anderson <dianders@chromium.org>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jan Beulich <JBeulich@suse.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Nicolai Stange <nstange@suse.de>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Pu Wen <puwen@hygon.cn>
Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190125183903.GA4712@embeddedor
2019-01-26 11:19:13 +01:00
Gustavo A. R. Silva 89da344629 x86/insn-eval: Mark expected switch-case fall-through
In preparation to enable -Wimplicit-fallthrough by default, mark
switch-case statements where fall-through is intentional, explicitly.

Thus fix the following warning:

  arch/x86/lib/insn-eval.c: In function ‘resolve_default_seg’:
  arch/x86/lib/insn-eval.c:179:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
     if (insn->addr_bytes == 2)
        ^
  arch/x86/lib/insn-eval.c:182:2: note: here
    case -EDOM:
    ^~~~

Warning level 3 was used: -Wimplicit-fallthrough=3

This is part of the ongoing efforts to enable -Wimplicit-fallthrough by
default.

 [ bp: Massage commit message. ]

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190125205520.GA9602@embeddedor
2019-01-26 10:46:42 +01:00
Gustavo A. R. Silva b2869f28e1 KVM: x86: Mark expected switch fall-throughs
In preparation to enabling -Wimplicit-fallthrough, mark switch
cases where we are expecting to fall through.

This patch fixes the following warnings:

arch/x86/kvm/lapic.c:1037:27: warning: this statement may fall through [-Wimplicit-fallthrough=]
arch/x86/kvm/lapic.c:1876:3: warning: this statement may fall through [-Wimplicit-fallthrough=]
arch/x86/kvm/hyperv.c:1637:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
arch/x86/kvm/svm.c:4396:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
arch/x86/kvm/mmu.c:4372:36: warning: this statement may fall through [-Wimplicit-fallthrough=]
arch/x86/kvm/x86.c:3835:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
arch/x86/kvm/x86.c:7938:23: warning: this statement may fall through [-Wimplicit-fallthrough=]
arch/x86/kvm/vmx/vmx.c:2015:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
arch/x86/kvm/vmx/vmx.c:1773:6: warning: this statement may fall through [-Wimplicit-fallthrough=]

Warning level 3 was used: -Wimplicit-fallthrough=3

This patch is part of the ongoing efforts to enabling -Wimplicit-fallthrough.

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-01-25 19:29:36 +01:00
Masahiro Yamada 5cd5548ff4 KVM: x86: fix TRACE_INCLUDE_PATH and remove -I. header search paths
The header search path -I. in kernel Makefiles is very suspicious;
it allows the compiler to search for headers in the top of $(srctree),
where obviously no header file exists.

The reason of having -I. here is to make the incorrectly set
TRACE_INCLUDE_PATH working.

As the comment block in include/trace/define_trace.h says,
TRACE_INCLUDE_PATH should be a relative path to the define_trace.h

Fix the TRACE_INCLUDE_PATH, and remove the iffy include paths.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-01-25 19:12:37 +01:00
Vitaly Kuznetsov 3a2f5773ba x86/kvm/hyper-v: nested_enable_evmcs() sets vmcs_version incorrectly
Commit e2e871ab2f ("x86/kvm/hyper-v: Introduce nested_get_evmcs_version()
helper") broke EVMCS enablement: to set vmcs_version we now call
nested_get_evmcs_version() but this function checks
enlightened_vmcs_enabled flag which is not yet set so we end up returning
zero.

Fix the issue by re-arranging things in nested_enable_evmcs().

Fixes: e2e871ab2f ("x86/kvm/hyper-v: Introduce nested_get_evmcs_version() helper")
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-01-25 19:11:37 +01:00
Sean Christopherson 5ad6ece869 KVM: VMX: Move vmx_vcpu_run()'s VM-Enter asm blob to a helper function
...along with the function's STACK_FRAME_NON_STANDARD tag.  Moving the
asm blob results in a significantly smaller amount of code that is
marked with STACK_FRAME_NON_STANDARD, which makes it far less likely
that gcc will split the function and trigger a spurious objtool warning.
As a bonus, removing STACK_FRAME_NON_STANDARD from vmx_vcpu_run() allows
the bulk of code to be properly checked by objtool.

Because %rbp is not loaded via VMCS fields, vmx_vcpu_run() must manually
save/restore the host's RBP and load the guest's RBP prior to calling
vmx_vmenter().  Modifying %rbp triggers objtool's stack validation code,
and so vmx_vcpu_run() is tagged with STACK_FRAME_NON_STANDARD since it's
impossible to avoid modifying %rbp.

Unfortunately, vmx_vcpu_run() is also a gigantic function that gcc will
split into separate functions, e.g. so that pieces of the function can
be inlined.  Splitting the function means that the compiled Elf file
will contain one or more vmx_vcpu_run.part.* functions in addition to
a vmx_vcpu_run function.  Depending on where the function is split,
objtool may warn about a "call without frame pointer save/setup" in
vmx_vcpu_run.part.* since objtool's stack validation looks for exact
names when whitelisting functions tagged with STACK_FRAME_NON_STANDARD.

Up until recently, the undesirable function splitting was effectively
blocked because vmx_vcpu_run() was tagged with __noclone.  At the time,
__noclone had an unintended side effect that put vmx_vcpu_run() into a
separate optimization unit, which in turn prevented gcc from inlining
the function (or any of its own function calls) and thus eliminated gcc's
motivation to split the function.  Removing the __noclone attribute
allowed gcc to optimize vmx_vcpu_run(), exposing the objtool warning.

Kudos to Qian Cai for root causing that the fnsplit optimization is what
caused objtool to complain.

Fixes: 453eafbe65 ("KVM: VMX: Move VM-Enter + VM-Exit handling to non-inline sub-routines")
Tested-by: Qian Cai <cai@lca.pw>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-01-25 19:11:37 +01:00
Yi Wang 8997f65700 kvm: vmx: fix some -Wmissing-prototypes warnings
We get some warnings when building kernel with W=1:
arch/x86/kvm/vmx/vmx.c:426:5: warning: no previous prototype for ‘kvm_fill_hv_flush_list_func’ [-Wmissing-prototypes]
arch/x86/kvm/vmx/nested.c:58:6: warning: no previous prototype for ‘init_vmcs_shadow_fields’ [-Wmissing-prototypes]

Make them static to fix this.

Signed-off-by: Yi Wang <wang.yi59@zte.com.cn>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-01-25 19:11:35 +01:00
Vitaly Kuznetsov 619ad846fc KVM: nSVM: clear events pending from svm_complete_interrupts() when exiting to L1
kvm-unit-tests' eventinj "NMI failing on IDT" test results in NMI being
delivered to the host (L1) when it's running nested. The problem seems to
be: svm_complete_interrupts() raises 'nmi_injected' flag but later we
decide to reflect EXIT_NPF to L1. The flag remains pending and we do NMI
injection upon entry so it got delivered to L1 instead of L2.

It seems that VMX code solves the same issue in prepare_vmcs12(), this was
introduced with code refactoring in commit 5f3d579997 ("KVM: nVMX: Rework
event injection and recovery").

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-01-25 19:11:35 +01:00
Suravee Suthikulpanit bb218fbcfa svm: Fix AVIC incomplete IPI emulation
In case of incomplete IPI with invalid interrupt type, the current
SVM driver does not properly emulate the IPI, and fails to boot
FreeBSD guests with multiple vcpus when enabling AVIC.

Fix this by update APIC ICR high/low registers, which also
emulate sending the IPI.

Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-01-25 19:11:34 +01:00
Suravee Suthikulpanit 37ef0c4414 svm: Add warning message for AVIC IPI invalid target
Print warning message when IPI target ID is invalid due to one of
the following reasons:
  * In logical mode: cluster > max_cluster (64)
  * In physical mode: target > max_physical (512)
  * Address is not present in the physical or logical ID tables

Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-01-25 19:11:34 +01:00
Sean Christopherson de81c2f912 KVM: x86: WARN_ONCE if sending a PV IPI returns a fatal error
KVM hypercalls return a negative value error code in case of a fatal
error, e.g. when the hypercall isn't supported or was made with invalid
parameters.  WARN_ONCE on fatal errors when sending PV IPIs as any such
error all but guarantees an SMP system will hang due to a missing IPI.

Fixes: aaffcfd1e8 ("KVM: X86: Implement PV IPIs in linux guest")
Cc: stable@vger.kernel.org
Cc: Wanpeng Li <wanpengli@tencent.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-01-25 19:11:33 +01:00
Sean Christopherson 1ed199a41c KVM: x86: Fix PV IPIs for 32-bit KVM host
The recognition of the KVM_HC_SEND_IPI hypercall was unintentionally
wrapped in "#ifdef CONFIG_X86_64", causing 32-bit KVM hosts to reject
any and all PV IPI requests despite advertising the feature.  This
results in all KVM paravirtualized guests hanging during SMP boot due
to IPIs never being delivered.

Fixes: 4180bf1b65 ("KVM: X86: Implement "send IPI" hypercall")
Cc: stable@vger.kernel.org
Cc: Wanpeng Li <wanpengli@tencent.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-01-25 19:11:32 +01:00
Vitaly Kuznetsov f1adceaf01 x86/kvm/hyper-v: recommend using eVMCS only when it is enabled
We shouldn't probably be suggesting using Enlightened VMCS when it's not
enabled (not supported from guest's point of view). Hyper-V on KVM seems
to be fine either way but let's be consistent.

Fixes: 2bc39970e9 ("x86/kvm/hyper-v: Introduce KVM_GET_SUPPORTED_HV_CPUID")
Reviewed-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-01-25 19:11:25 +01:00
Vitaly Kuznetsov 1998fd32aa x86/kvm/hyper-v: don't recommend doing reset via synthetic MSR
System reset through synthetic MSR is not recommended neither by genuine
Hyper-V nor my QEMU.

Fixes: 2bc39970e9 ("x86/kvm/hyper-v: Introduce KVM_GET_SUPPORTED_HV_CPUID")
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-01-25 18:53:45 +01:00
Tom Roeder 3a33d030da kvm: x86/vmx: Use kzalloc for cached_vmcs12
This changes the allocation of cached_vmcs12 to use kzalloc instead of
kmalloc. This removes the information leak found by Syzkaller (see
Reported-by) in this case and prevents similar leaks from happening
based on cached_vmcs12.

It also changes vmx_get_nested_state to copy out the full 4k VMCS12_SIZE
in copy_to_user rather than only the size of the struct.

Tested: rebuilt against head, booted, and ran the syszkaller repro
  https://syzkaller.appspot.com/text?tag=ReproC&x=174efca3400000 without
  observing any problems.

Reported-by: syzbot+ded1696f6b50b615b630@syzkaller.appspotmail.com
Fixes: 8fcc4b5923
Cc: stable@vger.kernel.org
Signed-off-by: Tom Roeder <tmroeder@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-01-25 18:53:10 +01:00
Sean Christopherson 85ba2b165d KVM: VMX: Use the correct field var when clearing VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL
Fix a recently introduced bug that results in the wrong VMCS control
field being updated when applying a IA32_PERF_GLOBAL_CTRL errata.

Fixes: c73da3fcab ("KVM: VMX: Properly handle dynamic VM Entry/Exit controls")
Reported-by: Harald Arnesen <harald@skogtun.org>
Tested-by: Harald Arnesen <harald@skogtun.org>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-01-25 18:52:54 +01:00
Alexander Popov 5cc244a20b KVM: x86: Fix single-step debugging
The single-step debugging of KVM guests on x86 is broken: if we run
gdb 'stepi' command at the breakpoint when the guest interrupts are
enabled, RIP always jumps to native_apic_mem_write(). Then other
nasty effects follow.

Long investigation showed that on Jun 7, 2017 the
commit c8401dda2f ("KVM: x86: fix singlestepping over syscall")
introduced the kvm_run.debug corruption: kvm_vcpu_do_singlestep() can
be called without X86_EFLAGS_TF set.

Let's fix it. Please consider that for -stable.

Signed-off-by: Alexander Popov <alex.popov@linux.com>
Cc: stable@vger.kernel.org
Fixes: c8401dda2f ("KVM: x86: fix singlestepping over syscall")
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-01-25 18:52:53 +01:00
Vitaly Kuznetsov 9699f970de x86/kvm/hyper-v: don't announce GUEST IDLE MSR support
HV_X64_MSR_GUEST_IDLE_AVAILABLE appeared in kvm_vcpu_ioctl_get_hv_cpuid()
by mistake: it announces support for HV_X64_MSR_GUEST_IDLE (0x400000F0)
which we don't support in KVM (yet).

Fixes: 2bc39970e9 ("x86/kvm/hyper-v: Introduce KVM_GET_SUPPORTED_HV_CPUID")
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-01-25 18:52:34 +01:00
Arnd Bergmann 0d6040d468 arch: add split IPC system calls where needed
The IPC system call handling is highly inconsistent across architectures,
some use sys_ipc, some use separate calls, and some use both.  We also
have some architectures that require passing IPC_64 in the flags, and
others that set it implicitly.

For the addition of a y2038 safe semtimedop() system call, I chose to only
support the separate entry points, but that requires first supporting
the regular ones with their own syscall numbers.

The IPC_64 is now implied by the new semctl/shmctl/msgctl system
calls even on the architectures that require passing it with the ipc()
multiplexer.

I'm not adding the new semtimedop() or semop() on 32-bit architectures,
those will get implemented using the new semtimedop_time64() version
that gets added along with the other time64 calls.
Three 64-bit architectures (powerpc, s390 and sparc) get semtimedop().

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2019-01-25 17:22:50 +01:00
Sinan Kaya 625210cfa6 x86/Kconfig: Select PCI_LOCKLESS_CONFIG if PCI is enabled
After commit

  5d32a66541 ("PCI/ACPI: Allow ACPI to be built without CONFIG_PCI set")

dependencies on CONFIG_PCI that previously were satisfied implicitly
through dependencies on CONFIG_ACPI have to be specified directly.

PCI_LOCKLESS_CONFIG depends on PCI but this dependency has not been
mentioned in the Kconfig so add an explicit dependency here and fix

  WARNING: unmet direct dependencies detected for PCI_LOCKLESS_CONFIG
    Depends on [n]: PCI [=n]
    Selected by [y]:
    - X86 [=y]

Fixes: 5d32a66541 ("PCI/ACPI: Allow ACPI to be built without CONFIG_PCI set")
Signed-off-by: Sinan Kaya <okaya@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-acpi@vger.kernel.org
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190121231958.28255-2-okaya@kernel.org
2019-01-22 17:06:28 +01:00
Borislav Petkov bae54dc4f3 x86/fpu: Get rid of CONFIG_AS_FXSAVEQ
This was a "workaround" to probe for binutils which could generate
FXSAVEQ, apparently gas with min version 2.16. In the meantime, minimal
required gas version is 2.20 so all those workarounds for older binutils
can be dropped.

Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andy Lutomirski <luto@kernel.org>
Link: https://lkml.kernel.org/r/20190117232408.GH5023@zn.tnic
2019-01-22 14:16:39 +01:00
Borislav Petkov ee35b9b9f6 x86/traps: Have read_cr0() only once in the #NM handler
... instead of twice in the code. In any case, CR0 ends up being read
once anyway:

1. The CONFIG_MATH_EMULATION case does so and exits.
2. The normal case does it once too.

However, read it on function entry instead to make the code even simpler
to follow.

No functional changes.

Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: x86@kernel.org
Link: https://lkml.kernel.org/r/20190117120728.3811-1-bp@alien8.de
2019-01-22 14:13:35 +01:00
Andrew Murray 88dbe3c94e perf/core, arch/x86: Strengthen exclusion checks with PERF_PMU_CAP_NO_EXCLUDE
For x86 PMUs that do not support context exclusion let's advertise the
PERF_PMU_CAP_NO_EXCLUDE capability. This ensures that perf will
prevent us from handling events where any exclusion flags are set.
Let's also remove the now unnecessary check for exclusion flags.

This change means that amd/iommu and amd/uncore will now also
indicate that they do not support exclude_{hv|idle} and intel/uncore
that it does not support exclude_{guest|host}.

Signed-off-by: Andrew Murray <andrew.murray@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Sascha Hauer <s.hauer@pengutronix.de>
Cc: Shawn Guo <shawnguo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: robin.murphy@arm.com
Cc: suzuki.poulose@arm.com
Link: https://lkml.kernel.org/r/1547128414-50693-12-git-send-email-andrew.murray@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-01-21 11:01:29 +01:00
Andrew Murray 2ff4025069 perf/core, arch/x86: Use PERF_PMU_CAP_NO_EXCLUDE for exclusion incapable PMUs
For drivers that do not support context exclusion let's advertise the
PERF_PMU_CAP_NOEXCLUDE capability. This ensures that perf will
prevent us from handling events where any exclusion flags are set.
Let's also remove the now unnecessary check for exclusion flags.

PMU drivers that support at least one exclude flag won't have the
PERF_PMU_CAP_NOEXCLUDE capability set - these PMU drivers should still
check and fail on unsupported exclude flags. These missing tests are
not added in this patch.

Signed-off-by: Andrew Murray <andrew.murray@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Sascha Hauer <s.hauer@pengutronix.de>
Cc: Shawn Guo <shawnguo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: robin.murphy@arm.com
Cc: suzuki.poulose@arm.com
Link: https://lkml.kernel.org/r/1547128414-50693-11-git-send-email-andrew.murray@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-01-21 11:01:27 +01:00
Will Deacon 6e693b3ffe x86: uaccess: Inhibit speculation past access_ok() in user_access_begin()
Commit 594cc251fd ("make 'user_access_begin()' do 'access_ok()'")
makes the access_ok() check part of the user_access_begin() preceding a
series of 'unsafe' accesses.  This has the desirable effect of ensuring
that all 'unsafe' accesses have been range-checked, without having to
pick through all of the callsites to verify whether the appropriate
checking has been made.

However, the consolidated range check does not inhibit speculation, so
it is still up to the caller to ensure that they are not susceptible to
any speculative side-channel attacks for user addresses that ultimately
fail the access_ok() check.

This is an oversight, so use __uaccess_begin_nospec() to ensure that
speculation is inhibited until the access_ok() check has passed.

Reported-by: Julien Thierry <julien.thierry@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-20 15:33:22 +12:00
Linus Torvalds e6ec2fda2d xen: fixes for 5.0-rc3
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQRTLbB6QfY48x44uB6AXGG7T9hjvgUCXEGhYwAKCRCAXGG7T9hj
 vu/qAP4ljV1QwAicTP2IosIpmSojJV4BkkJOs36dgF1u4DscPAEA7YhvTzGYcZr6
 7rmBIbGve6Yb24Uh4aL3ir4w+LUvSAg=
 =Jm9L
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-5.0-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull xen fixes from Juergen Gross:

 - Several fixes for the Xen pvcalls drivers (1 fix for the backend and
   8 for the frontend).

 - A fix for a rather longstanding bug in the Xen sched_clock()
   interface which led to weird time jumps when migrating the system.

 - A fix for avoiding accesses to x2apic MSRs in Xen PV guests.

* tag 'for-linus-5.0-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen: Fix x86 sched_clock() interface for xen
  pvcalls-front: fix potential null dereference
  always clear the X2APIC_ENABLE bit for PV guest
  pvcalls-front: Avoid get_free_pages(GFP_KERNEL) under spinlock
  xen/pvcalls: remove set but not used variable 'intf'
  pvcalls-back: set -ENOTCONN in pvcalls_conn_back_read
  pvcalls-front: don't return error when the ring is full
  pvcalls-front: properly allocate sk
  pvcalls-front: don't try to free unallocated rings
  pvcalls-front: read all data before closing the connection
2019-01-19 05:53:41 +12:00
Jiaxun Yang 0237199186 x86/CPU/AMD: Set the CPB bit unconditionally on F17h
Some F17h models do not have CPB set in CPUID even though the CPU
supports it. Set the feature bit unconditionally on all F17h.

 [ bp: Rewrite commit message and patch. ]

Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Tom Lendacky <thomas.lendacky@amd.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Sherry Hurwitz <sherry.hurwitz@amd.com>
Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20181120030018.5185-1-jiaxun.yang@flygoat.com
2019-01-18 16:44:03 +01:00
Eric Biggers 793ff5ffc1 crypto: x86/aesni-gcm - make 'struct aesni_gcm_tfm_s' static const
Add missing static keywords to fix the following sparse warnings:

    arch/x86/crypto/aesni-intel_glue.c:197:24: warning: symbol 'aesni_gcm_tfm_sse' was not declared. Should it be static?
    arch/x86/crypto/aesni-intel_glue.c:246:24: warning: symbol 'aesni_gcm_tfm_avx_gen2' was not declared. Should it be static?
    arch/x86/crypto/aesni-intel_glue.c:291:24: warning: symbol 'aesni_gcm_tfm_avx_gen4' was not declared. Should it be static?

I also made the affected structures 'const', and adjusted the
indentation in the struct definition to not be insane.

Cc: Dave Watson <davejwatson@fb.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-01-18 18:43:43 +08:00
Jan Beulich fc24d75a7f x86/entry/64/compat: Fix stack switching for XEN PV
While in the native case entry into the kernel happens on the trampoline
stack, PV Xen kernels get entered with the current thread stack right
away. Hence source and destination stacks are identical in that case,
and special care is needed.

Other than in sync_regs() the copying done on the INT80 path isn't
NMI / #MC safe, as either of these events occurring in the middle of the
stack copying would clobber data on the (source) stack.

There is similar code in interrupt_entry() and nmi(), but there is no fixup
required because those code paths are unreachable in XEN PV guests.

[ tglx: Sanitized subject, changelog, Fixes tag and stable mail address. Sigh ]

Fixes: 7f2590a110 ("x86/entry/64: Use a per-CPU trampoline stack for IDT entries")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Juergen Gross <jgross@suse.com>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: Peter Anvin <hpa@zytor.com>
Cc: xen-devel@lists.xenproject.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/5C3E1128020000780020DFAD@prv1-mh.provo.novell.com
2019-01-18 00:39:33 +01:00
Shirish S 30aa3d26ed x86/MCE/AMD: Carve out the MC4_MISC thresholding quirk
The MC4_MISC thresholding quirk needs to be applied during S5 -> S0 and
S3 -> S0 state transitions, which follow different code paths. Carve it
out into a separate function and call it mce_amd_feature_init() where
the two code paths of the state transitions converge.

 [ bp: massage commit message and the carved out function. ]

Signed-off-by: Shirish S <shirish.s@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Yazen Ghannam <yazen.ghannam@amd.com>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/1547651417-23583-3-git-send-email-shirish.s@amd.com
2019-01-16 19:42:00 +01:00
Juergen Gross 867cefb4cb xen: Fix x86 sched_clock() interface for xen
Commit f94c8d1169 ("sched/clock, x86/tsc: Rework the x86 'unstable'
sched_clock() interface") broke Xen guest time handling across
migration:

[  187.249951] Freezing user space processes ... (elapsed 0.001 seconds) done.
[  187.251137] OOM killer disabled.
[  187.251137] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
[  187.252299] suspending xenstore...
[  187.266987] xen:grant_table: Grant tables using version 1 layout
[18446743811.706476] OOM killer enabled.
[18446743811.706478] Restarting tasks ... done.
[18446743811.720505] Setting capacity to 16777216

Fix that by setting xen_sched_clock_offset at resume time to ensure a
monotonic clock value.

[boris: replaced pr_info() with pr_info_once() in xen_callback_vector()
 to avoid printing with incorrect timestamp during resume (as we
 haven't re-adjusted the clock yet)]

Fixes: f94c8d1169 ("sched/clock, x86/tsc: Rework the x86 'unstable' sched_clock() interface")
Cc: <stable@vger.kernel.org> # 4.11
Reported-by: Hans van Kranenburg <hans.van.kranenburg@mendix.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Tested-by: Hans van Kranenburg <hans.van.kranenburg@mendix.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2019-01-16 13:06:05 -05:00
Gustavo A. R. Silva 2bc217c616 x86/platform/UV: Replace kmalloc() and memset() with k[cz]alloc() calls
Replace kmalloc_node() and memset() with kzalloc_node(), and
kmalloc_array() and memset() with kcalloc().

This code was detected with the help of Coccinelle.

No functional changes.

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Cc: Andrew Banman <abanman@hpe.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Nicolai Stange <nstange@suse.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Varsha Rao <rvarsha016@gmail.com>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190115173713.GA31031@embeddedor
2019-01-16 13:01:26 +01:00
Borislav Petkov 093ae8f9a8 x86/TSC: Use RDTSCP
Currently, the kernel uses

  [LM]FENCE; RDTSC

in the timekeeping code, to guarantee monotonicity of time where the
*FENCE is selected based on vendor.

Replace that sequence with RDTSCP which is faster or on-par and gives
the same guarantees.

A microbenchmark on Intel shows that the change is on-par.

On AMD, the change is either on-par with the current LFENCE-prefixed
RDTSC or slightly better with RDTSCP.

The comparison is done with the LFENCE-prefixed RDTSC (and not with the
MFENCE-prefixed one, as one would normally expect) because all modern
AMD families make LFENCE serializing and thus avoid the heavy MFENCE by
effectively enabling X86_FEATURE_LFENCE_RDTSC.

Co-developed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: x86@kernel.org
Link: https://lkml.kernel.org/r/20181119184556.11479-1-bp@alien8.de
2019-01-16 12:43:08 +01:00
Borislav Petkov 71a93c2693 x86/alternatives: Add an ALTERNATIVE_3() macro
Similar to ALTERNATIVE_2(), ALTERNATIVE_3() selects between 3 possible
variants. Will be used for adding RDTSCP to the rdtsc_ordered()
alternatives.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: X86 ML <x86@kernel.org>
Link: https://lkml.kernel.org/r/20181211222326.14581-4-bp@alien8.de
2019-01-16 12:42:50 +01:00
Borislav Petkov c1d4e4192a x86/alternatives: Print containing function
... in the "debug-alternative" output so that one can find her way
easier when staring at the vmlinux disassembly.

For example:

  apply_alternatives: feat: 3*32+18, old: (read_tsc+0x0/0x10 (ffffffff8101d1c0) len: 5), repl: (ffffffff824e6d33, len: 5)
  					   ^^^^^^^^^^^^^^^^^
  ffffffff8101d1c0: old_insn: 0f 31 90 90 90
  ffffffff824e6d33: rpl_insn: 0f ae e8 0f 31
  ffffffff8101d1c0: final_insn: 0f ae e8 0f 31

No functional changes.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: X86 ML <x86@kernel.org>
Link: https://lkml.kernel.org/r/20181211222326.14581-3-bp@alien8.de
2019-01-16 12:41:57 +01:00
Borislav Petkov 1c1ed4731c x86/alternatives: Add macro comments
... so that when one stares at the .s output, one can find her way
around the resulting asm magic.

With it, ALTERNATIVE looks like this now:

          # ALT: oldnstr
  661:
          ...
  662:
          # ALT: padding
  .skip ...
  663:
  .pushsection .altinstructions,"a"

  ...

  .popsection
  .pushsection .altinstr_replacement, "ax"
          # ALT: replacement 1
  6641:
  	...
  6651:
          .popsection

Merge __OLDINSTR() into OLDINSTR(), while at it.

No functional changes.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: X86 ML <x86@kernel.org>
Link: https://lkml.kernel.org/r/20181211222326.14581-2-bp@alien8.de
2019-01-16 12:40:07 +01:00
Borislav Petkov e6d7bc0bdf x86/build: Use the single-argument OUTPUT_FORMAT() linker script command
The various x86 linker scripts use the three-argument linker script
command variant OUTPUT_FORMAT(DEFAULT, BIG, LITTLE) which specifies
three object file formats when the -EL and -EB linker command line
options are used. When -EB is specified, OUTPUT_FORMAT issues the BIG
object file format, when -EL, LITTLE, respectively, and when neither is
specified, DEFAULT.

However, those -E[LB] options are not used by arch/x86/ so switch to the
simple OUTPUT_FORMAT(BFDNAME) macro variant.

No functional changes.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Link: https://lkml.kernel.org/r/20190109181531.27513-1-bp@alien8.de
2019-01-16 12:21:53 +01:00
Sinan Kaya e9820d6b0a x86/intel/lpss: Make PCI dependency explicit
After commit 5d32a66541 (PCI/ACPI: Allow ACPI to be built without
CONFIG_PCI set) dependencies on CONFIG_PCI that previously were
satisfied implicitly through dependencies on CONFIG_ACPI have to be
specified directly.

LPSS code relies on PCI infrastructure but this dependency has not
been called out explicitly yet.

Fixes: 5d32a66541 ("PCI/ACPI: Allow ACPI to be built without CONFIG_PCI set")
Signed-off-by: Sinan Kaya <okaya@kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-01-15 23:17:28 +01:00
Shirish S c95b323dcd x86/MCE/AMD: Turn off MC4_MISC thresholding on all family 0x15 models
MC4_MISC thresholding is not supported on all family 0x15 processors,
hence skip the x86_model check when applying the quirk.

 [ bp: massage commit message. ]

Signed-off-by: Shirish S <shirish.s@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/1547106849-3476-2-git-send-email-shirish.s@amd.com
2019-01-15 18:11:38 +01:00
Dave Young 993a110319 x86/kexec: Fix a kexec_file_load() failure
Commit

  b6664ba42f ("s390, kexec_file: drop arch_kexec_mem_walk()")

changed the behavior of kexec_locate_mem_hole(): it will try to allocate
free memory only when kbuf.mem is initialized to zero.

However, x86's kexec_file_load() implementation reuses a struct
kexec_buf allocated on the stack and its kbuf.mem member gets set by
each kexec_add_buffer() invocation.

The second kexec_add_buffer() will reuse the same kbuf but not
reinitialize kbuf.mem.

Therefore, explictily reset kbuf.mem each time in order for
kexec_locate_mem_hole() to locate a free memory region each time.

 [ bp: massage commit message. ]

Fixes: b6664ba42f ("s390, kexec_file: drop arch_kexec_mem_walk()")
Signed-off-by: Dave Young <dyoung@redhat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Baoquan He <bhe@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Philipp Rudo <prudo@linux.vnet.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Yannik Sembritzki <yannik@sembritzki.me>
Cc: Yi Wang <wang.yi59@zte.com.cn>
Cc: kexec@lists.infradead.org
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20181228011247.GA9999@dhcp-128-65.nay.redhat.com
2019-01-15 12:12:50 +01:00
Peng Hao bf7d28c534 x86/mm/mem_encrypt: Fix erroneous sizeof()
Using sizeof(pointer) for determining the size of a memset() only works
when the size of the pointer and the size of type to which it points are
the same. For pte_t this is only true for 64bit and 32bit-NONPAE. On 32bit
PAE systems this is wrong as the pointer size is 4 byte but the PTE entry
is 8 bytes. It's actually not a real world issue as this code depends on
64bit, but it's wrong nevertheless.

Use sizeof(*p) for correctness sake.

Fixes: aad983913d ("x86/mm/encrypt: Simplify sme_populate_pgd() and sme_populate_pgd_large()")
Signed-off-by: Peng Hao <peng.hao2@zte.com.cn>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: dave.hansen@linux.intel.com
Cc: peterz@infradead.org
Cc: luto@kernel.org
Link: https://lkml.kernel.org/r/1546065252-97996-1-git-send-email-peng.hao2@zte.com.cn
2019-01-15 11:41:58 +01:00
Dave Hansen a31e184e4f x86/pkeys: Properly copy pkey state at fork()
Memory protection key behavior should be the same in a child as it was
in the parent before a fork.  But, there is a bug that resets the
state in the child at fork instead of preserving it.

The creation of new mm's is a bit convoluted.  At fork(), the code
does:

  1. memcpy() the parent mm to initialize child
  2. mm_init() to initalize some select stuff stuff
  3. dup_mmap() to create true copies that memcpy() did not do right

For pkeys two bits of state need to be preserved across a fork:
'execute_only_pkey' and 'pkey_allocation_map'.

Those are preserved by the memcpy(), but mm_init() invokes
init_new_context() which overwrites 'execute_only_pkey' and
'pkey_allocation_map' with "new" values.

The author of the code erroneously believed that init_new_context is *only*
called at execve()-time.  But, alas, init_new_context() is used at execve()
and fork().

The result is that, after a fork(), the child's pkey state ends up looking
like it does after an execve(), which is totally wrong.  pkeys that are
already allocated can be allocated again, for instance.

To fix this, add code called by dup_mmap() to copy the pkey state from
parent to child explicitly.  Also add a comment above init_new_context() to
make it more clear to the next poor sod what this code is used for.

Fixes: e8c24d3a23 ("x86/pkeys: Allocation/free syscalls")
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: bp@alien8.de
Cc: hpa@zytor.com
Cc: peterz@infradead.org
Cc: mpe@ellerman.id.au
Cc: will.deacon@arm.com
Cc: luto@kernel.org
Cc: jroedel@suse.de
Cc: stable@vger.kernel.org
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Joerg Roedel <jroedel@suse.de>
Link: https://lkml.kernel.org/r/20190102215655.7A69518C@viggo.jf.intel.com
2019-01-15 10:33:45 +01:00
Talons Lee 5268c8f39e always clear the X2APIC_ENABLE bit for PV guest
Commit e657fcc clears cpu capability bit instead of using fake cpuid
value, the EXTD should always be off for PV guest without depending
on cpuid value. So remove the cpuid check in xen_read_msr_safe() to
always clear the X2APIC_ENABLE bit.

Signed-off-by: Talons Lee <xin.li@citrix.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2019-01-14 09:00:32 -05:00
Andy Shevchenko b62928ff55 x86/MCE: Switch to use the new generic UUID API
Switch the code to use the new, generic helpers.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190110153645.40649-1-andriy.shevchenko@linux.intel.com
2019-01-14 11:16:55 +01:00
Huang Zijiang 345dca4ca7 x86/e820: Replace kmalloc() + memcpy() with kmemdup()
Use the equivalent kmemdup() directly instead of kmalloc() + memcpy().

No functional changes.

 [ bp: rewrite commit message. ]

Signed-off-by: Huang Zijiang <huang.zijiang@zte.com.cn>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Joe Perches <joe@perches.com>
Cc: jschoenh@amazon.de
Cc: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86-ml <x86@kernel.org>
Cc: xue.zhihong@zte.com.cn
Link: https://lkml.kernel.org/r/1547277384-22156-1-git-send-email-wang.yi59@zte.com.cn
2019-01-13 15:11:35 +01:00
Linus Torvalds 473348891c KVM fixes for 5.0-rc2
Minor fixes for new code, corner cases, and documentation.
 Patches are isolated and sufficiently described by the shortlog.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEj/xJEkTHTJzZE/FJQP/qGw8qh+gFAlw52x0ACgkQQP/qGw8q
 h+iejgf+NjD3jXCYAxLtGInXyQEwqw113Gr8941oC2E50CifTqrFhWNkRgl+GyCS
 x92SCiNS1xpy+SnUZDj63MjsB90zOOogPG+RYsSE+BAN/i/9IiE2F5aYSXlen0rR
 SOOfraQTDlS5rOp9auGMNlNWkmEKCO6YEsdmlNte/QVNuy4vrHfAJDvdfCUiC0qO
 XJWMjIyZElsKJUtmZzKuTrCZniBvqn6yGH+/0kggrzDPPIHU+M7qSkMkAC7wqrFy
 0zaeKNloArFLKlDQPQ3AykgxCrp6CJnXuR/Oo6R5rn9jF2kVdlzglx7pHBAc5OGp
 wI+VDWRA2x2JevWcIdsN2Xp2vaCRGw==
 =L1zP
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM fixes from Radim Krčmář:
 "Minor fixes for new code, corner cases, and documentation"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  x86/kvm/nVMX: don't skip emulated instruction twice when vmptr address is not backed
  Documentation/virtual/kvm: Update URL for AMD SEV API specification
  KVM/VMX: Avoid return error when flush tlb successfully in the hv_remote_flush_tlb_with_range()
  kvm: sev: Fail KVM_SEV_INIT if already initialized
  KVM: validate userspace input in kvm_clear_dirty_log_protect()
  KVM: x86: Fix bit shifting in update_intel_pt_cfg
2019-01-12 10:39:43 -08:00
Rasmus Villemoes 2e905c7abd x86/asm: Remove unused __constant_c_x_memset() macro and inlines
Nothing refers to the __constant_c_x_memset() macro anymore. Remove
it and the two referenced static inline functions.

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190111084931.24601-2-linux@rasmusvillemoes.dk
2019-01-12 17:54:49 +01:00
Rasmus Villemoes 88ca66d854 x86/asm: Remove dead __GNUC__ conditionals
The minimum supported gcc version is >= 4.6, so these can be removed.

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190111084931.24601-1-linux@rasmusvillemoes.dk
2019-01-12 17:50:48 +01:00
George Rimar 927185c124 x86/build: Specify elf_i386 linker emulation explicitly for i386 objects
The kernel uses the OUTPUT_FORMAT linker script command in it's linker
scripts. Most of the time, the -m option is passed to the linker with
correct architecture, but sometimes (at least for x86_64) the -m option
contradicts the OUTPUT_FORMAT directive.

Specifically, arch/x86/boot and arch/x86/realmode/rm produce i386 object
files, but are linked with the -m elf_x86_64 linker flag when building
for x86_64.

The GNU linker manpage doesn't explicitly state any tie-breakers between
-m and OUTPUT_FORMAT. But with BFD and Gold linkers, OUTPUT_FORMAT
overrides the emulation value specified with the -m option.

LLVM lld has a different behavior, however. When supplied with
contradicting -m and OUTPUT_FORMAT values it fails with the following
error message:

  ld.lld: error: arch/x86/realmode/rm/header.o is incompatible with elf_x86_64

Therefore, just add the correct -m after the incorrect one (it overrides
it), so the linker invocation looks like this:

  ld -m elf_x86_64 -z max-page-size=0x200000 -m elf_i386 --emit-relocs -T \
    realmode.lds header.o trampoline_64.o stack.o reboot.o -o realmode.elf

This is not a functional change for GNU ld, because (although not
explicitly documented) OUTPUT_FORMAT overrides -m EMULATION.

Tested by building x86_64 kernel with GNU gcc/ld toolchain and booting
it in QEMU.

 [ bp: massage and clarify text. ]

Suggested-by: Dmitry Golovin <dima@golovin.in>
Signed-off-by: George Rimar <grimar@accesssoftek.com>
Signed-off-by: Tri Vo <trong@android.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Tri Vo <trong@android.com>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Michael Matz <matz@suse.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: morbo@google.com
Cc: ndesaulniers@google.com
Cc: ruiu@google.com
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190111201012.71210-1-trong@android.com
2019-01-12 00:11:39 +01:00
Daniel Drake 7e6fc2f50a x86/kaslr: Fix incorrect i8254 outb() parameters
The outb() function takes parameters value and port, in that order.  Fix
the parameters used in the kalsr i8254 fallback code.

Fixes: 5bfce5ef55 ("x86, kaslr: Provide randomness functions")
Signed-off-by: Daniel Drake <drake@endlessm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: bp@alien8.de
Cc: hpa@zytor.com
Cc: linux@endlessm.com
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20190107034024.15005-1-drake@endlessm.com
2019-01-11 21:35:47 +01:00
Sinan Kaya 5962dd22f0 x86/intel/lpss: Make PCI dependency explicit
After commit

  5d32a66541 ("PCI/ACPI: Allow ACPI to be built without CONFIG_PCI set")

dependencies on CONFIG_PCI that previously were satisfied implicitly
through dependencies on CONFIG_ACPI have to be specified directly. LPSS
code relies on PCI infrastructure but this dependency has not been
explicitly called out so do that.

Fixes: 5d32a66541 ("PCI/ACPI: Allow ACPI to be built without CONFIG_PCI set")
Signed-off-by: Sinan Kaya <okaya@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: linux-acpi@vger.kernel.org
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190102181038.4418-11-okaya@kernel.org
2019-01-11 19:32:22 +01:00
Vitaly Kuznetsov 826c1362e7 x86/kvm/nVMX: don't skip emulated instruction twice when vmptr address is not backed
Since commit 09abb5e3e5 ("KVM: nVMX: call kvm_skip_emulated_instruction
in nested_vmx_{fail,succeed}") nested_vmx_failValid() results in
kvm_skip_emulated_instruction() so doing it again in handle_vmptrld() when
vmptr address is not backed is wrong, we end up advancing RIP twice.

Fixes: fca91f6d60 ("kvm: nVMX: Set VM instruction error for VMPTRLD of unbacked page")
Reported-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2019-01-11 18:41:53 +01:00
Lan Tianyu b7c1c226f9 KVM/VMX: Avoid return error when flush tlb successfully in the hv_remote_flush_tlb_with_range()
The "ret" is initialized to be ENOTSUPP. The return value of
__hv_remote_flush_tlb_with_range() will be Or with "ret" when ept
table potiners are mismatched. This will cause return ENOTSUPP even if
flush tlb successfully. This patch is to fix the issue and set
"ret" to 0.

Fixes: a5c214dad1 ("KVM/VMX: Change hv flush logic when ept tables are mismatched.")
Signed-off-by: Lan Tianyu <Tianyu.Lan@microsoft.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2019-01-11 18:38:07 +01:00
David Rientjes 3f14a89d11 kvm: sev: Fail KVM_SEV_INIT if already initialized
By code inspection, it was found that multiple calls to KVM_SEV_INIT
could deplete asid bits and overwrite kvm_sev_info's regions_list.

Multiple calls to KVM_SVM_INIT is not likely to occur with QEMU, but this
should likely be fixed anyway.

This code is serialized by kvm->lock.

Fixes: 1654efcbc4 ("KVM: SVM: Add KVM_SEV_INIT command")
Reported-by: Cfir Cohen <cfir@google.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2019-01-11 18:38:07 +01:00
Gustavo A. R. Silva d14eff1bc5 KVM: x86: Fix bit shifting in update_intel_pt_cfg
ctl_bitmask in pt_desc is of type u64. When an integer like 0xf is
being left shifted more than 32 bits, the behavior is undefined.

Fix this by adding suffix ULL to integer 0xf.

Addresses-Coverity-ID: 1476095 ("Bad bit shift operation")
Fixes: 6c0f0bba85 ("KVM: x86: Introduce a function to initialize the PT configuration")
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Reviewed-by: Wei Yang <richardw.yang@linux.intel.com>
Reviewed-by: Luwei Kang <luwei.kang@intel.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2019-01-11 18:38:07 +01:00
Lianbo Jiang 65f750e545 x86/kdump: Export the SME mask to vmcoreinfo
On AMD SME machines, makedumpfile tools need to know whether the crashed
kernel was encrypted.

If SME is enabled in the first kernel, the crashed kernel's page table
entries (pgd/pud/pmd/pte) contain the memory encryption mask which
makedumpfile needs to remove in order to obtain the true physical
address.

Export that mask in a vmcoreinfo variable.

 [ bp: Massage commit message and move define at the end of the
   function. ]

Signed-off-by: Lianbo Jiang <lijiang@redhat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: anderson@redhat.com
Cc: k-hagio@ab.jp.nec.com
Cc: kexec@lists.infradead.org
Cc: linux-doc@vger.kernel.org
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190110121944.6050-3-lijiang@redhat.com
2019-01-11 16:09:25 +01:00
Rafael Ávila de Espíndola d071ae09a4 x86/build: Mark per-CPU symbols as absolute explicitly for LLD
Accessing per-CPU variables is done by finding the offset of the
variable in the per-CPU block and adding it to the address of the
respective CPU's block.

Section 3.10.8 of ld.bfd's documentation states:

  For expressions involving numbers, relative addresses and absolute
  addresses, ld follows these rules to evaluate terms:

  Other binary operations, that is, between two relative addresses
  not in the same section, or between a relative address and an
  absolute address, first convert any non-absolute term to an
  absolute address before applying the operator."

Note that LLVM's linker does not adhere to the GNU ld's implementation
and as such requires implicitly-absolute terms to be explicitly marked
as absolute in the linker script. If not, it fails currently with:

  ld.lld: error: ./arch/x86/kernel/vmlinux.lds:153: at least one side of the expression must be absolute
  ld.lld: error: ./arch/x86/kernel/vmlinux.lds:154: at least one side of the expression must be absolute
  Makefile:1040: recipe for target 'vmlinux' failed

This is not a functional change for ld.bfd which converts the term to an
absolute symbol anyways as specified above.

Based on a previous submission by Tri Vo <trong@android.com>.

Reported-by: Dmitry Golovin <dima@golovin.in>
Signed-off-by: Rafael Ávila de Espíndola <rafael@espindo.la>
[ Update commit message per Boris' and Michael's suggestions. ]
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
[ Massage commit message more, fix typos. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Dmitry Golovin <dima@golovin.in>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: Cao Jin <caoj.fnst@cn.fujitsu.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tri Vo <trong@android.com>
Cc: dima@golovin.in
Cc: morbo@google.com
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20181219190145.252035-1-ndesaulniers@google.com
2019-01-09 11:22:35 +01:00
WANG Chao e4f358916d x86, modpost: Replace last remnants of RETPOLINE with CONFIG_RETPOLINE
Commit

  4cd24de3a0 ("x86/retpoline: Make CONFIG_RETPOLINE depend on compiler support")

replaced the RETPOLINE define with CONFIG_RETPOLINE checks. Remove the
remaining pieces.

 [ bp: Massage commit message. ]

Fixes: 4cd24de3a0 ("x86/retpoline: Make CONFIG_RETPOLINE depend on compiler support")
Signed-off-by: WANG Chao <chao.wang@ucloud.cn>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Reviewed-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Jessica Yu <jeyu@kernel.org>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Kees Cook <keescook@chromium.org>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
Cc: Michal Marek <michal.lkml@markovi.net>
Cc: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: linux-kbuild@vger.kernel.org
Cc: srinivas.eeda@oracle.com
Cc: stable <stable@vger.kernel.org>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20181210163725.95977-1-chao.wang@ucloud.cn
2019-01-09 10:35:56 +01:00
Borislav Petkov 90802938f7 x86/cache: Rename config option to CONFIG_X86_RESCTRL
CONFIG_RESCTRL is too generic. The final goal is to have a generic
option called like this which is selected by the arch-specific ones
CONFIG_X86_RESCTRL and CONFIG_ARM64_RESCTRL. The generic one will
cover the resctrl filesystem and other generic and shared bits of
functionality.

Signed-off-by: Borislav Petkov <bp@suse.de>
Suggested-by: Ingo Molnar <mingo@kernel.org>
Requested-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Babu Moger <babu.moger@amd.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: James Morse <james.morse@arm.com>
Cc: Reinette Chatre <reinette.chatre@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: x86@kernel.org
Link: http://lkml.kernel.org/r/20190108171401.GC12235@zn.tnic
2019-01-09 10:29:03 +01:00
Linus Torvalds 85e1ffbd42 Kbuild late updates for v4.21
- improve boolinit.cocci and use_after_iter.cocci semantic patches
 
 - fix alignment for kallsyms
 
 - move 'asm goto' compiler test to Kconfig and clean up jump_label
   CONFIG option
 
 - generate asm-generic wrappers automatically if arch does not implement
   mandatory UAPI headers
 
 - remove redundant generic-y defines
 
 - misc cleanups
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJcMV5GAAoJED2LAQed4NsGs9gQAI/oGg8wJgk9a7+dJCX245W5
 F4ReftnQd4AFptFCi9geJkr+sfViXNgwPLqlJxiXz8Qe8XP7z3LcArDw3FUzwvGn
 bMSBiN9ggwWkOFgF523XesYgUVtcLpkNch/Migzf1Ac0FHk0G9o7gjcdsvAWHkUu
 qFwtNcUB6PElRbhsHsh5qCY1/6HaAXgf/7O7wztnaKRe9myN6f2HzT4wANS9HHde
 1e1r0LcIQeGWfG+3va3fZl6SDxSI/ybl244OcDmDyYl6RA1skSDlHbIBIFgUPoS0
 cLyzoVj+GkfI1fRFEIfou+dj7lpukoAXHsggHo0M+ofqtbMF+VB2T3jvg4txanCP
 TXzDc+04QUguK5yVnBfcnyC64Htrhnbq0eGy43kd1VZWAEGApl+680P8CRsWU3ZV
 kOiFvZQ6RP/Ssw+a42yU3SHr31WD7feuQqHU65osQt4rdyL5wnrfU1vaUvJSkltF
 cyPr9Kz/Ism0kPodhpFkuKxwtlKOw6/uwdCQoQHtxAPkvkcydhYx93x3iE0nxObS
 CRMximiRyE12DOcv/3uv69n0JOPn6AsITcMNp8XryASYrR2/52txhGKGhvo3+Zoq
 5pwc063JsuxJ/5/dcOw/erQar5d1eBRaBJyEWnXroxUjbsLPAznE+UIN8tmvyVly
 SunlxNOXBdYeWN6t6S3H
 =I+r6
 -----END PGP SIGNATURE-----

Merge tag 'kbuild-v4.21-3' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild

Pull more Kbuild updates from Masahiro Yamada:

 - improve boolinit.cocci and use_after_iter.cocci semantic patches

 - fix alignment for kallsyms

 - move 'asm goto' compiler test to Kconfig and clean up jump_label
   CONFIG option

 - generate asm-generic wrappers automatically if arch does not
   implement mandatory UAPI headers

 - remove redundant generic-y defines

 - misc cleanups

* tag 'kbuild-v4.21-3' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
  kconfig: rename generated .*conf-cfg to *conf-cfg
  kbuild: remove unnecessary stubs for archheader and archscripts
  kbuild: use assignment instead of define ... endef for filechk_* rules
  arch: remove redundant UAPI generic-y defines
  kbuild: generate asm-generic wrappers if mandatory headers are missing
  arch: remove stale comments "UAPI Header export list"
  riscv: remove redundant kernel-space generic-y
  kbuild: change filechk to surround the given command with { }
  kbuild: remove redundant target cleaning on failure
  kbuild: clean up rule_dtc_dt_yaml
  kbuild: remove UIMAGE_IN and UIMAGE_OUT
  jump_label: move 'asm goto' support test to Kconfig
  kallsyms: lower alignment on ARM
  scripts: coccinelle: boolinit: drop warnings on named constants
  scripts: coccinelle: check for redeclaration
  kconfig: remove unused "file" field of yylval union
  nds32: remove redundant kernel-space generic-y
  nios2: remove unneeded HAS_DMA define
2019-01-06 16:33:10 -08:00
Linus Torvalds e2b745f469 dma-mapping fixes for Linux 4.21-rc1
Fix various regressions introduced in this cycles:
 
  - fix dma-debug tracking for the map_page / map_single consolidatation
  - properly stub out DMA mapping symbols for !HAS_DMA builds to avoid
    link failures
  - fix AMD Gart direct mappings
  - setup the dma address for no kernel mappings using the remap
    allocator
 -----BEGIN PGP SIGNATURE-----
 
 iQI/BAABCgApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAlwyR9ULHGhjaEBsc3Qu
 ZGUACgkQD55TZVIEUYPvOA/+L+32p2pm8o6NTgvtRvqsKNrbOm02fORLrhBqAiok
 AcirFDxTfMuUWU2isr7E7WNqwEmUQ1nVUa+I0IJ/IJFfKdTggXcaTX1M19+62KWa
 1LHpZLg1t2rl2yFQHgTrFKr5sz1PwUKZO8UbrYaYYgLgQkWDRzJs4E/tFNju8pMm
 0Usexo/bkI5mreJBImMsFwAnuk0k3NT058XIeD+eNttKjcuz5kEH+bE/999vySW3
 sOj9Peic/EFelOGb4ODxUIPjhiGFMv5dVusSAsFBH26iwQfX/tFSmXhrI5cnDewg
 NlREennfyM+6uTH/DO+BlX7eGCRYbFc1GU5H9q4rRMXhEam6oc2AzVKuElJOVstZ
 XVjP6zTwmuOh/5ff0NG6EPjA/OFcmlBEsmeWu4xSS8KsNILOkpUaPed/uWnA7O+2
 mvU104NA5cHgVMgiGNM/4ilirkEZEFEHYhafH42bQxjMigm7ZHN14NtwM7StLTu6
 QgyfPUcW/LmHj2scgvB1AZ+iQX0z7yJJMGifUxtz+eMCWCC7neOJ7JLvNnS9WI5w
 9RwYaCOcDAZyAmCpbSADWxeG9cfsCDp8wmaGs3YVyhkDU8tCSqbxWJutvyDQnC17
 GtZ0vYLTaJXBCq1L/FC0y8NCCGgvySPXYU7/ZYuOCzS4q2jvjwTWD3dKodvnS+mb
 B0s=
 =H9J6
 -----END PGP SIGNATURE-----

Merge tag 'dma-mapping-4.21-1' of git://git.infradead.org/users/hch/dma-mapping

Pull dma-mapping fixes from Christoph Hellwig:
 "Fix various regressions introduced in this cycles:

   - fix dma-debug tracking for the map_page / map_single
     consolidatation

   - properly stub out DMA mapping symbols for !HAS_DMA builds to avoid
     link failures

   - fix AMD Gart direct mappings

   - setup the dma address for no kernel mappings using the remap
     allocator"

* tag 'dma-mapping-4.21-1' of git://git.infradead.org/users/hch/dma-mapping:
  dma-direct: fix DMA_ATTR_NO_KERNEL_MAPPING for remapped allocations
  x86/amd_gart: fix unmapping of non-GART mappings
  dma-mapping: remove a few unused exports
  dma-mapping: properly stub out the DMA API for !CONFIG_HAS_DMA
  dma-mapping: remove dmam_{declare,release}_coherent_memory
  dma-mapping: implement dmam_alloc_coherent using dmam_alloc_attrs
  dma-mapping: implement dma_map_single_attrs using dma_map_page_attrs
2019-01-06 11:47:26 -08:00
Linus Torvalds 926b02d3eb pci-v4.21-changes
-----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAlwtMCIUHGJoZWxnYWFz
 QGdvb2dsZS5jb20ACgkQWYigwDrT+vwQUQ/+P5/VDpo4abjudGO2c7FU1bJOwvfN
 cfV5dvDCw0kpx0Em5SmnpAD7Punllxxvb/04K75lqarGyx/Txqaw+lbIF+qSj6my
 GsQ16Iy8T48x5hr+Pf6vTh1eE+NaAVZfOPDOt7CyTNAgwfzHeVNyfNvz7pfKTIIJ
 Mk/jRE4kkeWo60jsY5p3sFo3OVOxBOsRdN+2sruaQuWFXrKHLyNDR+7Z9ZPxubFk
 cCO/TYPhNXmmKhCAR4V/rGiqz9OL2wyFixGhGhmD3tnC9nAb/wTMzjARsyBopBPi
 b/KkR2eLFEyXN0HJrwqxiURo4J3nveAYEuNXH5KjRBQZnoBCGSCIlqFhlrp9vdBk
 B4KIdT8h/M6LsVGeVSEIxIEXCp67YE31kxraFrk4Vsggdh2TFQ0llh1sajj8IFJB
 XekutedAOlTSOaM1/jvVPUJYg04X90bp3uXn3IU45XlQ8nBOG3immFVITRLkvd3w
 ywH+SEdeZAhWl3RGy8SHhqdeCJ7nNQbcRaRJ5CoWJBDNJTBGF1X+zJD2Swi6H9vA
 nWGNRlb3CPPIMPF127nADnOE7Cj2FlpAEIEu52HpcpIrhEdrGvLkGeQfgdWBjbyU
 aHwC04bLWnvsA9SEFVnuMIBaFQmJ1RuaWAHdtscyyO2uoeCtN8Aa+BX6jXFbVZQN
 9eFzpiv0kUiXlAQ=
 =g1ia
 -----END PGP SIGNATURE-----

Merge tag 'pci-v4.21-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull PCI updates from Bjorn Helgaas:

 - Remove unused lists from ASPM pcie_link_state (Frederick Lawler)

 - Fix Broadcom CNB20LE host bridge unintended sign extension (Colin Ian
   King)

 - Expand Kconfig "PF" acronyms (Randy Dunlap)

 - Update MAINTAINERS for arch/x86/kernel/early-quirks.c (Bjorn Helgaas)

 - Add missing include to drivers/pci.h (Alexandru Gagniuc)

 - Override Synopsys USB 3.x HAPS device class so dwc3-haps can claim it
   instead of xhci (Thinh Nguyen)

 - Clean up P2PDMA documentation (Randy Dunlap)

 - Allow runtime PM even if driver doesn't supply callbacks (Jarkko
   Nikula)

 - Remove status check after submitting Switchtec MRPC Firmware Download
   commands to avoid Completion Timeouts (Kelvin Cao)

 - Set Switchtec coherent DMA mask to allow 64-bit DMA (Boris Glimcher)

 - Fix Switchtec SWITCHTEC_IOCTL_EVENT_IDX_ALL flag overwrite issue
   (Joey Zhang)

 - Enable write combining for Switchtec MRPC Input buffers (Kelvin Cao)

 - Add Switchtec MRPC DMA mode support (Wesley Sheng)

 - Skip VF scanning on powerpc, which does this in firmware (Sebastian
   Ott)

 - Add Amlogic Meson PCIe controller driver and DT bindings (Yue Wang)

 - Constify histb dw_pcie_host_ops structure (Julia Lawall)

 - Support multiple power domains for imx6 (Leonard Crestez)

 - Constify layerscape driver data (Stefan Agner)

 - Update imx6 Kconfig to allow imx6 PCIe in imx7 kernel (Trent Piepho)

 - Support armada8k GPIO reset (Baruch Siach)

 - Support suspend/resume support on imx6 (Leonard Crestez)

 - Don't hard-code DesignWare DBI/ATU offst (Stephen Warren)

 - Skip i.MX6 PHY setup on i.MX7D (Andrey Smirnov)

 - Remove Jianguo Sun from HiSilicon STB maintainers (Lorenzo Pieralisi)

 - Mask DesignWare interrupts instead of disabling them to avoid lost
   interrupts (Marc Zyngier)

 - Add locking when acking DesignWare interrupts (Marc Zyngier)

 - Ack DesignWare interrupts in the proper callbacks (Marc Zyngier)

 - Use devm resource parser in mediatek (Honghui Zhang)

 - Remove unused mediatek "num-lanes" DT property (Honghui Zhang)

 - Add UniPhier PCIe controller driver and DT bindings (Kunihiko
   Hayashi)

 - Enable MSI for imx6 downstream components (Richard Zhu)

* tag 'pci-v4.21-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (40 commits)
  PCI: imx: Enable MSI from downstream components
  s390/pci: skip VF scanning
  PCI/IOV: Add flag so platforms can skip VF scanning
  PCI/IOV: Factor out sriov_add_vfs()
  PCI: uniphier: Add UniPhier PCIe host controller support
  dt-bindings: PCI: Add UniPhier PCIe host controller description
  PCI: amlogic: Add the Amlogic Meson PCIe controller driver
  dt-bindings: PCI: meson: add DT bindings for Amlogic Meson PCIe controller
  arm64: dts: mt7622: Remove un-used property for PCIe
  arm: dts: mt7623: Remove un-used property for PCIe
  dt-bindings: PCI: MediaTek: Remove un-used property
  PCI: mediatek: Remove un-used variant in struct mtk_pcie_port
  MAINTAINERS: Remove Jianguo Sun from HiSilicon STB DWC entry
  PCI: dwc: Don't hard-code DBI/ATU offset
  PCI: imx: Add imx6sx suspend/resume support
  PCI: armada8k: Add support for gpio controlled reset signal
  PCI: dwc: Adjust Kconfig to allow IMX6 PCIe host on IMX7
  PCI: dwc: layerscape: Constify driver data
  PCI: imx: Add multi-pd support
  PCI: Override Synopsys USB 3.x HAPS device class
  ...
2019-01-05 17:57:34 -08:00
Masahiro Yamada d6e4b3e326 arch: remove redundant UAPI generic-y defines
Now that Kbuild automatically creates asm-generic wrappers for missing
mandatory headers, it is redundant to list the same headers in
generic-y and mandatory-y.

Suggested-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
2019-01-06 10:22:15 +09:00
Masahiro Yamada d4ce5458ea arch: remove stale comments "UAPI Header export list"
These comments are leftovers of commit fcc8487d47 ("uapi: export all
headers under uapi directories").

Prior to that commit, exported headers must be explicitly added to
header-y. Now, all headers under the uapi/ directories are exported.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
2019-01-06 09:46:51 +09:00
Masahiro Yamada 172caf1993 kbuild: remove redundant target cleaning on failure
Since commit 9c2af1c737 ("kbuild: add .DELETE_ON_ERROR special
target"), the target file is automatically deleted on failure.

The boilerplate code

  ... || { rm -f $@; false; }

is unneeded.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
2019-01-06 09:46:51 +09:00
Masahiro Yamada e9666d10a5 jump_label: move 'asm goto' support test to Kconfig
Currently, CONFIG_JUMP_LABEL just means "I _want_ to use jump label".

The jump label is controlled by HAVE_JUMP_LABEL, which is defined
like this:

  #if defined(CC_HAVE_ASM_GOTO) && defined(CONFIG_JUMP_LABEL)
  # define HAVE_JUMP_LABEL
  #endif

We can improve this by testing 'asm goto' support in Kconfig, then
make JUMP_LABEL depend on CC_HAS_ASM_GOTO.

Ugly #ifdef HAVE_JUMP_LABEL will go away, and CONFIG_JUMP_LABEL will
match to the real kernel capability.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
2019-01-06 09:46:51 +09:00
Linus Torvalds 505b050fdf Merge branch 'mount.part1' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs mount API prep from Al Viro:
 "Mount API prereqs.

  Mostly that's LSM mount options cleanups. There are several minor
  fixes in there, but nothing earth-shattering (leaks on failure exits,
  mostly)"

* 'mount.part1' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (27 commits)
  mount_fs: suppress MAC on MS_SUBMOUNT as well as MS_KERNMOUNT
  smack: rewrite smack_sb_eat_lsm_opts()
  smack: get rid of match_token()
  smack: take the guts of smack_parse_opts_str() into a new helper
  LSM: new method: ->sb_add_mnt_opt()
  selinux: rewrite selinux_sb_eat_lsm_opts()
  selinux: regularize Opt_... names a bit
  selinux: switch away from match_token()
  selinux: new helper - selinux_add_opt()
  LSM: bury struct security_mnt_opts
  smack: switch to private smack_mnt_opts
  selinux: switch to private struct selinux_mnt_opts
  LSM: hide struct security_mnt_opts from any generic code
  selinux: kill selinux_sb_get_mnt_opts()
  LSM: turn sb_eat_lsm_opts() into a method
  nfs_remount(): don't leak, don't ignore LSM options quietly
  btrfs: sanitize security_mnt_opts use
  selinux; don't open-code a loop in sb_finish_set_opts()
  LSM: split ->sb_set_mnt_opts() out of ->sb_kern_mount()
  new helper: security_sb_eat_lsm_opts()
  ...
2019-01-05 13:25:58 -08:00
Linus Torvalds a65981109f Merge branch 'akpm' (patches from Andrew)
Merge more updates from Andrew Morton:

 - procfs updates

 - various misc bits

 - lib/ updates

 - epoll updates

 - autofs

 - fatfs

 - a few more MM bits

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (58 commits)
  mm/page_io.c: fix polled swap page in
  checkpatch: add Co-developed-by to signature tags
  docs: fix Co-Developed-by docs
  drivers/base/platform.c: kmemleak ignore a known leak
  fs: don't open code lru_to_page()
  fs/: remove caller signal_pending branch predictions
  mm/: remove caller signal_pending branch predictions
  arch/arc/mm/fault.c: remove caller signal_pending_branch predictions
  kernel/sched/: remove caller signal_pending branch predictions
  kernel/locking/mutex.c: remove caller signal_pending branch predictions
  mm: select HAVE_MOVE_PMD on x86 for faster mremap
  mm: speed up mremap by 20x on large regions
  mm: treewide: remove unused address argument from pte_alloc functions
  initramfs: cleanup incomplete rootfs
  scripts/gdb: fix lx-version string output
  kernel/kcov.c: mark write_comp_data() as notrace
  kernel/sysctl: add panic_print into sysctl
  panic: add options to print system info when panic happens
  bfs: extra sanity checking and static inode bitmap
  exec: separate MM_ANONPAGES and RLIMIT_STACK accounting
  ...
2019-01-05 09:16:18 -08:00
Christoph Hellwig 06f55fd2d2 x86/amd_gart: fix unmapping of non-GART mappings
In many cases we don't have to create a GART mapping at all, which
also means there is nothing to unmap.  Fix the range check that was
incorrectly modified when removing the mapping_error method.

Fixes: 9e8aa6b546 ("x86/amd_gart: remove the mapping_error dma_map_ops method")
Reported-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Michal Kubecek <mkubecek@suse.cz>
2019-01-05 08:27:32 +01:00
Linus Torvalds 170d13ca3a x86: re-introduce non-generic memcpy_{to,from}io
This has been broken forever, and nobody ever really noticed because
it's purely a performance issue.

Long long ago, in commit 6175ddf06b ("x86: Clean up mem*io functions")
Brian Gerst simplified the memory copies to and from iomem, since on
x86, the instructions to access iomem are exactly the same as the
regular instructions.

That is technically true, and things worked, and nobody said anything.
Besides, back then the regular memcpy was pretty simple and worked fine.

Nobody noticed except for David Laight, that is.  David has a testing a
TLP monitor he was writing for an FPGA, and has been occasionally
complaining about how memcpy_toio() writes things one byte at a time.

Which is completely unacceptable from a performance standpoint, even if
it happens to technically work.

The reason it's writing one byte at a time is because while it's
technically true that accesses to iomem are the same as accesses to
regular memory on x86, the _granularity_ (and ordering) of accesses
matter to iomem in ways that they don't matter to regular cached memory.

In particular, when ERMS is set, we default to using "rep movsb" for
larger memory copies.  That is indeed perfectly fine for real memory,
since the whole point is that the CPU is going to do cacheline
optimizations and executes the memory copy efficiently for cached
memory.

With iomem? Not so much.  With iomem, "rep movsb" will indeed work, but
it will copy things one byte at a time. Slowly and ponderously.

Now, originally, back in 2010 when commit 6175ddf06b was done, we
didn't use ERMS, and this was much less noticeable.

Our normal memcpy() was simpler in other ways too.

Because in fact, it's not just about using the string instructions.  Our
memcpy() these days does things like "read and write overlapping values"
to handle the last bytes of the copy.  Again, for normal memory,
overlapping accesses isn't an issue.  For iomem? It can be.

So this re-introduces the specialized memcpy_toio(), memcpy_fromio() and
memset_io() functions.  It doesn't particularly optimize them, but it
tries to at least not be horrid, or do overlapping accesses.  In fact,
this uses the existing __inline_memcpy() function that we still had
lying around that uses our very traditional "rep movsl" loop followed by
movsw/movsb for the final bytes.

Somebody may decide to try to improve on it, but if we've gone almost a
decade with only one person really ever noticing and complaining, maybe
it's not worth worrying about further, once it's not _completely_ broken?

Reported-by: David Laight <David.Laight@aculab.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-04 18:15:33 -08:00
Linus Torvalds a959dc88f9 Use __put_user_goto in __put_user_size() and unsafe_put_user()
This actually enables the __put_user_goto() functionality in
unsafe_put_user().

For an example of the effect of this, this is the code generated for the

        unsafe_put_user(signo, &infop->si_signo, Efault);

in the waitid() system call:

	movl %ecx,(%rbx)        # signo, MEM[(struct __large_struct *)_2]

It's just one single store instruction, along with generating an
exception table entry pointing to the Efault label case in case that
instruction faults.

Before, we would generate this:

	xorl    %edx, %edx
	movl %ecx,(%rbx)        # signo, MEM[(struct __large_struct *)_3]
        testl   %edx, %edx
        jne     .L309

with the exception table generated for that 'mov' instruction causing us
to jump to a stub that set %edx to -EFAULT and then jumped back to the
'testl' instruction.

So not only do we now get rid of the extra code in the normal sequence,
we also avoid unnecessarily keeping that extra error register live
across it all.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-04 18:15:25 -08:00
Linus Torvalds 4a789213c9 x86 uaccess: Introduce __put_user_goto
This is finally the actual reason for the odd error handling in the
"unsafe_get/put_user()" functions, introduced over three years ago.

Using a "jump to error label" interface is somewhat odd, but very
convenient as a programming interface, and more importantly, it fits
very well with simply making the target be the exception handler address
directly from the inline asm.

The reason it took over three years to actually do this? We need "asm
goto" support for it, which only became the default on x86 last year.
It's now been a year that we've forced asm goto support (see commit
e501ce957a "x86: Force asm-goto"), and so let's just do it here too.

[ Side note: this commit was originally done back in 2016. The above
  commentary about timing is obviously about it only now getting merged
  into my real upstream tree     - Linus ]

Sadly, gcc still only supports "asm goto" with asms that do not have any
outputs, so we are limited to only the put_user case for this.  Maybe in
several more years we can do the get_user case too.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-04 18:00:49 -08:00
Joel Fernandes (Google) 9f132f7e14 mm: select HAVE_MOVE_PMD on x86 for faster mremap
Moving page-tables at the PMD-level on x86 is known to be safe.  Enable
this option so that we can do fast mremap when possible.

Link: http://lkml.kernel.org/r/20181108181201.88826-4-joelaf@google.com
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Suggested-by: Kirill A. Shutemov <kirill@shutemov.name>
Acked-by: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Julia Lawall <Julia.Lawall@lip6.fr>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: William Kucharski <william.kucharski@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-04 13:13:48 -08:00
Joel Fernandes (Google) 4cf5892495 mm: treewide: remove unused address argument from pte_alloc functions
Patch series "Add support for fast mremap".

This series speeds up the mremap(2) syscall by copying page tables at
the PMD level even for non-THP systems.  There is concern that the extra
'address' argument that mremap passes to pte_alloc may do something
subtle architecture related in the future that may make the scheme not
work.  Also we find that there is no point in passing the 'address' to
pte_alloc since its unused.  This patch therefore removes this argument
tree-wide resulting in a nice negative diff as well.  Also ensuring
along the way that the enabled architectures do not do anything funky
with the 'address' argument that goes unnoticed by the optimization.

Build and boot tested on x86-64.  Build tested on arm64.  The config
enablement patch for arm64 will be posted in the future after more
testing.

The changes were obtained by applying the following Coccinelle script.
(thanks Julia for answering all Coccinelle questions!).
Following fix ups were done manually:
* Removal of address argument from  pte_fragment_alloc
* Removal of pte_alloc_one_fast definitions from m68k and microblaze.

// Options: --include-headers --no-includes
// Note: I split the 'identifier fn' line, so if you are manually
// running it, please unsplit it so it runs for you.

virtual patch

@pte_alloc_func_def depends on patch exists@
identifier E2;
identifier fn =~
"^(__pte_alloc|pte_alloc_one|pte_alloc|__pte_alloc_kernel|pte_alloc_one_kernel)$";
type T2;
@@

 fn(...
- , T2 E2
 )
 { ... }

@pte_alloc_func_proto_noarg depends on patch exists@
type T1, T2, T3, T4;
identifier fn =~ "^(__pte_alloc|pte_alloc_one|pte_alloc|__pte_alloc_kernel|pte_alloc_one_kernel)$";
@@

(
- T3 fn(T1, T2);
+ T3 fn(T1);
|
- T3 fn(T1, T2, T4);
+ T3 fn(T1, T2);
)

@pte_alloc_func_proto depends on patch exists@
identifier E1, E2, E4;
type T1, T2, T3, T4;
identifier fn =~
"^(__pte_alloc|pte_alloc_one|pte_alloc|__pte_alloc_kernel|pte_alloc_one_kernel)$";
@@

(
- T3 fn(T1 E1, T2 E2);
+ T3 fn(T1 E1);
|
- T3 fn(T1 E1, T2 E2, T4 E4);
+ T3 fn(T1 E1, T2 E2);
)

@pte_alloc_func_call depends on patch exists@
expression E2;
identifier fn =~
"^(__pte_alloc|pte_alloc_one|pte_alloc|__pte_alloc_kernel|pte_alloc_one_kernel)$";
@@

 fn(...
-,  E2
 )

@pte_alloc_macro depends on patch exists@
identifier fn =~
"^(__pte_alloc|pte_alloc_one|pte_alloc|__pte_alloc_kernel|pte_alloc_one_kernel)$";
identifier a, b, c;
expression e;
position p;
@@

(
- #define fn(a, b, c) e
+ #define fn(a, b) e
|
- #define fn(a, b) e
+ #define fn(a) e
)

Link: http://lkml.kernel.org/r/20181108181201.88826-2-joelaf@google.com
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Suggested-by: Kirill A. Shutemov <kirill@shutemov.name>
Acked-by: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Julia Lawall <Julia.Lawall@lip6.fr>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: William Kucharski <william.kucharski@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-04 13:13:47 -08:00
Matthew Wilcox 3fc2579e6f fls: change parameter to unsigned int
When testing in userspace, UBSAN pointed out that shifting into the sign
bit is undefined behaviour.  It doesn't really make sense to ask for the
highest set bit of a negative value, so just turn the argument type into
an unsigned int.

Some architectures (eg ppc) already had it declared as an unsigned int,
so I don't expect too many problems.

Link: http://lkml.kernel.org/r/20181105221117.31828-1-willy@infradead.org
Signed-off-by: Matthew Wilcox <willy@infradead.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-04 13:13:46 -08:00
Linus Torvalds 594cc251fd make 'user_access_begin()' do 'access_ok()'
Originally, the rule used to be that you'd have to do access_ok()
separately, and then user_access_begin() before actually doing the
direct (optimized) user access.

But experience has shown that people then decide not to do access_ok()
at all, and instead rely on it being implied by other operations or
similar.  Which makes it very hard to verify that the access has
actually been range-checked.

If you use the unsafe direct user accesses, hardware features (either
SMAP - Supervisor Mode Access Protection - on x86, or PAN - Privileged
Access Never - on ARM) do force you to use user_access_begin().  But
nothing really forces the range check.

By putting the range check into user_access_begin(), we actually force
people to do the right thing (tm), and the range check vill be visible
near the actual accesses.  We have way too long a history of people
trying to avoid them.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-04 12:56:09 -08:00
Linus Torvalds 96d4f267e4 Remove 'type' argument from access_ok() function
Nobody has actually used the type (VERIFY_READ vs VERIFY_WRITE) argument
of the user address range verification function since we got rid of the
old racy i386-only code to walk page tables by hand.

It existed because the original 80386 would not honor the write protect
bit when in kernel mode, so you had to do COW by hand before doing any
user access.  But we haven't supported that in a long time, and these
days the 'type' argument is a purely historical artifact.

A discussion about extending 'user_access_begin()' to do the range
checking resulted this patch, because there is no way we're going to
move the old VERIFY_xyz interface to that model.  And it's best done at
the end of the merge window when I've done most of my merges, so let's
just get this done once and for all.

This patch was mostly done with a sed-script, with manual fix-ups for
the cases that weren't of the trivial 'access_ok(VERIFY_xyz' form.

There were a couple of notable cases:

 - csky still had the old "verify_area()" name as an alias.

 - the iter_iov code had magical hardcoded knowledge of the actual
   values of VERIFY_{READ,WRITE} (not that they mattered, since nothing
   really used it)

 - microblaze used the type argument for a debug printout

but other than those oddities this should be a total no-op patch.

I tried to fix up all architectures, did fairly extensive grepping for
access_ok() uses, and the changes are trivial, but I may have missed
something.  Any missed conversion should be trivially fixable, though.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-03 18:57:57 -08:00
Linus Torvalds f218a29c25 Merge branch 'next-integrity' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
Pull integrity updates from James Morris:
 "In Linux 4.19, a new LSM hook named security_kernel_load_data was
  upstreamed, allowing LSMs and IMA to prevent the kexec_load syscall.
  Different signature verification methods exist for verifying the
  kexec'ed kernel image. This adds additional support in IMA to prevent
  loading unsigned kernel images via the kexec_load syscall,
  independently of the IMA policy rules, based on the runtime "secure
  boot" flag. An initial IMA kselftest is included.

  In addition, this pull request defines a new, separate keyring named
  ".platform" for storing the preboot/firmware keys needed for verifying
  the kexec'ed kernel image's signature and includes the associated IMA
  kexec usage of the ".platform" keyring.

  (David Howell's and Josh Boyer's patches for reading the
  preboot/firmware keys, which were previously posted for a different
  use case scenario, are included here)"

* 'next-integrity' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
  integrity: Remove references to module keyring
  ima: Use inode_is_open_for_write
  ima: Support platform keyring for kernel appraisal
  efi: Allow the "db" UEFI variable to be suppressed
  efi: Import certificates from UEFI Secure Boot
  efi: Add an EFI signature blob parser
  efi: Add EFI signature data types
  integrity: Load certs to the platform keyring
  integrity: Define a trusted platform keyring
  selftests/ima: kexec_load syscall test
  ima: don't measure/appraise files on efivarfs
  x86/ima: retry detecting secure boot mode
  docs: Extend trusted keys documentation for TPM 2.0
  x86/ima: define arch_get_ima_policy() for x86
  ima: add support for arch specific policies
  ima: refactor ima_init_policy()
  ima: prevent kexec_load syscall based on runtime secureboot flag
  x86/ima: define arch_ima_get_secureboot
  integrity: support new struct public_key_signature encoding field
2019-01-02 09:43:14 -08:00
Linus Torvalds 8e143b90e4 IOMMU Updates for Linux v4.21
Including (in no particular order):
 
 	- Page table code for AMD IOMMU now supports large pages where
 	  smaller page-sizes were mapped before. VFIO had to work around
 	  that in the past and I included a patch to remove it (acked by
 	  Alex Williamson)
 
 	- Patches to unmodularize a couple of IOMMU drivers that would
 	  never work as modules anyway.
 
 	- Work to unify the the iommu-related pointers in
 	  'struct device' into one pointer. This work is not finished
 	  yet, but will probably be in the next cycle.
 
 	- NUMA aware allocation in iommu-dma code
 
 	- Support for r8a774a1 and r8a774c0 in the Renesas IOMMU driver
 
 	- Scalable mode support for the Intel VT-d driver
 
 	- PM runtime improvements for the ARM-SMMU driver
 
 	- Support for the QCOM-SMMUv2 IOMMU hardware from Qualcom
 
 	- Various smaller fixes and improvements
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABAgAGBQJcKkEoAAoJECvwRC2XARrjCCoQAJxsgaAF5Z0s7z8j2A9SkaGp
 SIMnUAI5mDOdyhTOAI+eehpRzg5UVYt/JjFYnHz8HWqbSc8YOvDvHafmhMFIwYvO
 hq5knbs6ns2jJNFO+M4dioDq+3THdqkGIF5xoHdGTP7cn9+XyQ8lAoHo0RuL122U
 PJGqX7Cp4XnFP4HMb3uQYhVeBV7mU+XqAdB+4aDnQkzI5LkQCRr74GcqOm+Rlnyc
 cmQWc2arUMjgc1TJIrex8dx9dT6lq8kOmhyEg/IjHeGaZyJ3HqA+30XDDLEExN0G
 MeVawuxJz40HgXlkXr+iZTQtIFYkXdKvJH6rptMbOfbDeDz+YZ01TbtAMMH9o4jX
 yxjjMjdcWTsWYQ/MHHdsoMP34cajCi/EYPMNksbycw+E3Y+X/bSReCoWC0HUK8/+
 Z4TpZ9mZVygtJR+QNZ+pE9oiJpb4sroM10zTnbMoVHNnvfsO01FYk7FMPkolSKLw
 zB4MDswQYgchoFR9Z4ZB4PycYTzeafLKYgDPDoD1vIJgDavuidwvDWDRTDc+aMWM
 siIIewq19To9jDJkVjX4dsT/p99KVKgAR/Ps6jjWkAroha7g6GcmlYZHIJnyop04
 jiaSXUsk8aRucP/CRz5xdMmaGoN7BsNmpUjcrquc6Povk/6gvXvpY04oCs1+gNMX
 ipL9E3GTFCVBubRFrksv
 =DT9A
 -----END PGP SIGNATURE-----

Merge tag 'iommu-updates-v4.21' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu

Pull IOMMU updates from Joerg Roedel:

 - Page table code for AMD IOMMU now supports large pages where smaller
   page-sizes were mapped before. VFIO had to work around that in the
   past and I included a patch to remove it (acked by Alex Williamson)

 - Patches to unmodularize a couple of IOMMU drivers that would never
   work as modules anyway.

 - Work to unify the the iommu-related pointers in 'struct device' into
   one pointer. This work is not finished yet, but will probably be in
   the next cycle.

 - NUMA aware allocation in iommu-dma code

 - Support for r8a774a1 and r8a774c0 in the Renesas IOMMU driver

 - Scalable mode support for the Intel VT-d driver

 - PM runtime improvements for the ARM-SMMU driver

 - Support for the QCOM-SMMUv2 IOMMU hardware from Qualcom

 - Various smaller fixes and improvements

* tag 'iommu-updates-v4.21' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (78 commits)
  iommu: Check for iommu_ops == NULL in iommu_probe_device()
  ACPI/IORT: Don't call iommu_ops->add_device directly
  iommu/of: Don't call iommu_ops->add_device directly
  iommu: Consolitate ->add/remove_device() calls
  iommu/sysfs: Rename iommu_release_device()
  dmaengine: sh: rcar-dmac: Use device_iommu_mapped()
  xhci: Use device_iommu_mapped()
  powerpc/iommu: Use device_iommu_mapped()
  ACPI/IORT: Use device_iommu_mapped()
  iommu/of: Use device_iommu_mapped()
  driver core: Introduce device_iommu_mapped() function
  iommu/tegra: Use helper functions to access dev->iommu_fwspec
  iommu/qcom: Use helper functions to access dev->iommu_fwspec
  iommu/of: Use helper functions to access dev->iommu_fwspec
  iommu/mediatek: Use helper functions to access dev->iommu_fwspec
  iommu/ipmmu-vmsa: Use helper functions to access dev->iommu_fwspec
  iommu/dma: Use helper functions to access dev->iommu_fwspec
  iommu/arm-smmu: Use helper functions to access dev->iommu_fwspec
  ACPI/IORT: Use helper functions to access dev->iommu_fwspec
  iommu: Introduce wrappers around dev->iommu_fwspec
  ...
2019-01-01 15:55:29 -08:00
Linus Torvalds fcf010449e kgdb patches for 4.20-rc1
Mostly clean ups although whilst Doug's was chasing down a odd
 lockdep warning he also did some work to improved debugger resilience
 when some CPUs fail to respond to the round up request.
 
 The main changes are:
 
  * Fixing a lockdep warning on architectures that cannot use an NMI for
    the round up plus related changes to make CPU round up and all CPU
    backtrace more resilient.
 
  * Constify the arch ops tables
 
  * A couple of other small clean ups
 
 Two of the three patchsets here include changes that spill over into
 arch/.  Changes in the arch space are relatively narrow in scope
 (and directly related to kgdb). Didn't get comprehensive acks but
 all impacted maintainers were Cc:ed in good time.
 
 Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJPBAABCAA5FiEELzVBU1D3lWq6cKzwfOMlXTn3iKEFAlwonoUbHGRhbmllbC50
 aG9tcHNvbkBsaW5hcm8ub3JnAAoJEHzjJV0594ihmooP/1uzSMGQIoQMB8XeU/jT
 Da2iILybi6hGp7ILA27d0yN3tsJBxWGWs8wzNdzMo3NQ3J0o4foAUnS/R0Vjkg9w
 uphe5EA4HDsIrH05OouNb984BeEgNaC9HSqtyr9fXuh024NboULFKIm7REYm+QHT
 C5SrBtmonL1xE7FmAhudLWjl7ZlvxM6DJeoVViH4kKq0raTiILt6VJaGl9JfcAdL
 m9GEf9r/nh0sCq3GNgyc0y4BvHed+Kxzy1fsIi3jE6t8elaYYR72gNRQ5LaFxcnQ
 F04/UtH75qB4rqYsqqV1q0rFi+tj+p9wYTmxixaGWsVDX4Gb5KXuLWJhaRb5IvwC
 bdq/0IAXRr4vUL3y0tFWfCj7pHGaVc/gfXi8aieRXLGAZG+tdfuu99NCiulIZTfc
 QqZz12Z+99/qi6dK7dBQtaN8SyPeB1QXKWefeGo2Bt5QqiBmcKHxsQYMUo3nkf3J
 UXHpj4LG6Ldsi/w8VZfvXmM0/vbO/jrus9m+X2v+4tJyisjrsyv0FRnREI4avfbC
 l09P1ajv7RrAaxtab0smV9krqWZ/mSn0zcgcaD6RdKe0+SwsiP/CEx1z1Wb1MH9c
 wjEiClXjdVB39YVT0YVfG2Ho7qH8WRErxVyNb/f4QKHMXL1Mu91hFWhBBpUOGUj2
 7Jrq2zK1uWramtt7GBDpHYYH
 =Aqlc
 -----END PGP SIGNATURE-----

Merge tag 'kgdb-4.21-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/danielt/linux

Pull kgdb updates from Daniel Thompson:
 "Mostly clean ups although while Doug's was chasing down a odd lockdep
  warning he also did some work to improved debugger resilience when
  some CPUs fail to respond to the round up request.

  The main changes are:

   - Fixing a lockdep warning on architectures that cannot use an NMI
     for the round up plus related changes to make CPU round up and all
     CPU backtrace more resilient.

   - Constify the arch ops tables

   - A couple of other small clean ups

  Two of the three patchsets here include changes that spill over into
  arch/. Changes in the arch space are relatively narrow in scope (and
  directly related to kgdb). Didn't get comprehensive acks but all
  impacted maintainers were Cc:ed in good time"

* tag 'kgdb-4.21-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/danielt/linux:
  kgdb/treewide: constify struct kgdb_arch arch_kgdb_ops
  mips/kgdb: prepare arch_kgdb_ops for constness
  kdb: use bool for binary state indicators
  kdb: Don't back trace on a cpu that didn't round up
  kgdb: Don't round up a CPU that failed rounding up before
  kgdb: Fix kgdb_roundup_cpus() for arches who used smp_call_function()
  kgdb: Remove irq flags from roundup
2019-01-01 15:38:14 -08:00
Linus Torvalds 495d714ad1 Tracing changes for v4.21:
- Rework of the kprobe/uprobe and synthetic events to consolidate all
    the dynamic event code. This will make changes in the future easier.
 
  - Partial rewrite of the function graph tracing infrastructure.
    This will allow for multiple users of hooking onto functions
    to get the callback (return) of the function. This is the ground
    work for having kprobes and function graph tracer using one code base.
 
  - Clean up of the histogram code that will facilitate adding more
    features to the histograms in the future.
 
  - Addition of str_has_prefix() and a few use cases. There currently
    is a similar function strstart() that is used in a few places, but
    only returns a bool and not a length. These instances will be
    removed in the future to use str_has_prefix() instead.
 
  - A few other various clean ups as well.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCXCawlBQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qhbcAQCFeT0fWWTUxofBQz5jqsHaRnVg21+9
 X4sTldYRYEn4YgEAmWOyiwq7zvrsAu4ZwkNBMeqxn3tVymYHiGOGe3Y4BAw=
 =u96o
 -----END PGP SIGNATURE-----

Merge tag 'trace-v4.21' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing updates from Steven Rostedt:

 - Rework of the kprobe/uprobe and synthetic events to consolidate all
   the dynamic event code. This will make changes in the future easier.

 - Partial rewrite of the function graph tracing infrastructure. This
   will allow for multiple users of hooking onto functions to get the
   callback (return) of the function. This is the ground work for having
   kprobes and function graph tracer using one code base.

 - Clean up of the histogram code that will facilitate adding more
   features to the histograms in the future.

 - Addition of str_has_prefix() and a few use cases. There currently is
   a similar function strstart() that is used in a few places, but only
   returns a bool and not a length. These instances will be removed in
   the future to use str_has_prefix() instead.

 - A few other various clean ups as well.

* tag 'trace-v4.21' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (57 commits)
  tracing: Use the return of str_has_prefix() to remove open coded numbers
  tracing: Have the historgram use the result of str_has_prefix() for len of prefix
  tracing: Use str_has_prefix() instead of using fixed sizes
  tracing: Use str_has_prefix() helper for histogram code
  string.h: Add str_has_prefix() helper function
  tracing: Make function ‘ftrace_exports’ static
  tracing: Simplify printf'ing in seq_print_sym
  tracing: Avoid -Wformat-nonliteral warning
  tracing: Merge seq_print_sym_short() and seq_print_sym_offset()
  tracing: Add hist trigger comments for variable-related fields
  tracing: Remove hist trigger synth_var_refs
  tracing: Use hist trigger's var_ref array to destroy var_refs
  tracing: Remove open-coding of hist trigger var_ref management
  tracing: Use var_refs[] for hist trigger reference checking
  tracing: Change strlen to sizeof for hist trigger static strings
  tracing: Remove unnecessary hist trigger struct field
  tracing: Fix ftrace_graph_get_ret_stack() to use task and not current
  seq_buf: Use size_t for len in seq_buf_puts()
  seq_buf: Make seq_buf_puts() null-terminate the buffer
  arm64: Use ftrace_graph_get_ret_stack() instead of curr_ret_stack
  ...
2018-12-31 11:46:59 -08:00
Christophe Leroy cc0282975b kgdb/treewide: constify struct kgdb_arch arch_kgdb_ops
checkpatch.pl reports the following:

  WARNING: struct kgdb_arch should normally be const
  #28: FILE: arch/mips/kernel/kgdb.c:397:
  +struct kgdb_arch arch_kgdb_ops = {

This report makes sense, as all other ops struct, this
one should also be const. This patch does the change.

Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Richard Kuo <rkuo@codeaurora.org>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paul Burton <paul.burton@mips.com>
Cc: James Hogan <jhogan@kernel.org>
Cc: Ley Foon Tan <lftan@altera.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Rich Felker <dalias@libc.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: x86@kernel.org
Acked-by: Daniel Thompson <daniel.thompson@linaro.org>
Acked-by: Paul Burton <paul.burton@mips.com>
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Acked-by: Borislav Petkov <bp@suse.de>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
2018-12-30 08:33:06 +00:00
Douglas Anderson 9ef7fa507d kgdb: Remove irq flags from roundup
The function kgdb_roundup_cpus() was passed a parameter that was
documented as:

> the flags that will be used when restoring the interrupts. There is
> local_irq_save() call before kgdb_roundup_cpus().

Nobody used those flags.  Anyone who wanted to temporarily turn on
interrupts just did local_irq_enable() and local_irq_disable() without
looking at them.  So we can definitely remove the flags.

Signed-off-by: Douglas Anderson <dianders@chromium.org>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Richard Kuo <rkuo@codeaurora.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paul Burton <paul.burton@mips.com>
Cc: James Hogan <jhogan@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Rich Felker <dalias@libc.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
2018-12-30 08:24:21 +00:00
Linus Torvalds 195303136f Kconfig file consolidation for v4.21
Consolidation of bus (PCI, PCMCIA, EISA, RapidIO) config entries
 by Christoph Hellwig.
 
 Currently, every architecture that wants to provide common peripheral
 busses needs to add some boilerplate code and include the right Kconfig
 files. This series instead just selects the presence (when needed) and
 then handles everything in the bus-specific Kconfig file under drivers/.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJcJilwAAoJED2LAQed4NsGt1YP/RMTEUqbCSwS/CnTLrE+aVTC
 O2aWwB80ZlVwpeBbHLW5/M88OvOev0UaCr+gyzgpFRl5ITzS7Jevb8VbpGzblbH7
 bFxIEyZFGQiy9oEWw3Lfu9JRSsLm3jNo7hkmdBSn2Rw3KkEd/YF7K3q9GuA7BpCS
 ZxAirebvEpr4KYEzkuc57NqCYx2Tc8G+JWr5D7pZCFaq9vxYt3TddGqw/c7iQVSQ
 1Og1809IdhGyCSlA/ExfaqaBMaJHMRAOHX5GgkqZw1EbFcizUFhAAsKCrGL5nBtX
 NiWF9jhgHR1M+L69jfctOstrmGQD2KicNgWQf1aS5RQkPfjuqIKGT/i9g6J1pVyX
 TaW1J36Hcl8PpsKoPBnnrixd1T41O3/PuqtEJRm7LCBYOQiwS9sEmLO09RDRjER8
 SPAAyvkhE8oq+0RHiTYN4tm8dyJc1djZ5wzgLnwFPAnU6SR+mbN02RzBMsYZXD+x
 RNbBSGBRJFQDBw6Rn+ktcIQvcKYmUqe1k1YNHMy6kG3QqvhBaDy+8PA/YjIKPQYQ
 B/NNUAMEJMys1OQrRL2UDXb2ysaCpzwMmlrBW2IwYsQrX5OwbPkNuQ5Mbe1Lr+mc
 4NXR+HubvojsHaAby+OhFbrUX2Jcz3wqYj7aannb9sMRmw0VJXV5dPYUqje3ZhPS
 P2AovKT8O9nWsEttqER5
 =WxId
 -----END PGP SIGNATURE-----

Merge tag 'kconfig-v4.21-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild

Pull Kconfig file consolidation from Masahiro Yamada:
 "Consolidation of bus (PCI, PCMCIA, EISA, RapidIO) config entries by
  Christoph Hellwig.

  Currently, every architecture that wants to provide common peripheral
  busses needs to add some boilerplate code and include the right
  Kconfig files. This series instead just selects the presence (when
  needed) and then handles everything in the bus-specific Kconfig file
  under drivers/"

* tag 'kconfig-v4.21-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
  pcmcia: remove per-arch PCMCIA config entry
  eisa: consolidate EISA Kconfig entry in drivers/eisa
  rapidio: consolidate RAPIDIO config entry in drivers/rapidio
  pcmcia: allow PCMCIA support independent of the architecture
  PCI: consolidate the PCI_SYSCALL symbol
  PCI: consolidate the PCI_DOMAINS and PCI_DOMAINS_GENERIC config options
  PCI: consolidate PCI config entry in drivers/pci
  MIPS: remove the HT_PCI config option
2018-12-29 13:40:29 -08:00
Linus Torvalds 769e47094d Kconfig updates for v4.21
- support -y option for merge_config.sh to avoid downgrading =y to =m
 
  - remove S_OTHER symbol type, and touch include/config/*.h files correctly
 
  - fix file name and line number in lexer warnings
 
  - fix memory leak when EOF is encountered in quotation
 
  - resolve all shift/reduce conflicts of the parser
 
  - warn no new line at end of file
 
  - make 'source' statement more strict to take only string literal
 
  - rewrite the lexer and remove the keyword lookup table
 
  - convert to SPDX License Identifier
 
  - compile C files independently instead of including them from zconf.y
 
  - fix various warnings of gconfig
 
  - misc cleanups
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJcJieuAAoJED2LAQed4NsGHlIP/1s0fQ86XD9dIMyHzAO0gh2f
 7rylfe2kEXJgIzJ0DyZdLu4iZtwbkEUqTQrRS1abriNGVemPkfBAnZdM5d92lOQX
 3iREa700AJ2xo7V7gYZ6AbhZoG3p0S9U9Q2qE5S+tFTe8c2Gy4xtjnODF+Vel85r
 S0P8tF5sE1/d00lm+yfMI/CJVfDjyNaMm+aVEnL0kZTPiRkaktjWgo6Fc2p4z1L5
 HFmMMP6/iaXmRZ+tHJGPQ2AT70GFVZw5ePxPcl50EotUP25KHbuUdzs8wDpYm3U/
 rcESVsIFpgqHWmTsdBk6dZk0q8yFZNkMlkaP/aYukVZpUn/N6oAXgTFckYl8dmQL
 fQBkQi6DTfr9EBPVbj18BKm7xI3Y4DdQ2fzTfYkJ2XwNRGFA5r9N3sjd7ZTVGjxC
 aeeMHCwvGdSx1x8PeZAhZfsUHW8xVDMSQiT713+ljBY+6cwzA+2NF0kP7B6OAqwr
 ETFzd4Xu2/lZcL7gQRH8WU3L2S5iedmDG6RnZgJMXI0/9V4qAA+nlsWaCgnl1TgA
 mpxYlLUMrd6AUJevE34FlnyFdk8IMn9iKRFsvF0f3doO5C7QzTVGqFdJu5a0CuWO
 4NBJvZjFT8/4amoWLfnDlfApWXzTfwLbKG+r6V2F30fLuXpYg5LxWhBoGRPYLZSq
 oi4xN1Mpx3TvXz6WcKVZ
 =r3Fl
 -----END PGP SIGNATURE-----

Merge tag 'kconfig-v4.21' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild

Pull Kconfig updates from Masahiro Yamada:

 - support -y option for merge_config.sh to avoid downgrading =y to =m

 - remove S_OTHER symbol type, and touch include/config/*.h files correctly

 - fix file name and line number in lexer warnings

 - fix memory leak when EOF is encountered in quotation

 - resolve all shift/reduce conflicts of the parser

 - warn no new line at end of file

 - make 'source' statement more strict to take only string literal

 - rewrite the lexer and remove the keyword lookup table

 - convert to SPDX License Identifier

 - compile C files independently instead of including them from zconf.y

 - fix various warnings of gconfig

 - misc cleanups

* tag 'kconfig-v4.21' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (39 commits)
  kconfig: surround dbg_sym_flags with #ifdef DEBUG to fix gconf warning
  kconfig: split images.c out of qconf.cc/gconf.c to fix gconf warnings
  kconfig: add static qualifiers to fix gconf warnings
  kconfig: split the lexer out of zconf.y
  kconfig: split some C files out of zconf.y
  kconfig: convert to SPDX License Identifier
  kconfig: remove keyword lookup table entirely
  kconfig: update current_pos in the second lexer
  kconfig: switch to ASSIGN_VAL state in the second lexer
  kconfig: stop associating kconf_id with yylval
  kconfig: refactor end token rules
  kconfig: stop supporting '.' and '/' in unquoted words
  treewide: surround Kconfig file paths with double quotes
  microblaze: surround string default in Kconfig with double quotes
  kconfig: use T_WORD instead of T_VARIABLE for variables
  kconfig: use specific tokens instead of T_ASSIGN for assignments
  kconfig: refactor scanning and parsing "option" properties
  kconfig: use distinct tokens for type and default properties
  kconfig: remove redundant token defines
  kconfig: rename depends_list to comment_option_list
  ...
2018-12-29 13:03:29 -08:00
Linus Torvalds 668c35f69c Kbuild updates for v4.21
Kbuild core:
  - remove unneeded $(call cc-option,...) switches
  - consolidate Clang compiler flags into CLANG_FLAGS
  - announce the deprecation of SUBDIRS
  - fix single target build for external module
  - simplify the dependencies of 'prepare' stage targets
  - allow fixdep to directly write to .*.cmd files
  - simplify dependency generation for CONFIG_TRIM_UNUSED_KSYMS
  - change if_changed_rule to accept multi-line recipe
  - move .SECONDARY special target to scripts/Kbuild.include
  - remove redundant 'set -e'
  - improve parallel execution for CONFIG_HEADERS_CHECK
  - misc cleanups
 
 Treewide fixes and cleanups
  - set Clang flags correctly for PowerPC boot images
  - fix UML build error with CONFIG_GCC_PLUGINS
  - remove unneeded patterns from .gitignore files
  - refactor firmware/Makefile
  - remove unneeded rules for *offsets.s
  - avoid unneeded regeneration of intermediate .s files
  - clean up ./Kbuild
 
 Modpost:
  - remove unused -M, -K options
  - fix false positive warnings about section mismatch
  - use simple devtable lookup instead of linker magic
  - misc cleanups
 
 Coccinelle:
  - relax boolinit.cocci checks for overall consistency
  - fix warning messages of boolinit.cocci
 
 Other tools:
  - improve -dirty check of scripts/setlocalversion
  - add a tool to generate compile_commands.json from .*.cmd files
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJcJiVeAAoJED2LAQed4NsGXEMP/1fdkBfZGxYlxp4w1tEBeRKZ
 GInxxMfHiHZlu7nFPQ3k1mncczZjeLbnKcLaybnPmqBSfQe1F7tN0ux1sHtWkVfI
 DvqoKaHObZ3lEMEVxCHXQ2bKrN/j9nB2xSxzr4dvc9DscW8dwKElZuKVE7nHKdhl
 z70xsalxHGEGY6hptJrucbv8KTBOSleZ8Gaat79sEDkDSLCTjxXB3WcVMWqDT0M/
 IqN5QCwiPjZC3UCwuqq6+vnG1gyvDUORcbrVgHrBIKxLYAABYzugB5IlLpi8B31C
 ZUDmijSTandHd4SG5gw5uZoyYuK4YhZeI7g4yNyXSqnPurmcJxrGdpiuwRcE7xet
 5yB2uaNbAFO6Fbz4gUdDEvryA9IZEPPn1Z0Btfpp7UOxiWEqE61oHReCNdkiad94
 Oonl+ROZw5UOT3AZD3xCZTf9bTnQoCFccHTmnbaKqqSiDWxxLUXum7sNfnJ6GXEb
 fNQDuwuxtsPStaWoU93QNrGyl5e8jYTKFdphUCnZZ2/hE8ygoQ06PYmeBAk+UKfN
 UI2IqR8PHgmA97XJg+fHhbw6X4nQcPsg/usLqxGItAZNO2O4uXgaN24dOcHG74Bu
 2Dk/gQQB2sVB/Dxfmlaxvvj98MVg7IMtpusAT2bQ7miiSS3EqPFT8KQMZYLICWuZ
 u7QQe20yCho3ZULmsRwH
 =0PCt
 -----END PGP SIGNATURE-----

Merge tag 'kbuild-v4.21' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild

Pull Kbuild updates from Masahiro Yamada:
 "Kbuild core:
   - remove unneeded $(call cc-option,...) switches
   - consolidate Clang compiler flags into CLANG_FLAGS
   - announce the deprecation of SUBDIRS
   - fix single target build for external module
   - simplify the dependencies of 'prepare' stage targets
   - allow fixdep to directly write to .*.cmd files
   - simplify dependency generation for CONFIG_TRIM_UNUSED_KSYMS
   - change if_changed_rule to accept multi-line recipe
   - move .SECONDARY special target to scripts/Kbuild.include
   - remove redundant 'set -e'
   - improve parallel execution for CONFIG_HEADERS_CHECK
   - misc cleanups

  Treewide fixes and cleanups
   - set Clang flags correctly for PowerPC boot images
   - fix UML build error with CONFIG_GCC_PLUGINS
   - remove unneeded patterns from .gitignore files
   - refactor firmware/Makefile
   - remove unneeded rules for *offsets.s
   - avoid unneeded regeneration of intermediate .s files
   - clean up ./Kbuild

  Modpost:
   - remove unused -M, -K options
   - fix false positive warnings about section mismatch
   - use simple devtable lookup instead of linker magic
   - misc cleanups

  Coccinelle:
   - relax boolinit.cocci checks for overall consistency
   - fix warning messages of boolinit.cocci

  Other tools:
   - improve -dirty check of scripts/setlocalversion
   - add a tool to generate compile_commands.json from .*.cmd files"

* tag 'kbuild-v4.21' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (51 commits)
  kbuild: remove unused cmd_gentimeconst
  kbuild: remove $(obj)/ prefixes in ./Kbuild
  treewide: add intermediate .s files to targets
  treewide: remove explicit rules for *offsets.s
  firmware: refactor firmware/Makefile
  firmware: remove unnecessary patterns from .gitignore
  scripts: remove unnecessary ihex2fw and check-lc_ctypes from .gitignore
  um: remove unused filechk_gen_header in Makefile
  scripts: add a tool to produce a compile_commands.json file
  kbuild: add -Werror=implicit-int flag unconditionally
  kbuild: add -Werror=strict-prototypes flag unconditionally
  kbuild: add -fno-PIE flag unconditionally
  scripts: coccinelle: Correct warning message
  scripts: coccinelle: only suggest true/false in files that already use them
  kbuild: handle part-of-module correctly for *.ll and *.symtypes
  kbuild: refactor part-of-module
  kbuild: refactor quiet_modtag
  kbuild: remove redundant quiet_modtag for $(obj-m)
  kbuild: refactor Makefile.asm-generic
  user/Makefile: Fix typo and capitalization in comment section
  ...
2018-12-29 12:03:17 -08:00
Linus Torvalds f346b0becb Merge branch 'akpm' (patches from Andrew)
Merge misc updates from Andrew Morton:

 - large KASAN update to use arm's "software tag-based mode"

 - a few misc things

 - sh updates

 - ocfs2 updates

 - just about all of MM

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (167 commits)
  kernel/fork.c: mark 'stack_vm_area' with __maybe_unused
  memcg, oom: notify on oom killer invocation from the charge path
  mm, swap: fix swapoff with KSM pages
  include/linux/gfp.h: fix typo
  mm/hmm: fix memremap.h, move dev_page_fault_t callback to hmm
  hugetlbfs: Use i_mmap_rwsem to fix page fault/truncate race
  hugetlbfs: use i_mmap_rwsem for more pmd sharing synchronization
  memory_hotplug: add missing newlines to debugging output
  mm: remove __hugepage_set_anon_rmap()
  include/linux/vmstat.h: remove unused page state adjustment macro
  mm/page_alloc.c: allow error injection
  mm: migrate: drop unused argument of migrate_page_move_mapping()
  blkdev: avoid migration stalls for blkdev pages
  mm: migrate: provide buffer_migrate_page_norefs()
  mm: migrate: move migrate_page_lock_buffers()
  mm: migrate: lock buffers before migrate_page_move_mapping()
  mm: migration: factor out code to compute expected number of page references
  mm, page_alloc: enable pcpu_drain with zone capability
  kmemleak: add config to select auto scan
  mm/page_alloc.c: don't call kasan_free_pages() at deferred mem init
  ...
2018-12-28 16:55:46 -08:00
Linus Torvalds af7ddd8a62 DMA mapping updates for Linux 4.21
A huge update this time, but a lot of that is just consolidating or
 removing code:
 
  - provide a common DMA_MAPPING_ERROR definition and avoid indirect
    calls for dma_map_* error checking
  - use direct calls for the DMA direct mapping case, avoiding huge
    retpoline overhead for high performance workloads
  - merge the swiotlb dma_map_ops into dma-direct
  - provide a generic remapping DMA consistent allocator for architectures
    that have devices that perform DMA that is not cache coherent. Based
    on the existing arm64 implementation and also used for csky now.
  - improve the dma-debug infrastructure, including dynamic allocation
    of entries (Robin Murphy)
  - default to providing chaining scatterlist everywhere, with opt-outs
    for the few architectures (alpha, parisc, most arm32 variants) that
    can't cope with it
  - misc sparc32 dma-related cleanups
  - remove the dma_mark_clean arch hook used by swiotlb on ia64 and
    replace it with the generic noncoherent infrastructure
  - fix the return type of dma_set_max_seg_size (Niklas Söderlund)
  - move the dummy dma ops for not DMA capable devices from arm64 to
    common code (Robin Murphy)
  - ensure dma_alloc_coherent returns zeroed memory to avoid kernel data
    leaks through userspace.  We already did this for most common
    architectures, but this ensures we do it everywhere.
    dma_zalloc_coherent has been deprecated and can hopefully be
    removed after -rc1 with a coccinelle script.
 -----BEGIN PGP SIGNATURE-----
 
 iQI/BAABCgApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAlwctQgLHGhjaEBsc3Qu
 ZGUACgkQD55TZVIEUYMxgQ//dBpAfS4/J76CdAbYry2zqgcOUU9hIrD6NHiEMWov
 ltJxyvEl3LsUmIdEj3aCrYL9jZN0qsnCzn5BVj2c3jDIVgD64fAr7HDf/PbEEfKb
 j6/GgEnVLPZV+sQMvhNA5jOzHrkseaqPa4/pNLFZ/l8jnuZ2d+btusDWJpMoVDer
 TXVwtIfgeIu0gTygYOShLYXd5qptWKWsZEpbTZOO2sE6+x+ZJX7yQYUxYDTlcOIj
 JWVO2l5QNHPc5T9o2at+6L5aNUvnZOxT79sWgyZLn0Kc+FagKAVwfLqUEl0v7foG
 8k/xca5/8p3afB1DfrIrtplJqis7cVgdyGxriwuuoO8X4F0nPyWwpGmxsBhrWwwl
 xTqC4UorEJ7QwoP6Azopk/vYI2QXIUBLjuCJCuFXZj9+2BGf4IfvBY1S2cLM9qLs
 HMcxQonuXJii044KEFS96ePEuiT+igVINweIFBKWcgNCEG0UQtyL6RQ1U5297ipF
 JiWZAqD+p9X52UdKS+oKfAiZEekMXn6Xyo97+YCiNpfOo0GP5eEcwhL+JpY4AiRq
 apPXtsRy2o1s8yfjdraUIM2Mc2n62vFKb35oUbGCd/QO9piPrFQHl6T0HHcHk4YR
 XrUXcHieFZBCYqh7ZVa4RL8Msq1wvGuTL4Dxl43mXdsMoUFRR6eSNWLoAV4IpOLZ
 WgA=
 =in72
 -----END PGP SIGNATURE-----

Merge tag 'dma-mapping-4.21' of git://git.infradead.org/users/hch/dma-mapping

Pull DMA mapping updates from Christoph Hellwig:
 "A huge update this time, but a lot of that is just consolidating or
  removing code:

   - provide a common DMA_MAPPING_ERROR definition and avoid indirect
     calls for dma_map_* error checking

   - use direct calls for the DMA direct mapping case, avoiding huge
     retpoline overhead for high performance workloads

   - merge the swiotlb dma_map_ops into dma-direct

   - provide a generic remapping DMA consistent allocator for
     architectures that have devices that perform DMA that is not cache
     coherent. Based on the existing arm64 implementation and also used
     for csky now.

   - improve the dma-debug infrastructure, including dynamic allocation
     of entries (Robin Murphy)

   - default to providing chaining scatterlist everywhere, with opt-outs
     for the few architectures (alpha, parisc, most arm32 variants) that
     can't cope with it

   - misc sparc32 dma-related cleanups

   - remove the dma_mark_clean arch hook used by swiotlb on ia64 and
     replace it with the generic noncoherent infrastructure

   - fix the return type of dma_set_max_seg_size (Niklas Söderlund)

   - move the dummy dma ops for not DMA capable devices from arm64 to
     common code (Robin Murphy)

   - ensure dma_alloc_coherent returns zeroed memory to avoid kernel
     data leaks through userspace. We already did this for most common
     architectures, but this ensures we do it everywhere.
     dma_zalloc_coherent has been deprecated and can hopefully be
     removed after -rc1 with a coccinelle script"

* tag 'dma-mapping-4.21' of git://git.infradead.org/users/hch/dma-mapping: (73 commits)
  dma-mapping: fix inverted logic in dma_supported
  dma-mapping: deprecate dma_zalloc_coherent
  dma-mapping: zero memory returned from dma_alloc_*
  sparc/iommu: fix ->map_sg return value
  sparc/io-unit: fix ->map_sg return value
  arm64: default to the direct mapping in get_arch_dma_ops
  PCI: Remove unused attr variable in pci_dma_configure
  ia64: only select ARCH_HAS_DMA_COHERENT_TO_PFN if swiotlb is enabled
  dma-mapping: bypass indirect calls for dma-direct
  vmd: use the proper dma_* APIs instead of direct methods calls
  dma-direct: merge swiotlb_dma_ops into the dma_direct code
  dma-direct: use dma_direct_map_page to implement dma_direct_map_sg
  dma-direct: improve addressability error reporting
  swiotlb: remove dma_mark_clean
  swiotlb: remove SWIOTLB_MAP_ERROR
  ACPI / scan: Refactor _CCA enforcement
  dma-mapping: factor out dummy DMA ops
  dma-mapping: always build the direct mapping code
  dma-mapping: move dma_cache_sync out of line
  dma-mapping: move various slow path functions out of line
  ...
2018-12-28 14:12:21 -08:00
Will Deacon 8e2d43405b lib/ioremap: ensure break-before-make is used for huge p4d mappings
Whilst no architectures actually enable support for huge p4d mappings in
the vmap area, the code that is implemented should be using
break-before-make, as we do for pud and pmd huge entries.

Link: http://lkml.kernel.org/r/1544120495-17438-6-git-send-email-will.deacon@arm.com
Signed-off-by: Will Deacon <will.deacon@arm.com>
Reviewed-by: Toshi Kani <toshi.kani@hpe.com>
Cc: Chintan Pandya <cpandya@codeaurora.org>
Cc: Toshi Kani <toshi.kani@hpe.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Michal Hocko <mhocko@suse.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-12-28 12:11:50 -08:00
Will Deacon 48e178ab0d x86/pgtable: drop pXd_none() checks from pXd_free_pYd_table()
The core code already has a check for pXd_none(), so remove it from the
architecture implementation.

Link: http://lkml.kernel.org/r/1544120495-17438-4-git-send-email-will.deacon@arm.com
Signed-off-by: Will Deacon <will.deacon@arm.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Toshi Kani <toshi.kani@hpe.com>
Cc: Chintan Pandya <cpandya@codeaurora.org>
Cc: Toshi Kani <toshi.kani@hpe.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-12-28 12:11:50 -08:00
Oscar Salvador 2c2a5af6fe mm, memory_hotplug: add nid parameter to arch_remove_memory
Patch series "Do not touch pages in hot-remove path", v2.

This patchset aims for two things:

 1) A better definition about offline and hot-remove stage
 2) Solving bugs where we can access non-initialized pages
    during hot-remove operations [2] [3].

This is achieved by moving all page/zone handling to the offline
stage, so we do not need to access pages when hot-removing memory.

[1] https://patchwork.kernel.org/cover/10691415/
[2] https://patchwork.kernel.org/patch/10547445/
[3] https://www.spinics.net/lists/linux-mm/msg161316.html

This patch (of 5):

This is a preparation for the following-up patches.  The idea of passing
the nid is that it will allow us to get rid of the zone parameter
afterwards.

Link: http://lkml.kernel.org/r/20181127162005.15833-2-osalvador@suse.de
Signed-off-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>

Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-12-28 12:11:49 -08:00
Alexey Dobriyan e5cb113f2d mm: make free_reserved_area() return "const char *"
and propagate through down the call stack.

Link: http://lkml.kernel.org/r/20181124091411.GC10969@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-12-28 12:11:48 -08:00
Arun KS ca79b0c211 mm: convert totalram_pages and totalhigh_pages variables to atomic
totalram_pages and totalhigh_pages are made static inline function.

Main motivation was that managed_page_count_lock handling was complicating
things.  It was discussed in length here,
https://lore.kernel.org/patchwork/patch/995739/#1181785 So it seemes
better to remove the lock and convert variables to atomic, with preventing
poteintial store-to-read tearing as a bonus.

[akpm@linux-foundation.org: coding style fixes]
Link: http://lkml.kernel.org/r/1542090790-21750-4-git-send-email-arunks@codeaurora.org
Signed-off-by: Arun KS <arunks@codeaurora.org>
Suggested-by: Michal Hocko <mhocko@suse.com>
Suggested-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Reviewed-by: Pavel Tatashin <pasha.tatashin@soleen.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-12-28 12:11:47 -08:00
Arun KS 3d6357de8a mm: reference totalram_pages and managed_pages once per function
Patch series "mm: convert totalram_pages, totalhigh_pages and managed
pages to atomic", v5.

This series converts totalram_pages, totalhigh_pages and
zone->managed_pages to atomic variables.

totalram_pages, zone->managed_pages and totalhigh_pages updates are
protected by managed_page_count_lock, but readers never care about it.
Convert these variables to atomic to avoid readers potentially seeing a
store tear.

Main motivation was that managed_page_count_lock handling was complicating
things.  It was discussed in length here,
https://lore.kernel.org/patchwork/patch/995739/#1181785 It seemes better
to remove the lock and convert variables to atomic.  With the change,
preventing poteintial store-to-read tearing comes as a bonus.

This patch (of 4):

This is in preparation to a later patch which converts totalram_pages and
zone->managed_pages to atomic variables.  Please note that re-reading the
value might lead to a different value and as such it could lead to
unexpected behavior.  There are no known bugs as a result of the current
code but it is better to prevent from them in principle.

Link: http://lkml.kernel.org/r/1542090790-21750-2-git-send-email-arunks@codeaurora.org
Signed-off-by: Arun KS <arunks@codeaurora.org>
Reviewed-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Reviewed-by: David Hildenbrand <david@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Pavel Tatashin <pasha.tatashin@soleen.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-12-28 12:11:47 -08:00
Andrey Konovalov 9577dd7486 kasan: rename kasan_zero_page to kasan_early_shadow_page
With tag based KASAN mode the early shadow value is 0xff and not 0x00, so
this patch renames kasan_zero_(page|pte|pmd|pud|p4d) to
kasan_early_shadow_(page|pte|pmd|pud|p4d) to avoid confusion.

Link: http://lkml.kernel.org/r/3fed313280ebf4f88645f5b89ccbc066d320e177.1544099024.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Suggested-by: Mark Rutland <mark.rutland@arm.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-12-28 12:11:43 -08:00
Linus Torvalds b71acb0e37 Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto updates from Herbert Xu:
 "API:
   - Add 1472-byte test to tcrypt for IPsec
   - Reintroduced crypto stats interface with numerous changes
   - Support incremental algorithm dumps

  Algorithms:
   - Add xchacha12/20
   - Add nhpoly1305
   - Add adiantum
   - Add streebog hash
   - Mark cts(cbc(aes)) as FIPS allowed

  Drivers:
   - Improve performance of arm64/chacha20
   - Improve performance of x86/chacha20
   - Add NEON-accelerated nhpoly1305
   - Add SSE2 accelerated nhpoly1305
   - Add AVX2 accelerated nhpoly1305
   - Add support for 192/256-bit keys in gcmaes AVX
   - Add SG support in gcmaes AVX
   - ESN for inline IPsec tx in chcr
   - Add support for CryptoCell 703 in ccree
   - Add support for CryptoCell 713 in ccree
   - Add SM4 support in ccree
   - Add SM3 support in ccree
   - Add support for chacha20 in caam/qi2
   - Add support for chacha20 + poly1305 in caam/jr
   - Add support for chacha20 + poly1305 in caam/qi2
   - Add AEAD cipher support in cavium/nitrox"

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (130 commits)
  crypto: skcipher - remove remnants of internal IV generators
  crypto: cavium/nitrox - Fix build with !CONFIG_DEBUG_FS
  crypto: salsa20-generic - don't unnecessarily use atomic walk
  crypto: skcipher - add might_sleep() to skcipher_walk_virt()
  crypto: x86/chacha - avoid sleeping under kernel_fpu_begin()
  crypto: cavium/nitrox - Added AEAD cipher support
  crypto: mxc-scc - fix build warnings on ARM64
  crypto: api - document missing stats member
  crypto: user - remove unused dump functions
  crypto: chelsio - Fix wrong error counter increments
  crypto: chelsio - Reset counters on cxgb4 Detach
  crypto: chelsio - Handle PCI shutdown event
  crypto: chelsio - cleanup:send addr as value in function argument
  crypto: chelsio - Use same value for both channel in single WR
  crypto: chelsio - Swap location of AAD and IV sent in WR
  crypto: chelsio - remove set but not used variable 'kctx_len'
  crypto: ux500 - Use proper enum in hash_set_dma_transfer
  crypto: ux500 - Use proper enum in cryp_set_dma_transfer
  crypto: aesni - Add scatter/gather avx stubs, and use them in C
  crypto: aesni - Introduce partial block macro
  ..
2018-12-27 13:53:32 -08:00
Linus Torvalds e0c38a4d1f Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:

 1) New ipset extensions for matching on destination MAC addresses, from
    Stefano Brivio.

 2) Add ipv4 ttl and tos, plus ipv6 flow label and hop limit offloads to
    nfp driver. From Stefano Brivio.

 3) Implement GRO for plain UDP sockets, from Paolo Abeni.

 4) Lots of work from Michał Mirosław to eliminate the VLAN_TAG_PRESENT
    bit so that we could support the entire vlan_tci value.

 5) Rework the IPSEC policy lookups to better optimize more usecases,
    from Florian Westphal.

 6) Infrastructure changes eliminating direct manipulation of SKB lists
    wherever possible, and to always use the appropriate SKB list
    helpers. This work is still ongoing...

 7) Lots of PHY driver and state machine improvements and
    simplifications, from Heiner Kallweit.

 8) Various TSO deferral refinements, from Eric Dumazet.

 9) Add ntuple filter support to aquantia driver, from Dmitry Bogdanov.

10) Batch dropping of XDP packets in tuntap, from Jason Wang.

11) Lots of cleanups and improvements to the r8169 driver from Heiner
    Kallweit, including support for ->xmit_more. This driver has been
    getting some much needed love since he started working on it.

12) Lots of new forwarding selftests from Petr Machata.

13) Enable VXLAN learning in mlxsw driver, from Ido Schimmel.

14) Packed ring support for virtio, from Tiwei Bie.

15) Add new Aquantia AQtion USB driver, from Dmitry Bezrukov.

16) Add XDP support to dpaa2-eth driver, from Ioana Ciocoi Radulescu.

17) Implement coalescing on TCP backlog queue, from Eric Dumazet.

18) Implement carrier change in tun driver, from Nicolas Dichtel.

19) Support msg_zerocopy in UDP, from Willem de Bruijn.

20) Significantly improve garbage collection of neighbor objects when
    the table has many PERMANENT entries, from David Ahern.

21) Remove egdev usage from nfp and mlx5, and remove the facility
    completely from the tree as it no longer has any users. From Oz
    Shlomo and others.

22) Add a NETDEV_PRE_CHANGEADDR so that drivers can veto the change and
    therefore abort the operation before the commit phase (which is the
    NETDEV_CHANGEADDR event). From Petr Machata.

23) Add indirect call wrappers to avoid retpoline overhead, and use them
    in the GRO code paths. From Paolo Abeni.

24) Add support for netlink FDB get operations, from Roopa Prabhu.

25) Support bloom filter in mlxsw driver, from Nir Dotan.

26) Add SKB extension infrastructure. This consolidates the handling of
    the auxiliary SKB data used by IPSEC and bridge netfilter, and is
    designed to support the needs to MPTCP which could be integrated in
    the future.

27) Lots of XDP TX optimizations in mlx5 from Tariq Toukan.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1845 commits)
  net: dccp: fix kernel crash on module load
  drivers/net: appletalk/cops: remove redundant if statement and mask
  bnx2x: Fix NULL pointer dereference in bnx2x_del_all_vlans() on some hw
  net/net_namespace: Check the return value of register_pernet_subsys()
  net/netlink_compat: Fix a missing check of nla_parse_nested
  ieee802154: lowpan_header_create check must check daddr
  net/mlx4_core: drop useless LIST_HEAD
  mlxsw: spectrum: drop useless LIST_HEAD
  net/mlx5e: drop useless LIST_HEAD
  iptunnel: Set tun_flags in the iptunnel_metadata_reply from src
  net/mlx5e: fix semicolon.cocci warnings
  staging: octeon: fix build failure with XFRM enabled
  net: Revert recent Spectre-v1 patches.
  can: af_can: Fix Spectre v1 vulnerability
  packet: validate address length if non-zero
  nfc: af_nfc: Fix Spectre v1 vulnerability
  phonet: af_phonet: Fix Spectre v1 vulnerability
  net: core: Fix Spectre v1 vulnerability
  net: minor cleanup in skb_ext_add()
  net: drop the unused helper skb_ext_get()
  ...
2018-12-27 13:04:52 -08:00
Linus Torvalds fc2fd5f0f1 Merge branch 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 platform update from Ingo Molnar:
 "An OLPC platform support simplification patch"

* 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/platform/olpc: Do not call of_platform_bus_probe()
2018-12-26 18:42:51 -08:00
Linus Torvalds e57d9f638a Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 mm updates from Ingo Molnar:
 "The main changes in this cycle were:

   - Update and clean up x86 fault handling, by Andy Lutomirski.

   - Drop usage of __flush_tlb_all() in kernel_physical_mapping_init()
     and related fallout, by Dan Williams.

   - CPA cleanups and reorganization by Peter Zijlstra: simplify the
     flow and remove a few warts.

   - Other misc cleanups"

* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (29 commits)
  x86/mm/dump_pagetables: Use DEFINE_SHOW_ATTRIBUTE()
  x86/mm/cpa: Rename @addrinarray to @numpages
  x86/mm/cpa: Better use CLFLUSHOPT
  x86/mm/cpa: Fold cpa_flush_range() and cpa_flush_array() into a single cpa_flush() function
  x86/mm/cpa: Make cpa_data::numpages invariant
  x86/mm/cpa: Optimize cpa_flush_array() TLB invalidation
  x86/mm/cpa: Simplify the code after making cpa->vaddr invariant
  x86/mm/cpa: Make cpa_data::vaddr invariant
  x86/mm/cpa: Add __cpa_addr() helper
  x86/mm/cpa: Add ARRAY and PAGES_ARRAY selftests
  x86/mm: Drop usage of __flush_tlb_all() in kernel_physical_mapping_init()
  x86/mm: Validate kernel_physical_mapping_init() PTE population
  generic/pgtable: Introduce set_pte_safe()
  generic/pgtable: Introduce {p4d,pgd}_same()
  generic/pgtable: Make {pmd, pud}_same() unconditionally available
  x86/fault: Clean up the page fault oops decoder a bit
  x86/fault: Decode page fault OOPSes better
  x86/vsyscall/64: Use X86_PF constants in the simulated #PF error code
  x86/oops: Show the correct CS value in show_regs()
  x86/fault: Don't try to recover from an implicit supervisor access
  ...
2018-12-26 18:08:18 -08:00
Linus Torvalds d6e867a6ae Merge branch 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fpu updates from Ingo Molnar:
 "Misc preparatory changes for an upcoming FPU optimization that will
  delay the loading of FPU registers to return-to-userspace"

* 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/fpu: Don't export __kernel_fpu_{begin,end}()
  x86/fpu: Update comment for __raw_xsave_addr()
  x86/fpu: Add might_fault() to user_insn()
  x86/pkeys: Make init_pkru_value static
  x86/thread_info: Remove _TIF_ALLWORK_MASK
  x86/process/32: Remove asm/math_emu.h include
  x86/fpu: Use unsigned long long shift in xfeature_uncompacted_offset()
2018-12-26 17:37:51 -08:00
Linus Torvalds db2ab474c4 Merge branch 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cpu updates from Ingo Molnar:
 "Misc changes:

   - Fix nr_cpus= boot option interaction bug with logical package
     management

   - Clean up UMIP detection messages

   - Add WBNOINVD instruction detection

   - Remove the unused get_scattered_cpuid_leaf() function"

* 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/topology: Use total_cpus for max logical packages calculation
  x86/umip: Make the UMIP activated message generic
  x86/umip: Print UMIP line only once
  x86/cpufeatures: Add WBNOINVD feature definition
  x86/cpufeatures: Remove get_scattered_cpuid_leaf()
2018-12-26 17:35:41 -08:00
Linus Torvalds 312a466155 Merge branch 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cleanups from Ingo Molnar:
 "Misc cleanups"

* 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/kprobes: Remove trampoline_handler() prototype
  x86/kernel: Fix more -Wmissing-prototypes warnings
  x86: Fix various typos in comments
  x86/headers: Fix -Wmissing-prototypes warning
  x86/process: Avoid unnecessary NULL check in get_wchan()
  x86/traps: Complete prototype declarations
  x86/mce: Fix -Wmissing-prototypes warnings
  x86/gart: Rewrite early_gart_iommu_check() comment
2018-12-26 17:03:51 -08:00
Linus Torvalds 6e54df001a Merge branch 'x86-build-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 build updates from Ingo Molnar:

 - Resolve LLVM build bug by removing redundant GNU specific flag

 - Remove obsolete -funit-at-a-time and -fno-unit-at-a-time use from x86
   PowerPC and UM.

   The UML change was seen and acked by UML maintainer Richard
   Weinberger.

* 'x86-build-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/um/vdso: Drop implicit common-page-size linker flag
  x86, powerpc: Remove -funit-at-a-time compiler option entirely
  x86/um: Remove -fno-unit-at-a-time workaround for pre-4.0 GCC
2018-12-26 16:57:27 -08:00
Linus Torvalds 9a126e788a Merge branch 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 boot updates from Ingo Molnar:
 "Two cleanups"

* 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/boot: Add missing va_end() to die()
  x86/boot: Simplify the detect_memory*() control flow
2018-12-26 16:56:00 -08:00
Linus Torvalds 38fabca18f Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 asm updates from Ingo Molnar:
 "Two changes:

   - Remove (some) remnants of the vDSO's fake section table mechanism
     that were left behind when the vDSO build process reverted to using
     "objdump -S" to strip the userspace image.

   - Remove hardcoded POPCNT mnemonics now that the minimum binutils
     version supports the symbolic form"

* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/vdso: Remove a stale/misleading comment from the linker script
  x86/vdso: Remove obsolete "fake section table" reservation
  x86: Use POPCNT mnemonics in arch_hweight.h
2018-12-26 16:25:06 -08:00
Linus Torvalds 8465625ab4 Merge branch 'x86-amd-nb-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 AMD northbridge updates from Ingo Molnar:
 "Update DF/SMN access and k10temp for AMD F17h M30h, by Brian Woods:

   'Updates the data fabric/system management network code needed to get
    k10temp working for M30h. Since there are now processors which have
    multiple roots per DF/SMN interface, there needs to some logic which
    skips N-1 root complexes per DF/SMN interface. This is because the
    root complexes per interface are redundant (as far as DF/SMN goes).
    These changes shouldn't effect past processors and, for F17h M0Xh,
    the mappings stay the same.'

  The hwmon changes were seen and acked by hwmon maintainer Guenter Roeck"

* 'x86-amd-nb-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  hwmon/k10temp: Add support for AMD family 17h, model 30h CPUs
  x86/amd_nb: Add PCI device IDs for family 17h, model 30h
  x86/amd_nb: Add support for newer PCI topologies
  hwmon/k10temp, x86/amd_nb: Consolidate shared device IDs
2018-12-26 16:12:50 -08:00
Linus Torvalds 116b081c28 Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf updates from Ingo Molnar:
 "The main changes in this cycle on the kernel side:

   - rework kprobes blacklist handling (Masami Hiramatsu)

   - misc cleanups

  on the tooling side these areas were the main focus:

   - 'perf trace'     enhancements (Arnaldo Carvalho de Melo)

   - 'perf bench'     enhancements (Davidlohr Bueso)

   - 'perf record'    enhancements (Alexey Budankov)

   - 'perf annotate'  enhancements (Jin Yao)

   - 'perf top'       enhancements (Jiri Olsa)

   - Intel hw tracing enhancements (Adrian Hunter)

   - ARM hw tracing   enhancements (Leo Yan, Mathieu Poirier)

   - ... plus lots of other enhancements, cleanups and fixes"

* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (171 commits)
  tools uapi asm: Update asm-generic/unistd.h copy
  perf symbols: Relax checks on perf-PID.map ownership
  perf trace: Wire up the fadvise 'advice' table generator
  perf beauty: Add generator for fadvise64's 'advice' arg constants
  tools headers uapi: Grab a copy of fadvise.h
  perf beauty mmap: Print mmap's 'offset' arg in hexadecimal
  perf beauty mmap: Print PROT_READ before PROT_EXEC to match strace output
  perf trace beauty: Beautify arch_prctl()'s arguments
  perf trace: When showing string prefixes show prefix + ??? for unknown entries
  perf trace: Move strarrays to beauty.h for further reuse
  perf beauty: Wire up the x86_arch prctl code table generator
  perf beauty: Add a string table generator for x86's 'arch_prctl' codes
  tools include arch: Grab a copy of x86's prctl.h
  perf trace: Show NULL when syscall pointer args are 0
  perf trace: Enclose the errno strings with ()
  perf augmented_raw_syscalls: Copy 'access' arg as well
  perf trace: Add alignment spaces after the closing parens
  perf trace beauty: Print O_RDONLY when (flags & O_ACCMODE) == 0
  perf trace: Allow asking for not suppressing common string prefixes
  perf trace: Add a prefix member to the strarray class
  ...
2018-12-26 14:45:18 -08:00
Linus Torvalds 684019dd1f Merge branch 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull EFI updates from Ingo Molnar:
 "The main changes in this cycle were:

   - Allocate the E820 buffer before doing the
     GetMemoryMap/ExitBootServices dance so we don't run out of space

   - Clear EFI boot services mappings when freeing the memory

   - Harden efivars against callers that invoke it on non-EFI boots

   - Reduce the number of memblock reservations resulting from extensive
     use of the new efi_mem_reserve_persistent() API

   - Other assorted fixes and cleanups"

* 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/efi: Don't unmap EFI boot services code/data regions for EFI_OLD_MEMMAP and EFI_MIXED_MODE
  efi: Reduce the amount of memblock reservations for persistent allocations
  efi: Permit multiple entries in persistent memreserve data structure
  efi/libstub: Disable some warnings for x86{,_64}
  x86/efi: Move efi_<reserve/free>_boot_services() to arch/x86
  x86/efi: Unmap EFI boot services code/data regions from efi_pgd
  x86/mm/pageattr: Introduce helper function to unmap EFI boot services
  efi/fdt: Simplify the get_fdt() flow
  efi/fdt: Indentation fix
  firmware/efi: Add NULL pointer checks in efivars API functions
2018-12-26 13:38:38 -08:00
Linus Torvalds 792bf4d871 Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RCU updates from Ingo Molnar:
 "The biggest RCU changes in this cycle were:

   - Convert RCU's BUG_ON() and similar calls to WARN_ON() and similar.

   - Replace calls of RCU-bh and RCU-sched update-side functions to
     their vanilla RCU counterparts. This series is a step towards
     complete removal of the RCU-bh and RCU-sched update-side functions.

     ( Note that some of these conversions are going upstream via their
       respective maintainers. )

   - Documentation updates, including a number of flavor-consolidation
     updates from Joel Fernandes.

   - Miscellaneous fixes.

   - Automate generation of the initrd filesystem used for rcutorture
     testing.

   - Convert spin_is_locked() assertions to instead use lockdep.

     ( Note that some of these conversions are going upstream via their
       respective maintainers. )

   - SRCU updates, especially including a fix from Dennis Krein for a
     bag-on-head-class bug.

   - RCU torture-test updates"

* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (112 commits)
  rcutorture: Don't do busted forward-progress testing
  rcutorture: Use 100ms buckets for forward-progress callback histograms
  rcutorture: Recover from OOM during forward-progress tests
  rcutorture: Print forward-progress test age upon failure
  rcutorture: Print time since GP end upon forward-progress failure
  rcutorture: Print histogram of CB invocation at OOM time
  rcutorture: Print GP age upon forward-progress failure
  rcu: Print per-CPU callback counts for forward-progress failures
  rcu: Account for nocb-CPU callback counts in RCU CPU stall warnings
  rcutorture: Dump grace-period diagnostics upon forward-progress OOM
  rcutorture: Prepare for asynchronous access to rcu_fwd_startat
  torture: Remove unnecessary "ret" variables
  rcutorture: Affinity forward-progress test to avoid housekeeping CPUs
  rcutorture: Break up too-long rcu_torture_fwd_prog() function
  rcutorture: Remove cbflood facility
  torture: Bring any extra CPUs online during kernel startup
  rcutorture: Add call_rcu() flooding forward-progress tests
  rcutorture/formal: Replace synchronize_sched() with synchronize_rcu()
  tools/kernel.h: Replace synchronize_sched() with synchronize_rcu()
  net/decnet: Replace rcu_barrier_bh() with rcu_barrier()
  ...
2018-12-26 13:07:19 -08:00
Linus Torvalds eed9688f85 Merge branch 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 RAS updates from Borislav Petkov:
 "This time around we have a subsystem reorganization to offer, with the
  new directory being

    arch/x86/kernel/cpu/mce/

  and all compilation units' names streamlined under it"

* 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mce: Restore MCE injector's module name
  x86/mce: Unify pr_* prefix
  x86/mce: Streamline MCE subsystem's naming
2018-12-26 13:03:47 -08:00
Linus Torvalds 72af84151f Merge branch 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 microcode loading updates from Borislav Petkov:
 "This update contains work started by Maciej to make the microcode
  container verification more robust against all kinds of corruption and
  also unify verification paths between early and late loading.

  The result is a set of verification routines which validate the
  microcode blobs before loading it on the CPU. In addition, the code is
  a lot more streamlined and unified.

  In the process, some of the aspects of patch handling and loading were
  simplified.

  All provided by Maciej S. Szmigiero and Borislav Petkov"

* 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/microcode/AMD: Update copyright
  x86/microcode/AMD: Check the equivalence table size when scanning it
  x86/microcode/AMD: Convert CPU equivalence table variable into a struct
  x86/microcode/AMD: Check microcode container data in the late loader
  x86/microcode/AMD: Fix container size's type
  x86/microcode/AMD: Convert early parser to the new verification routines
  x86/microcode/AMD: Change verify_patch()'s return value
  x86/microcode/AMD: Move chipset-specific check into verify_patch()
  x86/microcode/AMD: Move patch family check to verify_patch()
  x86/microcode/AMD: Simplify patch family detection
  x86/microcode/AMD: Concentrate patch verification
  x86/microcode/AMD: Cleanup verify_patch_size() more
  x86/microcode/AMD: Clean up per-family patch size checks
  x86/microcode/AMD: Move verify_patch_size() up in the file
  x86/microcode/AMD: Add microcode container verification
  x86/microcode/AMD: Subtract SECTION_HDR_SIZE from file leftover length
2018-12-26 12:55:57 -08:00
Linus Torvalds a52fb43a5f Merge branch 'x86-cache-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cache control updates from Borislav Petkov:

 - The generalization of the RDT code to accommodate the addition of
   AMD's very similar implementation of the cache monitoring feature.

   This entails a subsystem move into a separate and generic
   arch/x86/kernel/cpu/resctrl/ directory along with adding
   vendor-specific initialization and feature detection helpers.

   Ontop of that is the unification of user-visible strings, both in the
   resctrl filesystem error handling and Kconfig.

   Provided by Babu Moger and Sherry Hurwitz.

 - Code simplifications and error handling improvements by Reinette
   Chatre.

* 'x86-cache-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/resctrl: Fix rdt_find_domain() return value and checks
  x86/resctrl: Remove unnecessary check for cbm_validate()
  x86/resctrl: Use rdt_last_cmd_puts() where possible
  MAINTAINERS: Update resctrl filename patterns
  Documentation: Rename and update intel_rdt_ui.txt to resctrl_ui.txt
  x86/resctrl: Introduce AMD QOS feature
  x86/resctrl: Fixup the user-visible strings
  x86/resctrl: Add AMD's X86_FEATURE_MBA to the scattered CPUID features
  x86/resctrl: Rename the config option INTEL_RDT to RESCTRL
  x86/resctrl: Add vendor check for the MBA software controller
  x86/resctrl: Bring cbm_validate() into the resource structure
  x86/resctrl: Initialize the vendor-specific resource functions
  x86/resctrl: Move all the macros to resctrl/internal.h
  x86/resctrl: Re-arrange the RDT init code
  x86/resctrl: Rename the RDT functions and definitions
  x86/resctrl: Rename and move rdt files to a separate directory
2018-12-26 12:17:43 -08:00
Linus Torvalds 42b00f122c * ARM: selftests improvements, large PUD support for HugeTLB,
single-stepping fixes, improved tracing, various timer and vGIC
 fixes
 
 * x86: Processor Tracing virtualization, STIBP support, some correctness fixes,
 refactorings and splitting of vmx.c, use the Hyper-V range TLB flush hypercall,
 reduce order of vcpu struct, WBNOINVD support, do not use -ftrace for __noclone
 functions, nested guest support for PAUSE filtering on AMD, more Hyper-V
 enlightenments (direct mode for synthetic timers)
 
 * PPC: nested VFIO
 
 * s390: bugfixes only this time
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJcH0vFAAoJEL/70l94x66Dw/wH/2FZp1YOM5OgiJzgqnXyDbyf
 dNEfWo472MtNiLsuf+ZAfJojVIu9cv7wtBfXNzW+75XZDfh/J88geHWNSiZDm3Fe
 aM4MOnGG0yF3hQrRQyEHe4IFhGFNERax8Ccv+OL44md9CjYrIrsGkRD08qwb+gNh
 P8T/3wJEKwUcVHA/1VHEIM8MlirxNENc78p6JKd/C7zb0emjGavdIpWFUMr3SNfs
 CemabhJUuwOYtwjRInyx1y34FzYwW3Ejuc9a9UoZ+COahUfkuxHE8u+EQS7vLVF6
 2VGVu5SA0PqgmLlGhHthxLqVgQYo+dB22cRnsLtXlUChtVAq8q9uu5sKzvqEzuE=
 =b4Jx
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM updates from Paolo Bonzini:
 "ARM:
   - selftests improvements
   - large PUD support for HugeTLB
   - single-stepping fixes
   - improved tracing
   - various timer and vGIC fixes

  x86:
   - Processor Tracing virtualization
   - STIBP support
   - some correctness fixes
   - refactorings and splitting of vmx.c
   - use the Hyper-V range TLB flush hypercall
   - reduce order of vcpu struct
   - WBNOINVD support
   - do not use -ftrace for __noclone functions
   - nested guest support for PAUSE filtering on AMD
   - more Hyper-V enlightenments (direct mode for synthetic timers)

  PPC:
   -  nested VFIO

  s390:
   - bugfixes only this time"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (171 commits)
  KVM: x86: Add CPUID support for new instruction WBNOINVD
  kvm: selftests: ucall: fix exit mmio address guessing
  Revert "compiler-gcc: disable -ftracer for __noclone functions"
  KVM: VMX: Move VM-Enter + VM-Exit handling to non-inline sub-routines
  KVM: VMX: Explicitly reference RCX as the vmx_vcpu pointer in asm blobs
  KVM: x86: Use jmp to invoke kvm_spurious_fault() from .fixup
  MAINTAINERS: Add arch/x86/kvm sub-directories to existing KVM/x86 entry
  KVM/x86: Use SVM assembly instruction mnemonics instead of .byte streams
  KVM/MMU: Flush tlb directly in the kvm_zap_gfn_range()
  KVM/MMU: Flush tlb directly in kvm_set_pte_rmapp()
  KVM/MMU: Move tlb flush in kvm_set_pte_rmapp() to kvm_mmu_notifier_change_pte()
  KVM: Make kvm_set_spte_hva() return int
  KVM: Replace old tlb flush function with new one to flush a specified range.
  KVM/MMU: Add tlb flush with range helper function
  KVM/VMX: Add hv tlb range flush support
  x86/hyper-v: Add HvFlushGuestAddressList hypercall support
  KVM: Add tlb_remote_flush_with_range callback in kvm_x86_ops
  KVM: x86: Disable Intel PT when VMXON in L1 guest
  KVM: x86: Set intercept for Intel PT MSRs read/write
  KVM: x86: Implement Intel PT MSRs read/write emulation
  ...
2018-12-26 11:46:28 -08:00
Linus Torvalds 460023a5d1 xen: features and fixes for 4.21
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQRTLbB6QfY48x44uB6AXGG7T9hjvgUCXBvKlAAKCRCAXGG7T9hj
 vmIoAP0XpLCE+0Z1hhxcDcJ0hKah1NIniRSIGGr6Af+gxe8F4wEA0Vm55gtEZerU
 9mL5S7e2EcuTo93XCIjsxU8uPLGtegQ=
 =59wi
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-4.21-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull xen updates from Juergen Gross:
 "Xen features and fixes:

   - a series to enable KVM guests to be booted by qemu via the Xen PVH
     boot entry for speeding up KVM guest tests

   - a series for a common driver to be used by Xen PV frontends (right
     now drm and sound)

   - two other fixes in Xen related code"

* tag 'for-linus-4.21-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  ALSA: xen-front: Use Xen common shared buffer implementation
  drm/xen-front: Use Xen common shared buffer implementation
  xen: Introduce shared buffer helpers for page directory...
  xen/pciback: Check dev_data before using it
  kprobes/x86/xen: blacklist non-attachable xen interrupt functions
  KVM: x86: Allow Qemu/KVM to use PVH entry point
  xen/pvh: Add memory map pointer to hvm_start_info struct
  xen/pvh: Move Xen code for getting mem map via hcall out of common file
  xen/pvh: Move Xen specific PVH VM initialization out of common file
  xen/pvh: Create a new file for Xen specific PVH code
  xen/pvh: Move PVH entry code out of Xen specific tree
  xen/pvh: Split CONFIG_XEN_PVH into CONFIG_PVH and CONFIG_XEN_PVH
2018-12-26 11:35:07 -08:00
Linus Torvalds 5694cecdb0 arm64 festive updates for 4.21
In the end, we ended up with quite a lot more than I expected:
 
 - Support for ARMv8.3 Pointer Authentication in userspace (CRIU and
   kernel-side support to come later)
 
 - Support for per-thread stack canaries, pending an update to GCC that
   is currently undergoing review
 
 - Support for kexec_file_load(), which permits secure boot of a kexec
   payload but also happens to improve the performance of kexec
   dramatically because we can avoid the sucky purgatory code from
   userspace. Kdump will come later (requires updates to libfdt).
 
 - Optimisation of our dynamic CPU feature framework, so that all
   detected features are enabled via a single stop_machine() invocation
 
 - KPTI whitelisting of Cortex-A CPUs unaffected by Meltdown, so that
   they can benefit from global TLB entries when KASLR is not in use
 
 - 52-bit virtual addressing for userspace (kernel remains 48-bit)
 
 - Patch in LSE atomics for per-cpu atomic operations
 
 - Custom preempt.h implementation to avoid unconditional calls to
   preempt_schedule() from preempt_enable()
 
 - Support for the new 'SB' Speculation Barrier instruction
 
 - Vectorised implementation of XOR checksumming and CRC32 optimisations
 
 - Workaround for Cortex-A76 erratum #1165522
 
 - Improved compatibility with Clang/LLD
 
 - Support for TX2 system PMUS for profiling the L3 cache and DMC
 
 - Reflect read-only permissions in the linear map by default
 
 - Ensure MMIO reads are ordered with subsequent calls to Xdelay()
 
 - Initial support for memory hotplug
 
 - Tweak the threshold when we invalidate the TLB by-ASID, so that
   mremap() performance is improved for ranges spanning multiple PMDs.
 
 - Minor refactoring and cleanups
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABCgAGBQJcE4TmAAoJELescNyEwWM0Nr0H/iaU7/wQSzHyNXtZoImyKTul
 Blu2ga4/EqUrTU7AVVfmkl/3NBILWlgQVpY6tH6EfXQuvnxqD7CizbHyLdyO+z0S
 B5PsFUH2GLMNAi48AUNqGqkgb2knFbg+T+9IimijDBkKg1G/KhQnRg6bXX32mLJv
 Une8oshUPBVJMsHN1AcQknzKariuoE3u0SgJ+eOZ9yA2ZwKxP4yy1SkDt3xQrtI0
 lojeRjxcyjTP1oGRNZC+BWUtGOT35p7y6cGTnBd/4TlqBGz5wVAJUcdoxnZ6JYVR
 O8+ob9zU+4I0+SKt80s7pTLqQiL9rxkKZ5joWK1pr1g9e0s5N5yoETXKFHgJYP8=
 =sYdt
 -----END PGP SIGNATURE-----

Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 festive updates from Will Deacon:
 "In the end, we ended up with quite a lot more than I expected:

   - Support for ARMv8.3 Pointer Authentication in userspace (CRIU and
     kernel-side support to come later)

   - Support for per-thread stack canaries, pending an update to GCC
     that is currently undergoing review

   - Support for kexec_file_load(), which permits secure boot of a kexec
     payload but also happens to improve the performance of kexec
     dramatically because we can avoid the sucky purgatory code from
     userspace. Kdump will come later (requires updates to libfdt).

   - Optimisation of our dynamic CPU feature framework, so that all
     detected features are enabled via a single stop_machine()
     invocation

   - KPTI whitelisting of Cortex-A CPUs unaffected by Meltdown, so that
     they can benefit from global TLB entries when KASLR is not in use

   - 52-bit virtual addressing for userspace (kernel remains 48-bit)

   - Patch in LSE atomics for per-cpu atomic operations

   - Custom preempt.h implementation to avoid unconditional calls to
     preempt_schedule() from preempt_enable()

   - Support for the new 'SB' Speculation Barrier instruction

   - Vectorised implementation of XOR checksumming and CRC32
     optimisations

   - Workaround for Cortex-A76 erratum #1165522

   - Improved compatibility with Clang/LLD

   - Support for TX2 system PMUS for profiling the L3 cache and DMC

   - Reflect read-only permissions in the linear map by default

   - Ensure MMIO reads are ordered with subsequent calls to Xdelay()

   - Initial support for memory hotplug

   - Tweak the threshold when we invalidate the TLB by-ASID, so that
     mremap() performance is improved for ranges spanning multiple PMDs.

   - Minor refactoring and cleanups"

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (125 commits)
  arm64: kaslr: print PHYS_OFFSET in dump_kernel_offset()
  arm64: sysreg: Use _BITUL() when defining register bits
  arm64: cpufeature: Rework ptr auth hwcaps using multi_entry_cap_matches
  arm64: cpufeature: Reduce number of pointer auth CPU caps from 6 to 4
  arm64: docs: document pointer authentication
  arm64: ptr auth: Move per-thread keys from thread_info to thread_struct
  arm64: enable pointer authentication
  arm64: add prctl control for resetting ptrauth keys
  arm64: perf: strip PAC when unwinding userspace
  arm64: expose user PAC bit positions via ptrace
  arm64: add basic pointer authentication support
  arm64/cpufeature: detect pointer authentication
  arm64: Don't trap host pointer auth use to EL2
  arm64/kvm: hide ptrauth from guests
  arm64/kvm: consistently handle host HCR_EL2 flags
  arm64: add pointer authentication register bits
  arm64: add comments about EC exception levels
  arm64: perf: Treat EXCLUDE_EL* bit definitions as unsigned
  arm64: kpti: Whitelist Cortex-A CPUs that don't implement the CSV3 field
  arm64: enable per-task stack canaries
  ...
2018-12-25 17:41:56 -08:00
Linus Torvalds 13e1ad2be3 Merge branch 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 pti updates from Thomas Gleixner:
 "No point in speculating what's in this parcel:

   - Drop the swap storage limit when L1TF is disabled so the full space
     is available

   - Add support for the new AMD STIBP always on mitigation mode

   - Fix a bunch of STIPB typos"

* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/speculation: Add support for STIBP always-on preferred mode
  x86/speculation/l1tf: Drop the swap storage limit restriction when l1tf=off
  x86/speculation: Change misspelled STIPB to STIBP
2018-12-25 16:26:42 -08:00
Linus Torvalds e6d1315006 ACPI updates for 4.21-rc1
- Update the ACPICA code in the kernel to the 20181213 upstream
    revision including:
    * New Windows _OSI strings (Bob Moore, Jung-uk Kim).
    * Buffers-to-string conversions update (Bob Moore).
    * Removal of support for expressions in package elements (Bob
      Moore).
    * New option to display method/object evaluation in debug output
      (Bob Moore).
    * Compiler improvements (Bob Moore, Erik Schmauss).
    * Minor debugger fix (Erik Schmauss).
    * Disassembler improvement (Erik Schmauss).
    * Assorted cleanups (Bob Moore, Colin Ian King, Erik Schmauss).
 
  - Add support for a new OEM _OSI string to indicate special handling
    of secondary graphics adapters on some systems (Alex Hung).
 
  - Make it possible to build the ACPI subystem without PCI support
    (Sinan Kaya).
 
  - Make the SPCR table handling regard baud rate 0 in accordance with
    the specification of it and make the DSDT override code support
    DSDT code names generated by recent ACPICA (Andy Shevchenko, Wang
    Dongsheng, Nathan Chancellor).
 
  - Add clock frequency for Hisilicon Hip08 SPI controller to the ACPI
    driver for AMD SoCs (APD) (Jay Fang).
 
  - Fix the PM handling during device init in the ACPI driver for
    Intel SoCs (LPSS) (Hans de Goede).
 
  - Avoid double panic()s by clearing the APEI GHES block_status
    before panic() (Lenny Szubowicz).
 
  - Clean up a function invocation in the ACPI core and get rid of
    some code duplication by using the DEFINE_SHOW_ATTRIBUTE macro
    in the APEI support code (Alexey Dobriyan, Yangtao Li).
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCAAGBQJcHMSBAAoJEILEb/54YlRxZmEQAIbRXKOwvvt3my9HLBC/6V1u
 +Wed0yNBQ9HkVWQzFuppDq97/kk5DRODnPNu9RaeS7QXxVOBfwElinm8NhzVI7Fm
 FP5iPwnNq8EAkDTBOoG139Fs82EkaVSa2x9FHy84Jge3BXmauQM13bWP/kF5TjCn
 Frjuh0TfhQ+ub853GisAr/SW7ixCWp81FZaW/xFcDuJU2E6AvjNQusdiAocgAqQ8
 rnl8D0gjSW6m6HcauaTizRMXOIyePkfT86xQKwU7259ByRW20iQtsl/6+Rnyy3wG
 cCrlsaHd0bP6qwVAQyh6cURq8hdLAUYI9tzBW0EL+UEpJ289j51s+RSh2nZNyIKO
 wfbr2DdK3aaWcUygSxoP4FFHqINch/IRwaP2huT9szO1yLCikAN8Xmrb1BPZvOIK
 m6Lywb1B+SOfGgJl4Z1GjzIc6dimrXVbgxjN1+Bpe1NeKqe/M6vMdbcvPIsMs7b8
 iE/1gJPeJ5pvAgsQiWncZvyaOKaSmrLWbaw/ITQnNXVLDlTI3hIQExiPPl5hJ00v
 Z4egVMdCCxYqZxxkZKEYnEe/lb9BRAMIvbkkocPBdmtNAWPuVnCqdR26BppaEt7i
 r2tnEd84aISCDcBc2sIpo/pVUwncw5GtK20z8Ke+3rlg8lDZ0hAdHQWgBtj4xnnw
 grImzXnKvSdajfZnvjRg
 =yxXc
 -----END PGP SIGNATURE-----

Merge tag 'acpi-4.21-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull ACPI updates from Rafael Wysocki:
 "These update the ACPICA code in the kernel to the 20181213 upstream
  revision, make it possible to build the ACPI subsystem without PCI
  support, and a new OEM _OSI string, add a new device support to the
  ACPI driver for AMD SoCs and fix PM handling in the ACPI driver for
  Intel SoCs, fix the SPCR table handling and do some assorted fixes and
  cleanups.

  Specifics:

   - Update the ACPICA code in the kernel to the 20181213 upstream
     revision including:
      * New Windows _OSI strings (Bob Moore, Jung-uk Kim).
      * Buffers-to-string conversions update (Bob Moore).
      * Removal of support for expressions in package elements (Bob
        Moore).
      * New option to display method/object evaluation in debug output
        (Bob Moore).
      * Compiler improvements (Bob Moore, Erik Schmauss).
      * Minor debugger fix (Erik Schmauss).
      * Disassembler improvement (Erik Schmauss).
      * Assorted cleanups (Bob Moore, Colin Ian King, Erik Schmauss).

   - Add support for a new OEM _OSI string to indicate special handling
     of secondary graphics adapters on some systems (Alex Hung).

   - Make it possible to build the ACPI subystem without PCI support
     (Sinan Kaya).

   - Make the SPCR table handling regard baud rate 0 in accordance with
     the specification of it and make the DSDT override code support
     DSDT code names generated by recent ACPICA (Andy Shevchenko, Wang
     Dongsheng, Nathan Chancellor).

   - Add clock frequency for Hisilicon Hip08 SPI controller to the ACPI
     driver for AMD SoCs (APD) (Jay Fang).

   - Fix the PM handling during device init in the ACPI driver for Intel
     SoCs (LPSS) (Hans de Goede).

   - Avoid double panic()s by clearing the APEI GHES block_status before
     panic() (Lenny Szubowicz).

   - Clean up a function invocation in the ACPI core and get rid of some
     code duplication by using the DEFINE_SHOW_ATTRIBUTE macro in the
     APEI support code (Alexey Dobriyan, Yangtao Li)"

* tag 'acpi-4.21-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (31 commits)
  ACPI / tables: Add an ifdef around amlcode and dsdt_amlcode
  ACPI/APEI: Clear GHES block_status before panic()
  ACPI: Make PCI slot detection driver depend on PCI
  ACPI/IORT: Stub out ACS functions when CONFIG_PCI is not set
  arm64: select ACPI PCI code only when both features are enabled
  PCI/ACPI: Allow ACPI to be built without CONFIG_PCI set
  ACPICA: Remove PCI bits from ACPICA when CONFIG_PCI is unset
  ACPI: Allow CONFIG_PCI to be unset for reboot
  ACPI: Move PCI reset to a separate function
  ACPI / OSI: Add OEM _OSI string to enable dGPU direct output
  ACPI / tables: add DSDT AmlCode new declaration name support
  ACPICA: Update version to 20181213
  ACPICA: change coding style to match ACPICA, no functional change
  ACPICA: Debug output: Add option to display method/object evaluation
  ACPICA: disassembler: disassemble OEMx tables as AML
  ACPICA: Add "Windows 2018.2" string in the _OSI support
  ACPICA: Expressions in package elements are not supported
  ACPICA: Update buffer-to-string conversions
  ACPICA: add comments, no functional change
  ACPICA: Remove defines that use deprecated flag
  ...
2018-12-25 14:21:18 -08:00
Eric Biggers f9c9bdb513 crypto: x86/chacha - avoid sleeping under kernel_fpu_begin()
Passing atomic=true to skcipher_walk_virt() only makes the later
skcipher_walk_done() calls use atomic memory allocations, not
skcipher_walk_virt() itself.  Thus, we have to move it outside of the
preemption-disabled region (kernel_fpu_begin()/kernel_fpu_end()).

(skcipher_walk_virt() only allocates memory for certain layouts of the
input scatterlist, hence why I didn't notice this earlier...)

Reported-by: syzbot+9bf843c33f782d73ae7d@syzkaller.appspotmail.com
Fixes: 4af7826187 ("crypto: x86/chacha20 - add XChaCha20 support")
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-12-23 11:52:44 +08:00
Dave Watson 603f8c3b0d crypto: aesni - Add scatter/gather avx stubs, and use them in C
Add the appropriate scatter/gather stubs to the avx asm.
In the C code, we can now always use crypt_by_sg, since both
sse and asm code now support scatter/gather.

Introduce a new struct, aesni_gcm_tfm, that is initialized on
startup to point to either the SSE, AVX, or AVX2 versions of the
four necessary encryption/decryption routines.

GENX_OPTSIZE is still checked at the start of crypt_by_sg.  The
total size of the data is checked, since the additional overhead
is in the init function, calculating additional HashKeys.

Signed-off-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-12-23 11:52:43 +08:00
Dave Watson e044d50563 crypto: aesni - Introduce partial block macro
Before this diff, multiple calls to GCM_ENC_DEC will
succeed, but only if all calls are a multiple of 16 bytes.

Handle partial blocks at the start of GCM_ENC_DEC, and update
aadhash as appropriate.

The data offset %r11 is also updated after the partial block.

Signed-off-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-12-23 11:52:42 +08:00
Dave Watson ec8c02d9a3 crypto: aesni - Introduce READ_PARTIAL_BLOCK macro
Introduce READ_PARTIAL_BLOCK macro, and use it in the two existing
partial block cases: AAD and the end of ENC_DEC.   In particular,
the ENC_DEC case should be faster, since we read by 8/4 bytes if
possible.

This macro will also be used to read partial blocks between
enc_update and dec_update calls.

Signed-off-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-12-23 11:52:42 +08:00
Dave Watson 517a448e09 crypto: aesni - Move ghash_mul to GCM_COMPLETE
Prepare to handle partial blocks between scatter/gather calls.
For the last partial block, we only want to calculate the aadhash
in GCM_COMPLETE, and a new partial block macro will handle both
aadhash update and encrypting partial blocks between calls.

Signed-off-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-12-23 11:52:42 +08:00
Dave Watson a44b419fe5 crypto: aesni - Fill in new context data structures
Fill in aadhash, aadlen, pblocklen, curcount with appropriate values.
pblocklen, aadhash, and pblockenckey are also updated at the end
of each scatter/gather operation, to be carried over to the next
operation.

Signed-off-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-12-23 11:52:42 +08:00
Dave Watson 1cb1bcbb56 crypto: aesni - Merge avx precompute functions
The precompute functions differ only by the sub-macros
they call, merge them to a single macro.   Later diffs
add more code to fill in the gcm_context_data structure,
this allows changes in a single place.

Signed-off-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-12-23 11:52:42 +08:00
Dave Watson 38003cd26c crypto: aesni - Split AAD hash calculation to separate macro
AAD hash only needs to be calculated once for each scatter/gather operation.
Move it to its own macro, and call it from GCM_INIT instead of
INITIAL_BLOCKS.

Signed-off-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-12-23 11:52:42 +08:00
Dave Watson e377bedb09 crypto: aesni - Add GCM_COMPLETE macro
Merge encode and decode tag calculations in GCM_COMPLETE macro.
Scatter/gather routines will call this once at the end of encryption
or decryption.

Signed-off-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-12-23 11:52:42 +08:00
Dave Watson 5350b0f563 crypto: aesni - support 256 byte keys in avx asm
Add support for 192/256-bit keys using the avx gcm/aes routines.
The sse routines were previously updated in e31ac32d3b (Add support
for 192 & 256 bit keys to AESNI RFC4106).

Instead of adding an additional loop in the hotpath as in e31ac32d3b,
this diff instead generates separate versions of the code using macros,
and the entry routines choose which version once.   This results
in a 5% performance improvement vs. adding a loop to the hot path.
This is the same strategy chosen by the intel isa-l_crypto library.

The key size checks are removed from the c code where appropriate.

Note that this diff depends on using gcm_context_data - 256 bit keys
require 16 HashKeys + 15 expanded keys, which is larger than
struct crypto_aes_ctx, so they are stored in struct gcm_context_data.

Signed-off-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-12-23 11:52:42 +08:00
Dave Watson 2426f64bc5 crypto: aesni - Macro-ify func save/restore
Macro-ify function save and restore.  These will be used in new functions
added for scatter/gather update operations.

Signed-off-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-12-23 11:52:42 +08:00
Dave Watson de85fc46b1 crypto: aesni - Introduce gcm_context_data
Add the gcm_context_data structure to the avx asm routines.
This will be necessary to support both 256 bit keys and
scatter/gather.

The pre-computed HashKeys are now stored in the gcm_context_data
struct, which is expanded to hold the greater number of hashkeys
necessary for avx.

Loads and stores to the new struct are always done unlaligned to
avoid compiler issues, see e5b954e8 "Use unaligned loads from
gcm_context_data"

Signed-off-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-12-23 11:52:42 +08:00
Dave Watson f9b1d64678 crypto: aesni - Merge GCM_ENC_DEC
The GCM_ENC_DEC routines for AVX and AVX2 are identical, except they
call separate sub-macros.  Pass the macros as arguments, and merge them.
This facilitates additional refactoring, by requiring changes in only
one place.

The GCM_ENC_DEC macro was moved above the CONFIG_AS_AVX* ifdefs,
since it will be used by both AVX and AVX2.

Signed-off-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-12-23 11:52:41 +08:00
Masahiro Yamada 2c667d77fc treewide: add intermediate .s files to targets
Avoid unneeded recreation of these in the incremental build.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
2018-12-23 10:12:08 +09:00
Sai Praneeth Prakhya 1debf0958f x86/efi: Don't unmap EFI boot services code/data regions for EFI_OLD_MEMMAP and EFI_MIXED_MODE
The following commit:

  d5052a7130a6 ("x86/efi: Unmap EFI boot services code/data regions from efi_pgd")

forgets to take two EFI modes into consideration, namely EFI_OLD_MEMMAP and
EFI_MIXED_MODE:

- EFI_OLD_MEMMAP is a legacy way of mapping EFI regions into swapper_pg_dir
  using ioremap() and init_memory_mapping(). This feature can be enabled by
  passing "efi=old_map" as kernel command line argument. But,
  efi_unmap_pages() unmaps EFI boot services code/data regions *only* from
  efi_pgd and hence cannot be used for unmapping EFI boot services code/data
  regions from swapper_pg_dir.

Introduce a temporary fix to not unmap EFI boot services code/data regions
when EFI_OLD_MEMMAP is enabled while working on a real fix.

- EFI_MIXED_MODE is another feature where a 64-bit kernel runs on a
  64-bit platform crippled by a 32-bit firmware. To support EFI_MIXED_MODE,
  all RAM (i.e. namely EFI regions like EFI_CONVENTIONAL_MEMORY,
  EFI_LOADER_<CODE/DATA>, EFI_BOOT_SERVICES_<CODE/DATA> and
  EFI_RUNTIME_CODE/DATA regions) is mapped into efi_pgd all the time to
  facilitate EFI runtime calls access it's arguments in 1:1 mode.

Hence, don't unmap EFI boot services code/data regions when booted in mixed mode.

Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Bhupesh Sharma <bhsharma@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/20181222022234.7573-1-sai.praneeth.prakhya@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-12-22 20:58:30 +01:00
David S. Miller ce28bb4453 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2018-12-21 15:06:20 -08:00
Linus Torvalds 5092adb227 Unbreak AMD nested virtualization.
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJcHOSgAAoJEL/70l94x66DAw4H/jQjdRjT1DAf4vswXwMD6lpJ
 qHcSyAYL4d/PFbcovfAm2ca8F0HJylVWDeZcqQRP3zdX53diqJ4gyYMaNuuY0niX
 zKvzNhFw1oaZK93rwrF6BX1jl4Virw2uC4qL9bhgV/OfkmvTPvIFkP8gJGVDt9YY
 Kn5yhWnJOpHOCQs3GW8zOy2LWtiuCrp7epSrMGjGsWrp50ccW1tTioxYyDmBr3mF
 GizAIgDD2xMwIeOlj4IngQhDTahwekOA9XzhSMKjm0/GMcZ33TXPcnUdoa0Yxguj
 Uu3cXLfcEUfakZdefi3FB5eDB2knDe3kbmKviok2giAAY1hBvEO5b6bHrn+5W2g=
 =l4oP
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fix from Paolo Bonzini:
 "A simple patch for a pretty bad bug: Unbreak AMD nested
  virtualization."

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: x86: nSVM: fix switch to guest mmu
2018-12-21 11:15:36 -08:00
Linus Torvalds 70ad6368e8 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
 "The biggest part is a series of reverts for the macro based GCC
  inlining workarounds. It caused regressions in distro build and other
  kernel tooling environments, and the GCC project was very receptive to
  fixing the underlying inliner weaknesses - so as time ran out we
  decided to do a reasonably straightforward revert of the patches. The
  plan is to rely on the 'asm inline' GCC 9 feature, which might be
  backported to GCC 8 and could thus become reasonably widely available
  on modern distros.

  Other than those reverts, there's misc fixes from all around the
  place.

  I wish our final x86 pull request for v4.20 was smaller..."

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  Revert "kbuild/Makefile: Prepare for using macros in inline assembly code to work around asm() related GCC inlining bugs"
  Revert "x86/objtool: Use asm macros to work around GCC inlining bugs"
  Revert "x86/refcount: Work around GCC inlining bug"
  Revert "x86/alternatives: Macrofy lock prefixes to work around GCC inlining bugs"
  Revert "x86/bug: Macrofy the BUG table section handling, to work around GCC inlining bugs"
  Revert "x86/paravirt: Work around GCC inlining bugs when compiling paravirt ops"
  Revert "x86/extable: Macrofy inline assembly code to work around GCC inlining bugs"
  Revert "x86/cpufeature: Macrofy inline assembly code to work around GCC inlining bugs"
  Revert "x86/jump-labels: Macrofy inline assembly code to work around GCC inlining bugs"
  x86/mtrr: Don't copy uninitialized gentry fields back to userspace
  x86/fsgsbase/64: Fix the base write helper functions
  x86/mm/cpa: Fix cpa_flush_array() TLB invalidation
  x86/vdso: Pass --eh-frame-hdr to the linker
  x86/mm: Fix decoy address handling vs 32-bit builds
  x86/intel_rdt: Ensure a CPU remains online for the region's pseudo-locking sequence
  x86/dump_pagetables: Fix LDT remap address marker
  x86/mm: Fix guard hole handling
2018-12-21 09:22:24 -08:00
Masahiro Yamada 8636a1f967 treewide: surround Kconfig file paths with double quotes
The Kconfig lexer supports special characters such as '.' and '/' in
the parameter context. In my understanding, the reason is just to
support bare file paths in the source statement.

I do not see a good reason to complicate Kconfig for the room of
ambiguity.

The majority of code already surrounds file paths with double quotes,
and it makes sense since file paths are constant string literals.

Make it treewide consistent now.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: Wolfram Sang <wsa@the-dreams.de>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Ingo Molnar <mingo@kernel.org>
2018-12-22 00:25:54 +09:00
Robert Hoo a0aea130af KVM: x86: Add CPUID support for new instruction WBNOINVD
Signed-off-by: Robert Hoo <robert.hu@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-21 14:26:32 +01:00
Sean Christopherson 453eafbe65 KVM: VMX: Move VM-Enter + VM-Exit handling to non-inline sub-routines
Transitioning to/from a VMX guest requires KVM to manually save/load
the bulk of CPU state that the guest is allowed to direclty access,
e.g. XSAVE state, CR2, GPRs, etc...  For obvious reasons, loading the
guest's GPR snapshot prior to VM-Enter and saving the snapshot after
VM-Exit is done via handcoded assembly.  The assembly blob is written
as inline asm so that it can easily access KVM-defined structs that
are used to hold guest state, e.g. moving the blob to a standalone
assembly file would require generating defines for struct offsets.

The other relevant aspect of VMX transitions in KVM is the handling of
VM-Exits.  KVM doesn't employ a separate VM-Exit handler per se, but
rather treats the VMX transition as a mega instruction (with many side
effects), i.e. sets the VMCS.HOST_RIP to a label immediately following
VMLAUNCH/VMRESUME.  The label is then exposed to C code via a global
variable definition in the inline assembly.

Because of the global variable, KVM takes steps to (attempt to) ensure
only a single instance of the owning C function, e.g. vmx_vcpu_run, is
generated by the compiler.  The earliest approach placed the inline
assembly in a separate noinline function[1].  Later, the assembly was
folded back into vmx_vcpu_run() and tagged with __noclone[2][3], which
is still used today.

After moving to __noclone, an edge case was encountered where GCC's
-ftracer optimization resulted in the inline assembly blob being
duplicated.  This was "fixed" by explicitly disabling -ftracer in the
__noclone definition[4].

Recently, it was found that disabling -ftracer causes build warnings
for unsuspecting users of __noclone[5], and more importantly for KVM,
prevents the compiler for properly optimizing vmx_vcpu_run()[6].  And
perhaps most importantly of all, it was pointed out that there is no
way to prevent duplication of a function with 100% reliability[7],
i.e. more edge cases may be encountered in the future.

So to summarize, the only way to prevent the compiler from duplicating
the global variable definition is to move the variable out of inline
assembly, which has been suggested several times over[1][7][8].

Resolve the aforementioned issues by moving the VMLAUNCH+VRESUME and
VM-Exit "handler" to standalone assembly sub-routines.  Moving only
the core VMX transition codes allows the struct indexing to remain as
inline assembly and also allows the sub-routines to be used by
nested_vmx_check_vmentry_hw().  Reusing the sub-routines has a happy
side-effect of eliminating two VMWRITEs in the nested_early_check path
as there is no longer a need to dynamically change VMCS.HOST_RIP.

Note that callers to vmx_vmenter() must account for the CALL modifying
RSP, e.g. must subtract op-size from RSP when synchronizing RSP with
VMCS.HOST_RSP and "restore" RSP prior to the CALL.  There are no great
alternatives to fudging RSP.  Saving RSP in vmx_enter() is difficult
because doing so requires a second register (VMWRITE does not provide
an immediate encoding for the VMCS field and KVM supports Hyper-V's
memory-based eVMCS ABI).  The other more drastic alternative would be
to use eschew VMCS.HOST_RSP and manually save/load RSP using a per-cpu
variable (which can be encoded as e.g. gs:[imm]).  But because a valid
stack is needed at the time of VM-Exit (NMIs aren't blocked and a user
could theoretically insert INT3/INT1ICEBRK at the VM-Exit handler), a
dedicated per-cpu VM-Exit stack would be required.  A dedicated stack
isn't difficult to implement, but it would require at least one page
per CPU and knowledge of the stack in the dumpstack routines.  And in
most cases there is essentially zero overhead in dynamically updating
VMCS.HOST_RSP, e.g. the VMWRITE can be avoided for all but the first
VMLAUNCH unless nested_early_check=1, which is not a fast path.  In
other words, avoiding the VMCS.HOST_RSP by using a dedicated stack
would only make the code marginally less ugly while requiring at least
one page per CPU and forcing the kernel to be aware (and approve) of
the VM-Exit stack shenanigans.

[1] cea15c24ca39 ("KVM: Move KVM context switch into own function")
[2] a3b5ba49a8 ("KVM: VMX: add the __noclone attribute to vmx_vcpu_run")
[3] 104f226bfd ("KVM: VMX: Fold __vmx_vcpu_run() into vmx_vcpu_run()")
[4] 95272c2937 ("compiler-gcc: disable -ftracer for __noclone functions")
[5] https://lkml.kernel.org/r/20181218140105.ajuiglkpvstt3qxs@treble
[6] https://patchwork.kernel.org/patch/8707981/#21817015
[7] https://lkml.kernel.org/r/ri6y38lo23g.fsf@suse.cz
[8] https://lkml.kernel.org/r/20181218212042.GE25620@tassilo.jf.intel.com

Suggested-by: Andi Kleen <ak@linux.intel.com>
Suggested-by: Martin Jambor <mjambor@suse.cz>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Nadav Amit <namit@vmware.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Martin Jambor <mjambor@suse.cz>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-21 12:02:50 +01:00
Sean Christopherson 051a2d3e59 KVM: VMX: Explicitly reference RCX as the vmx_vcpu pointer in asm blobs
Use '%% " _ASM_CX"' instead of '%0' to dereference RCX, i.e. the
'struct vcpu_vmx' pointer, in the VM-Enter asm blobs of vmx_vcpu_run()
and nested_vmx_check_vmentry_hw().  Using the symbolic name means that
adding/removing an output parameter(s) requires "rewriting" almost all
of the asm blob, which makes it nearly impossible to understand what's
being changed in even the most minor patches.

Opportunistically improve the code comments.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-21 12:02:43 +01:00
Sean Christopherson e814349950 KVM: x86: Use jmp to invoke kvm_spurious_fault() from .fixup
____kvm_handle_fault_on_reboot() provides a generic exception fixup
handler that is used to cleanly handle faults on VMX/SVM instructions
during reboot (or at least try to).  If there isn't a reboot in
progress, ____kvm_handle_fault_on_reboot() treats any exception as
fatal to KVM and invokes kvm_spurious_fault(), which in turn generates
a BUG() to get a stack trace and die.

When it was originally added by commit 4ecac3fd6d ("KVM: Handle
virtualization instruction #UD faults during reboot"), the "call" to
kvm_spurious_fault() was handcoded as PUSH+JMP, where the PUSH'd value
is the RIP of the faulting instructing.

The PUSH+JMP trickery is necessary because the exception fixup handler
code lies outside of its associated function, e.g. right after the
function.  An actual CALL from the .fixup code would show a slightly
bogus stack trace, e.g. an extra "random" function would be inserted
into the trace, as the return RIP on the stack would point to no known
function (and the unwinder will likely try to guess who owns the RIP).

Unfortunately, the JMP was replaced with a CALL when the macro was
reworked to not spin indefinitely during reboot (commit b7c4145ba2
"KVM: Don't spin on virt instruction faults during reboot").  This
causes the aforementioned behavior where a bogus function is inserted
into the stack trace, e.g. my builds like to blame free_kvm_area().

Revert the CALL back to a JMP.  The changelog for commit b7c4145ba2
("KVM: Don't spin on virt instruction faults during reboot") contains
nothing that indicates the switch to CALL was deliberate.  This is
backed up by the fact that the PUSH <insn RIP> was left intact.

Note that an alternative to the PUSH+JMP magic would be to JMP back
to the "real" code and CALL from there, but that would require adding
a JMP in the non-faulting path to avoid calling kvm_spurious_fault()
and would add no value, i.e. the stack trace would be the same.

Using CALL:

------------[ cut here ]------------
kernel BUG at /home/sean/go/src/kernel.org/linux/arch/x86/kvm/x86.c:356!
invalid opcode: 0000 [#1] SMP
CPU: 4 PID: 1057 Comm: qemu-system-x86 Not tainted 4.20.0-rc6+ #75
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
RIP: 0010:kvm_spurious_fault+0x5/0x10 [kvm]
Code: <0f> 0b 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 55 49 89 fd 41
RSP: 0018:ffffc900004bbcc8 EFLAGS: 00010046
RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffffffffffff
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: ffff888273fd8000 R08: 00000000000003e8 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000784 R12: ffffc90000371fb0
R13: 0000000000000000 R14: 000000026d763cf4 R15: ffff888273fd8000
FS:  00007f3d69691700(0000) GS:ffff888277800000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055f89bc56fe0 CR3: 0000000271a5a001 CR4: 0000000000362ee0
Call Trace:
 free_kvm_area+0x1044/0x43ea [kvm_intel]
 ? vmx_vcpu_run+0x156/0x630 [kvm_intel]
 ? kvm_arch_vcpu_ioctl_run+0x447/0x1a40 [kvm]
 ? kvm_vcpu_ioctl+0x368/0x5c0 [kvm]
 ? kvm_vcpu_ioctl+0x368/0x5c0 [kvm]
 ? __set_task_blocked+0x38/0x90
 ? __set_current_blocked+0x50/0x60
 ? __fpu__restore_sig+0x97/0x490
 ? do_vfs_ioctl+0xa1/0x620
 ? __x64_sys_futex+0x89/0x180
 ? ksys_ioctl+0x66/0x70
 ? __x64_sys_ioctl+0x16/0x20
 ? do_syscall_64+0x4f/0x100
 ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
Modules linked in: vhost_net vhost tap kvm_intel kvm irqbypass bridge stp llc
---[ end trace 9775b14b123b1713 ]---

Using JMP:

------------[ cut here ]------------
kernel BUG at /home/sean/go/src/kernel.org/linux/arch/x86/kvm/x86.c:356!
invalid opcode: 0000 [#1] SMP
CPU: 6 PID: 1067 Comm: qemu-system-x86 Not tainted 4.20.0-rc6+ #75
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
RIP: 0010:kvm_spurious_fault+0x5/0x10 [kvm]
Code: <0f> 0b 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 55 49 89 fd 41
RSP: 0018:ffffc90000497cd0 EFLAGS: 00010046
RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffffffffffff
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: ffff88827058bd40 R08: 00000000000003e8 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000784 R12: ffffc90000369fb0
R13: 0000000000000000 R14: 00000003c8fc6642 R15: ffff88827058bd40
FS:  00007f3d7219e700(0000) GS:ffff888277900000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f3d64001000 CR3: 0000000271c6b004 CR4: 0000000000362ee0
Call Trace:
 vmx_vcpu_run+0x156/0x630 [kvm_intel]
 ? kvm_arch_vcpu_ioctl_run+0x447/0x1a40 [kvm]
 ? kvm_vcpu_ioctl+0x368/0x5c0 [kvm]
 ? kvm_vcpu_ioctl+0x368/0x5c0 [kvm]
 ? __set_task_blocked+0x38/0x90
 ? __set_current_blocked+0x50/0x60
 ? __fpu__restore_sig+0x97/0x490
 ? do_vfs_ioctl+0xa1/0x620
 ? __x64_sys_futex+0x89/0x180
 ? ksys_ioctl+0x66/0x70
 ? __x64_sys_ioctl+0x16/0x20
 ? do_syscall_64+0x4f/0x100
 ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
Modules linked in: vhost_net vhost tap kvm_intel kvm irqbypass bridge stp llc
---[ end trace f9daedb85ab3ddba ]---

Fixes: b7c4145ba2 ("KVM: Don't spin on virt instruction faults during reboot")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-21 11:48:23 +01:00
Uros Bizjak ac5ffda244 KVM/x86: Use SVM assembly instruction mnemonics instead of .byte streams
Recently the minimum required version of binutils was changed to 2.20,
which supports all SVM instruction mnemonics. The patch removes
all .byte #defines and uses real instruction mnemonics instead.

Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-21 11:28:44 +01:00
Lan Tianyu 71883a62fc KVM/MMU: Flush tlb directly in the kvm_zap_gfn_range()
Originally, flush tlb is done by slot_handle_level_range(). This patch
moves the flush directly to kvm_zap_gfn_range() when range flush is
available, so that only the requested range can be flushed.

Signed-off-by: Lan Tianyu <Tianyu.Lan@microsoft.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-21 11:28:43 +01:00
Lan Tianyu 3cc5ea94de KVM/MMU: Flush tlb directly in kvm_set_pte_rmapp()
This patch is to flush tlb directly in kvm_set_pte_rmapp()
function when Hyper-V remote TLB flush is available, returning 0
so that kvm_mmu_notifier_change_pte() does not flush again.

Signed-off-by: Lan Tianyu <Tianyu.Lan@microsoft.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-21 11:28:42 +01:00
Lan Tianyu 0cf853c5e2 KVM/MMU: Move tlb flush in kvm_set_pte_rmapp() to kvm_mmu_notifier_change_pte()
This patch is to move tlb flush in kvm_set_pte_rmapp() to
kvm_mmu_notifier_change_pte() in order to avoid redundant tlb flush.

Signed-off-by: Lan Tianyu <Tianyu.Lan@microsoft.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-21 11:28:42 +01:00
Lan Tianyu 748c0e312f KVM: Make kvm_set_spte_hva() return int
The patch is to make kvm_set_spte_hva() return int and caller can
check return value to determine flush tlb or not.

Signed-off-by: Lan Tianyu <Tianyu.Lan@microsoft.com>
Acked-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-21 11:28:41 +01:00
Lan Tianyu c3134ce240 KVM: Replace old tlb flush function with new one to flush a specified range.
This patch is to replace kvm_flush_remote_tlbs() with kvm_flush_
remote_tlbs_with_address() in some functions without logic change.

Signed-off-by: Lan Tianyu <Tianyu.Lan@microsoft.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-21 11:28:41 +01:00
Lan Tianyu 40ef75a758 KVM/MMU: Add tlb flush with range helper function
This patch is to add wrapper functions for tlb_remote_flush_with_range
callback and flush tlb directly in kvm_mmu_zap_collapsible_spte().
kvm_mmu_zap_collapsible_spte() returns flush request to the
slot_handle_leaf() and the latter does flush on demand. When
range flush is available, make kvm_mmu_zap_collapsible_spte()
to flush tlb with range directly to avoid returning range back
to slot_handle_leaf().

Signed-off-by: Lan Tianyu <Tianyu.Lan@microsoft.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-21 11:28:40 +01:00
Lan Tianyu 1f3a3e46cc KVM/VMX: Add hv tlb range flush support
This patch is to register tlb_remote_flush_with_range callback with
hv tlb range flush interface.

Signed-off-by: Lan Tianyu <Tianyu.Lan@microsoft.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-21 11:28:39 +01:00
Lan Tianyu cc4edae4b9 x86/hyper-v: Add HvFlushGuestAddressList hypercall support
Hyper-V provides HvFlushGuestAddressList() hypercall to flush EPT tlb
with specified ranges. This patch is to add the hypercall support.

Reviewed-by:  Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Lan Tianyu <Tianyu.Lan@microsoft.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-21 11:28:39 +01:00
Lan Tianyu a49b96352e KVM: Add tlb_remote_flush_with_range callback in kvm_x86_ops
Add flush range call back in the kvm_x86_ops and platform can use it
to register its associated function. The parameter "kvm_tlb_range"
accepts a single range and flush list which contains a list of ranges.

Signed-off-by: Lan Tianyu <Tianyu.Lan@microsoft.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-21 11:28:38 +01:00
Luwei Kang ee85dec2fe KVM: x86: Disable Intel PT when VMXON in L1 guest
Currently, Intel Processor Trace do not support tracing in L1 guest
VMX operation(IA32_VMX_MISC[bit 14] is 0). As mentioned in SDM,
on these type of processors, execution of the VMXON instruction will
clears IA32_RTIT_CTL.TraceEn and any attempt to write IA32_RTIT_CTL
causes a general-protection exception (#GP).

Signed-off-by: Luwei Kang <luwei.kang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-21 11:28:38 +01:00
Chao Peng b08c28960f KVM: x86: Set intercept for Intel PT MSRs read/write
To save performance overhead, disable intercept Intel PT MSRs
read/write when Intel PT is enabled in guest.
MSR_IA32_RTIT_CTL is an exception that will always be intercepted.

Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
Signed-off-by: Luwei Kang <luwei.kang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-21 11:28:37 +01:00
Chao Peng bf8c55d8dc KVM: x86: Implement Intel PT MSRs read/write emulation
This patch implement Intel Processor Trace MSRs read/write
emulation.
Intel PT MSRs read/write need to be emulated when Intel PT
MSRs is intercepted in guest and during live migration.

Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
Signed-off-by: Luwei Kang <luwei.kang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-21 11:28:36 +01:00
Luwei Kang 6c0f0bba85 KVM: x86: Introduce a function to initialize the PT configuration
Initialize the Intel PT configuration when cpuid update.
Include cpuid inforamtion, rtit_ctl bit mask and the number of
address ranges.

Signed-off-by: Luwei Kang <luwei.kang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-21 11:28:36 +01:00
Chao Peng 2ef444f160 KVM: x86: Add Intel PT context switch for each vcpu
Load/Store Intel Processor Trace register in context switch.
MSR IA32_RTIT_CTL is loaded/stored automatically from VMCS.
In Host-Guest mode, we need load/resore PT MSRs only when PT
is enabled in guest.

Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
Signed-off-by: Luwei Kang <luwei.kang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-21 11:28:35 +01:00
Chao Peng 86f5201df0 KVM: x86: Add Intel Processor Trace cpuid emulation
Expose Intel Processor Trace to guest only when
the PT works in Host-Guest mode.

Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
Signed-off-by: Luwei Kang <luwei.kang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-21 11:28:35 +01:00
Chao Peng f99e3daf94 KVM: x86: Add Intel PT virtualization work mode
Intel Processor Trace virtualization can be work in one
of 2 possible modes:

a. System-Wide mode (default):
   When the host configures Intel PT to collect trace packets
   of the entire system, it can leave the relevant VMX controls
   clear to allow VMX-specific packets to provide information
   across VMX transitions.
   KVM guest will not aware this feature in this mode and both
   host and KVM guest trace will output to host buffer.

b. Host-Guest mode:
   Host can configure trace-packet generation while in
   VMX non-root operation for guests and root operation
   for native executing normally.
   Intel PT will be exposed to KVM guest in this mode, and
   the trace output to respective buffer of host and guest.
   In this mode, tht status of PT will be saved and disabled
   before VM-entry and restored after VM-exit if trace
   a virtual machine.

Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
Signed-off-by: Luwei Kang <luwei.kang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-21 11:28:34 +01:00
Luwei Kang e0018afec5 perf/x86/intel/pt: add new capability for Intel PT
This adds support for "output to Trace Transport subsystem"
capability of Intel PT. It means that PT can output its
trace to an MMIO address range rather than system memory buffer.

Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Luwei Kang <luwei.kang@intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-21 11:28:33 +01:00
Luwei Kang 69843a913f perf/x86/intel/pt: Add new bit definitions for PT MSRs
Add bit definitions for Intel PT MSRs to support trace output
directed to the memeory subsystem and holds a count if packet
bytes that have been sent out.

These are required by the upcoming PT support in KVM guests
for MSRs read/write emulation.

Signed-off-by: Luwei Kang <luwei.kang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-21 11:28:33 +01:00
Luwei Kang 61be2998ca perf/x86/intel/pt: Introduce intel_pt_validate_cap()
intel_pt_validate_hw_cap() validates whether a given PT capability is
supported by the hardware. It checks the PT capability array which
reflects the capabilities of the hardware on which the code is executed.

For setting up PT for KVM guests this is not correct as the capability
array for the guest can be different from the host array.

Provide a new function to check against a given capability array.

Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Luwei Kang <luwei.kang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-21 11:28:32 +01:00
Chao Peng f6d079ce86 perf/x86/intel/pt: Export pt_cap_get()
pt_cap_get() is required by the upcoming PT support in KVM guests.

Export it and move the capabilites enum to a global header.

As a global functions, "pt_*" is already used for ptrace and
other things, so it makes sense to use "intel_pt_*" as a prefix.

Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
Signed-off-by: Luwei Kang <luwei.kang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-21 11:28:32 +01:00
Chao Peng 887eda13b5 perf/x86/intel/pt: Move Intel PT MSRs bit defines to global header
The Intel Processor Trace (PT) MSR bit defines are in a private
header. The upcoming support for PT virtualization requires these defines
to be accessible from KVM code.

Move them to the global MSR header file.

Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
Signed-off-by: Luwei Kang <luwei.kang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-21 11:28:31 +01:00
Wei Yang bdd303cb1b KVM: fix some typos
Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
[Preserved the iff and a probably intentional weird bracket notation.
 Also dropped the style change to make a single-purpose patch. - Radim]
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2018-12-21 11:28:26 +01:00
Peng Hao 649472a169 x86/kvmclock: convert to SPDX identifiers
Update the verbose license text with the matching SPDX
license identifier.

Signed-off-by: Peng Hao <peng.hao2@zte.com.cn>
[Changed deprecated GPL-2.0+ to GPL-2.0-or-later. - Radim]
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2018-12-21 11:28:25 +01:00
Sean Christopherson 9b7ebff23c KVM: x86: Remove KF() macro placeholder
Although well-intentioned, keeping the KF() definition as a hint for
handling scattered CPUID features may be counter-productive.  Simply
redefining the bit position only works for directly manipulating the
guest's CPUID leafs, e.g. it doesn't make guest_cpuid_has() magically
work.  Taking an alternative approach, e.g. ensuring the bit position
is identical between the Linux-defined and hardware-defined features,
may be a simpler and/or more effective method of exposing scattered
features to the guest.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2018-12-21 11:28:25 +01:00
Jim Mattson 788fc1e9ad kvm: vmx: Allow guest read access to IA32_TSC
Let the guest read the IA32_TSC MSR with the generic RDMSR instruction
as well as the specific RDTSC(P) instructions. Note that the hardware
applies the TSC multiplier and offset (when applicable) to the result of
RDMSR(IA32_TSC), just as it does to the result of RDTSC(P).

Signed-off-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Peter Shier <pshier@google.com>
Reviewed-by: Marc Orr <marcorr@google.com>
Reviewed-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2018-12-21 11:28:24 +01:00
Jim Mattson 9ebdfe5230 kvm: nVMX: NMI-window and interrupt-window exiting should wake L2 from HLT
According to the SDM, "NMI-window exiting" VM-exits wake a logical
processor from the same inactive states as would an NMI and
"interrupt-window exiting" VM-exits wake a logical processor from the
same inactive states as would an external interrupt. Specifically, they
wake a logical processor from the shutdown state and from the states
entered using the HLT and MWAIT instructions.

Fixes: 6dfacadd58 ("KVM: nVMX: Add support for activity state HLT")
Signed-off-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Peter Shier <pshier@google.com>
Suggested-by: Sean Christopherson <sean.j.christopherson@intel.com>
[Squashed comments of two Jim's patches and used the simplified code
 hunk provided by Sean. - Radim]
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2018-12-21 11:28:24 +01:00
Tambe, William e081354d6a KVM: nSVM: Fix nested guest support for PAUSE filtering.
Currently, the nested guest's PAUSE intercept intentions are not being
honored.  Instead, since the L0 hypervisor's pause_filter_count and
pause_filter_thresh values are still in place, these values are used
instead of those programmed in the VMCB by the L1 hypervisor.

To honor the desired PAUSE intercept support of the L1 hypervisor, the L0
hypervisor must use the PAUSE filtering fields of the L1 hypervisor. This
requires saving and restoring of both the L0 and L1 hypervisor's PAUSE
filtering fields.

Signed-off-by: William Tambe <william.tambe@amd.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2018-12-21 11:28:23 +01:00
YueHaibing ba7424b200 KVM: VMX: Remove duplicated include from vmx.c
Remove duplicated include.

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2018-12-21 11:28:21 +01:00
Vitaly Kuznetsov e87555e550 KVM: x86: svm: report MSR_IA32_MCG_EXT_CTL as unsupported
AMD doesn't seem to implement MSR_IA32_MCG_EXT_CTL and svm code in kvm
knows nothing about it, however, this MSR is among emulated_msrs and
thus returned with KVM_GET_MSR_INDEX_LIST. The consequent KVM_GET_MSRS,
of course, fails.

Report the MSR as unsupported to not confuse userspace.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2018-12-21 11:28:20 +01:00
Paolo Bonzini ed8e481227 KVM: x86: fix size of x86_fpu_cache objects
The memory allocation in b666a4b697 ("kvm: x86: Dynamically allocate
guest_fpu", 2018-11-06) is wrong, there are other members in struct fpu
before the fpregs_state union and the patch should be doing something
similar to the code in fpu__init_task_struct_size.  It's enough to run
a guest and then rmmod kvm to see slub errors which are actually caused
by memory corruption.

For now let's revert it to sizeof(struct fpu), which is conservative.
I have plans to move fsave/fxsave/xsave directly in KVM, without using
the kernel FPU helpers, and once it's done, the size of the object in
the cache will be something like kvm_xstate_size.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-21 11:28:19 +01:00
David S. Miller 2be09de7d6 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Lots of conflicts, by happily all cases of overlapping
changes, parallel adds, things of that nature.

Thanks to Stephen Rothwell, Saeed Mahameed, and others
for their guidance in these resolutions.

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-20 11:53:36 -08:00
David Howells e262e32d6b vfs: Suppress MS_* flag defs within the kernel unless explicitly enabled
Only the mount namespace code that implements mount(2) should be using the
MS_* flags.  Suppress them inside the kernel unless uapi/linux/mount.h is
included.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Reviewed-by: David Howells <dhowells@redhat.com>
2018-12-20 16:32:56 +00:00
Sinan Kaya 5d32a66541 PCI/ACPI: Allow ACPI to be built without CONFIG_PCI set
We are compiling PCI code today for systems with ACPI and no PCI
device present. Remove the useless code and reduce the tight
dependency.

Signed-off-by: Sinan Kaya <okaya@kernel.org>
Acked-by: Bjorn Helgaas <bhelgaas@google.com> # PCI parts
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2018-12-20 10:19:49 +01:00
Joerg Roedel 03ebe48e23 Merge branches 'iommu/fixes', 'arm/renesas', 'arm/mediatek', 'arm/tegra', 'arm/omap', 'arm/smmu', 'x86/vt-d', 'x86/amd' and 'core' into next 2018-12-20 10:05:20 +01:00
Steven Rostedt (VMware) d2a68c4eff x86/ftrace: Do not call function graph from dynamic trampolines
Since commit 79922b8009 ("ftrace: Optimize function graph to be
called directly"), dynamic trampolines should not be calling the
function graph tracer at the end. If they do, it could cause the function
graph tracer to trace functions that it filtered out.

Right now it does not cause a problem because there's a test to check if
the function graph tracer is attached to the same function as the
function tracer, which for now is true. But the function graph tracer is
undergoing changes that can make this no longer true which will cause
the function graph tracer to trace other functions.

 For example:

 # cd /sys/kernel/tracing/
 # echo do_IRQ > set_ftrace_filter
 # mkdir instances/foo
 # echo ip_rcv > instances/foo/set_ftrace_filter
 # echo function_graph > current_tracer
 # echo function > instances/foo/current_tracer

Would cause the function graph tracer to trace both do_IRQ and ip_rcv,
if the current tests change.

As the current tests prevent this from being a problem, this code does
not need to be backported. But it does make the code cleaner.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-12-19 22:43:37 -05:00
Vitaly Kuznetsov 3cf85f9f6b KVM: x86: nSVM: fix switch to guest mmu
Recent optimizations in MMU code broke nested SVM with NPT in L1
completely: when we do nested_svm_{,un}init_mmu_context() we want
to switch from TDP MMU to shadow MMU, both init_kvm_tdp_mmu() and
kvm_init_shadow_mmu() check if re-configuration is needed by looking
at cache source data. The data, however, doesn't change - it's only
the type of the MMU which changes. We end up not re-initializing
guest MMU as shadow and everything goes off the rails.

The issue could have been fixed by putting MMU type into extended MMU
role but this is not really needed. We can just split root and guest MMUs
the exact same way we did for nVMX, their types never change in the
lifetime of a vCPU.

There is still room for improvement: currently, we reset all MMU roots
when switching from L1 to L2 and back and this is not needed.

Fixes: 7dcd575520 ("x86/kvm/mmu: check if tdp/shadow MMU reconfiguration is needed")
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-19 22:19:22 +01:00
Ingo Molnar 6ac389346e Revert "kbuild/Makefile: Prepare for using macros in inline assembly code to work around asm() related GCC inlining bugs"
This reverts commit 77b0bf55bc.

See this commit for details about the revert:

  e769742d35 ("Revert "x86/jump-labels: Macrofy inline assembly code to work around GCC inlining bugs"")

 Conflicts:
	arch/x86/Makefile

Reported-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Reviewed-by: Borislav Petkov <bp@alien8.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Richard Biener <rguenther@suse.de>
Cc: Kees Cook <keescook@chromium.org>
Cc: Segher Boessenkool <segher@kernel.crashing.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Nadav Amit <namit@vmware.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-12-19 12:00:28 +01:00
Ingo Molnar 96af6cd02a Revert "x86/objtool: Use asm macros to work around GCC inlining bugs"
This reverts commit c06c4d8090.

See this commit for details about the revert:

  e769742d35 ("Revert "x86/jump-labels: Macrofy inline assembly code to work around GCC inlining bugs"")

Reported-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Reviewed-by: Borislav Petkov <bp@alien8.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Richard Biener <rguenther@suse.de>
Cc: Kees Cook <keescook@chromium.org>
Cc: Segher Boessenkool <segher@kernel.crashing.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Nadav Amit <namit@vmware.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-12-19 12:00:23 +01:00
Ingo Molnar ac180540b0 Revert "x86/refcount: Work around GCC inlining bug"
This reverts commit 9e1725b410.

See this commit for details about the revert:

  e769742d35 ("Revert "x86/jump-labels: Macrofy inline assembly code to work around GCC inlining bugs"")

The conflict resolution for interaction with:

  288e4521f0f6: ("x86/asm: 'Simplify' GEN_*_RMWcc() macros")

was provided by Masahiro Yamada.

 Conflicts:
	arch/x86/include/asm/refcount.h

Reported-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Reviewed-by: Borislav Petkov <bp@alien8.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Richard Biener <rguenther@suse.de>
Cc: Kees Cook <keescook@chromium.org>
Cc: Segher Boessenkool <segher@kernel.crashing.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Nadav Amit <namit@vmware.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-12-19 12:00:09 +01:00
Ingo Molnar 851a4cd7cc Revert "x86/alternatives: Macrofy lock prefixes to work around GCC inlining bugs"
This reverts commit 77f48ec28e.

See this commit for details about the revert:

  e769742d35 ("Revert "x86/jump-labels: Macrofy inline assembly code to work around GCC inlining bugs"")

Reported-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Reviewed-by: Borislav Petkov <bp@alien8.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Richard Biener <rguenther@suse.de>
Cc: Kees Cook <keescook@chromium.org>
Cc: Segher Boessenkool <segher@kernel.crashing.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Nadav Amit <namit@vmware.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-12-19 12:00:04 +01:00
Ingo Molnar ffb61c6346 Revert "x86/bug: Macrofy the BUG table section handling, to work around GCC inlining bugs"
This reverts commit f81f8ad56f.

See this commit for details about the revert:

  e769742d35 ("Revert "x86/jump-labels: Macrofy inline assembly code to work around GCC inlining bugs"")

Reported-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Reviewed-by: Borislav Petkov <bp@alien8.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Richard Biener <rguenther@suse.de>
Cc: Kees Cook <keescook@chromium.org>
Cc: Segher Boessenkool <segher@kernel.crashing.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Nadav Amit <namit@vmware.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-12-19 12:00:00 +01:00
Ingo Molnar a4da3d86a2 Revert "x86/paravirt: Work around GCC inlining bugs when compiling paravirt ops"
This reverts commit 494b5168f2.

See this commit for details about the revert:

  e769742d35 ("Revert "x86/jump-labels: Macrofy inline assembly code to work around GCC inlining bugs"")

Reported-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Reviewed-by: Borislav Petkov <bp@alien8.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Richard Biener <rguenther@suse.de>
Cc: Kees Cook <keescook@chromium.org>
Cc: Segher Boessenkool <segher@kernel.crashing.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Nadav Amit <namit@vmware.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-12-19 11:59:55 +01:00
Ingo Molnar 81a68455e7 Revert "x86/extable: Macrofy inline assembly code to work around GCC inlining bugs"
This reverts commit 0474d5d9d2.

See this commit for details about the revert:

  e769742d35 ("Revert "x86/jump-labels: Macrofy inline assembly code to work around GCC inlining bugs"")

Reported-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Reviewed-by: Borislav Petkov <bp@alien8.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Richard Biener <rguenther@suse.de>
Cc: Kees Cook <keescook@chromium.org>
Cc: Segher Boessenkool <segher@kernel.crashing.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Nadav Amit <namit@vmware.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-12-19 11:59:47 +01:00
Ingo Molnar c3462ba986 Revert "x86/cpufeature: Macrofy inline assembly code to work around GCC inlining bugs"
This reverts commit d5a581d84a.

See this commit for details about the revert:

  e769742d35 ("Revert "x86/jump-labels: Macrofy inline assembly code to work around GCC inlining bugs"")

Reported-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Reviewed-by: Borislav Petkov <bp@alien8.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Richard Biener <rguenther@suse.de>
Cc: Kees Cook <keescook@chromium.org>
Cc: Segher Boessenkool <segher@kernel.crashing.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Nadav Amit <namit@vmware.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-12-19 11:59:21 +01:00
Ingo Molnar e769742d35 Revert "x86/jump-labels: Macrofy inline assembly code to work around GCC inlining bugs"
This reverts commit 5bdcd510c2.

The macro based workarounds for GCC's inlining bugs caused regressions: distcc
and other distro build setups broke, and the fixes are not easy nor will they
solve regressions on already existing installations.

So we are reverting this patch and the 8 followup patches.

What makes this revert easier is that GCC9 will likely include the new 'asm inline'
syntax that makes inlining of assembly blocks a lot more robust.

This is a superior method to any macro based hackeries - and might even be
backported to GCC8, which would make all modern distros get the inlining
fixes as well.

Many thanks to Masahiro Yamada and others for helping sort out these problems.

Reported-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Reviewed-by: Borislav Petkov <bp@alien8.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Richard Biener <rguenther@suse.de>
Cc: Kees Cook <keescook@chromium.org>
Cc: Segher Boessenkool <segher@kernel.crashing.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Nadav Amit <namit@vmware.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-12-19 11:58:10 +01:00
Borislav Petkov 72a8f089c3 x86/mce: Restore MCE injector's module name
It was mce-inject.ko but it turned into inject.ko since the containing
source file got renamed. Restore it.

Fixes: 21afaf1813 ("x86/mce: Streamline MCE subsystem's naming")
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20181218182546.GA21386@zn.tnic
2018-12-19 00:04:36 +01:00
Colin Ian King 32043fa065 x86/mtrr: Don't copy uninitialized gentry fields back to userspace
Currently the copy_to_user of data in the gentry struct is copying
uninitiaized data in field _pad from the stack to userspace.

Fix this by explicitly memset'ing gentry to zero, this also will zero any
compiler added padding fields that may be in struct (currently there are
none).

Detected by CoverityScan, CID#200783 ("Uninitialized scalar variable")

Fixes: b263b31e8a ("x86, mtrr: Use explicit sizing and padding for the 64-bit ioctls")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Tyler Hicks <tyhicks@canonical.com>
Cc: security@kernel.org
Link: https://lkml.kernel.org/r/20181218172956.1440-1-colin.king@canonical.com
2018-12-19 00:00:16 +01:00
Eduardo Habkost 0e1b869fff kvm: x86: Add AMD's EX_CFG to the list of ignored MSRs
Some guests OSes (including Windows 10) write to MSR 0xc001102c
on some cases (possibly while trying to apply a CPU errata).
Make KVM ignore reads and writes to that MSR, so the guest won't
crash.

The MSR is documented as "Execution Unit Configuration (EX_CFG)",
at AMD's "BIOS and Kernel Developer's Guide (BKDG) for AMD Family
15h Models 00h-0Fh Processors".

Cc: stable@vger.kernel.org
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-18 22:15:44 +01:00
Wanpeng Li dcbd3e49c2 KVM: X86: Fix NULL deref in vcpu_scan_ioapic
Reported by syzkaller:

    CPU: 1 PID: 5962 Comm: syz-executor118 Not tainted 4.20.0-rc6+ #374
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    RIP: 0010:kvm_apic_hw_enabled arch/x86/kvm/lapic.h:169 [inline]
    RIP: 0010:vcpu_scan_ioapic arch/x86/kvm/x86.c:7449 [inline]
    RIP: 0010:vcpu_enter_guest arch/x86/kvm/x86.c:7602 [inline]
    RIP: 0010:vcpu_run arch/x86/kvm/x86.c:7874 [inline]
    RIP: 0010:kvm_arch_vcpu_ioctl_run+0x5296/0x7320 arch/x86/kvm/x86.c:8074
    Call Trace:
	 kvm_vcpu_ioctl+0x5c8/0x1150 arch/x86/kvm/../../../virt/kvm/kvm_main.c:2596
	 vfs_ioctl fs/ioctl.c:46 [inline]
	 file_ioctl fs/ioctl.c:509 [inline]
	 do_vfs_ioctl+0x1de/0x1790 fs/ioctl.c:696
	 ksys_ioctl+0xa9/0xd0 fs/ioctl.c:713
	 __do_sys_ioctl fs/ioctl.c:720 [inline]
	 __se_sys_ioctl fs/ioctl.c:718 [inline]
	 __x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
	 do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
	 entry_SYSCALL_64_after_hwframe+0x49/0xbe

The reason is that the testcase writes hyperv synic HV_X64_MSR_SINT14 msr
and triggers scan ioapic logic to load synic vectors into EOI exit bitmap.
However, irqchip is not initialized by this simple testcase, ioapic/apic
objects should not be accessed.

This patch fixes it by also considering whether or not apic is present.

Reported-by: syzbot+39810e6c400efadfef71@syzkaller.appspotmail.com
Cc: stable@vger.kernel.org
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-18 22:15:44 +01:00
Cfir Cohen c2dd5146e9 KVM: Fix UAF in nested posted interrupt processing
nested_get_vmcs12_pages() processes the posted_intr address in vmcs12. It
caches the kmap()ed page object and pointer, however, it doesn't handle
errors correctly: it's possible to cache a valid pointer, then release
the page and later dereference the dangling pointer.

I was able to reproduce with the following steps:

1. Call vmlaunch with valid posted_intr_desc_addr but an invalid
MSR_EFER. This causes nested_get_vmcs12_pages() to cache the kmap()ed
pi_desc_page and pi_desc. Later the invalid EFER value fails
check_vmentry_postreqs() which fails the first vmlaunch.

2. Call vmlanuch with a valid EFER but an invalid posted_intr_desc_addr
(I set it to 2G - 0x80). The second time we call nested_get_vmcs12_pages
pi_desc_page is unmapped and released and pi_desc_page is set to NULL
(the "shouldn't happen" clause). Due to the invalid
posted_intr_desc_addr, kvm_vcpu_gpa_to_page() fails and
nested_get_vmcs12_pages() returns. It doesn't return an error value so
vmlaunch proceeds. Note that at this time we have a dangling pointer in
vmx->nested.pi_desc and POSTED_INTR_DESC_ADDR in L0's vmcs.

3. Issue an IPI in L2 guest code. This triggers a call to
vmx_complete_nested_posted_interrupt() and pi_test_and_clear_on() which
dereferences the dangling pointer.

Vulnerable code requires nested and enable_apicv variables to be set to
true. The host CPU must also support posted interrupts.

Fixes: 5e2f30b756 "KVM: nVMX: get rid of nested_get_page()"
Cc: stable@vger.kernel.org
Reviewed-by: Andy Honig <ahonig@google.com>
Signed-off-by: Cfir Cohen <cfir@google.com>
Reviewed-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-18 22:15:34 +01:00
Chang S. Bae 87ab4689ca x86/fsgsbase/64: Fix the base write helper functions
Andy spotted a regression in the fs/gs base helpers after the patch series
was committed. The helper functions which write fs/gs base are not just
writing the base, they are also changing the index. That's wrong and needs
to be separated because writing the base has not to modify the index.

While the regression is not causing any harm right now because the only
caller depends on that behaviour, it's a guarantee for subtle breakage down
the road.

Make the index explicitly changed from the caller, instead of including
the code in the helpers.

Subsequently, the task write helpers do not handle for the current task
anymore. The range check for a base value is also factored out, to minimize
code redundancy from the caller.

Fixes: b1378a561f ("x86/fsgsbase/64: Introduce FS/GS base helper functions")
Suggested-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Ravi Shankar <ravi.v.shankar@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Link: https://lkml.kernel.org/r/20181126195524.32179-1-chang.seok.bae@intel.com
2018-12-18 14:26:09 +01:00
Thomas Lendacky 20c3a2c33e x86/speculation: Add support for STIBP always-on preferred mode
Different AMD processors may have different implementations of STIBP.
When STIBP is conditionally enabled, some implementations would benefit
from having STIBP always on instead of toggling the STIBP bit through MSR
writes. This preference is advertised through a CPUID feature bit.

When conditional STIBP support is requested at boot and the CPU advertises
STIBP always-on mode as preferred, switch to STIBP "on" support. To show
that this transition has occurred, create a new spectre_v2_user_mitigation
value and a new spectre_v2_user_strings message. The new mitigation value
is used in spectre_v2_user_select_mitigation() to print the new mitigation
message as well as to return a new string from stibp_state().

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Link: https://lkml.kernel.org/r/20181213230352.6937.74943.stgit@tlendack-t1.amdoffice.net
2018-12-18 14:13:33 +01:00
Hui Wang aa02ef099c x86/topology: Use total_cpus for max logical packages calculation
nr_cpu_ids can be limited on the command line via nr_cpus=. This can break the
logical package management because it results in a smaller number of packages
while in kdump kernel.

Check below case:
There is a two sockets system, each socket has 8 cores, which has 16 logical
cpus while HT was turn on.

 0  1  2  3  4  5  6  7     |    16 17 18 19 20 21 22 23
 cores on socket 0               threads on socket 0
 8  9 10 11 12 13 14 15     |    24 25 26 27 28 29 30 31
 cores on socket 1               threads on socket 1

While starting the kdump kernel with command line option nr_cpus=16 panic
was triggered on one of the cpus 24-31 eg. 26, then online cpu will be
1-15, 26(cpu 0 was disabled in kdump), ncpus will be 16 and
__max_logical_packages will be 1, but actually two packages were booted on.

This issue can reproduced by set kdump option nr_cpus=<real physical core
numbers>, and then trigger panic on last socket's thread, for example:

taskset -c 26 echo c > /proc/sysrq-trigger

Use total_cpus which will not be limited by nr_cpus command line to calculate
the value of __max_logical_packages.

Signed-off-by: Hui Wang <john.wanghui@huawei.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: <guijianfeng@huawei.com>
Cc: <wencongyang2@huawei.com>
Cc: <douliyang1@huawei.com>
Cc: <qiaonuohan@huawei.com>
Link: https://lkml.kernel.org/r/20181107023643.22174-1-john.wanghui@huawei.com
2018-12-18 13:38:37 +01:00
Yangtao Li 6848ac7ca3 x86/mm/dump_pagetables: Use DEFINE_SHOW_ATTRIBUTE()
Use DEFINE_SHOW_ATTRIBUTE() instead of open coding it.

Signed-off-by: Yangtao Li <tiny.windzz@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: keescook@chromium.org
Cc: luto@kernel.org
Cc: peterz@infradead.org
Cc: bp@alien8.de
Cc: hpa@zytor.com
Link: https://lkml.kernel.org/r/20181119154334.18265-1-tiny.windzz@gmail.com
2018-12-18 13:05:54 +01:00
James Morris 5580b4a1a8 Merge branch 'next-integrity' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity into next-integrity
From Mimi:

In Linux 4.19, a new LSM hook named security_kernel_load_data was
upstreamed, allowing LSMs and IMA to prevent the kexec_load
syscall.  Different signature verification methods exist for verifying
the kexec'ed kernel image.  This pull request adds additional support
in IMA to prevent loading unsigned kernel images via the kexec_load
syscall, independently of the IMA policy rules, based on the runtime
"secure boot" flag.  An initial IMA kselftest is included.

In addition, this pull request defines a new, separate keyring named
".platform" for storing the preboot/firmware keys needed for verifying
the kexec'ed kernel image's signature and includes the associated IMA
kexec usage of the ".platform" keyring.

(David Howell's and Josh Boyer's patches for reading the
preboot/firmware keys, which were previously posted for a different
use case scenario, are included here.)
2018-12-17 11:26:46 -08:00
Peter Zijlstra 3c567356db x86/mm/cpa: Rename @addrinarray to @numpages
The CPA_ARRAY interface works in single pages, and everything, except
in these 'few' locations is this variable called 'numpages'.

Remove this 'addrinarray' abberation and use 'numpages' consistently.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tom.StDenis@amd.com
Cc: dave.hansen@intel.com
Link: http://lkml.kernel.org/r/20181203171043.695039210@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-12-17 18:54:30 +01:00
Peter Zijlstra c38116bb94 x86/mm/cpa: Better use CLFLUSHOPT
Currently we issue an MFENCE before and after flushing a range. This
means that if we flush a bunch of single page ranges -- like with the
cpa array, we issue a whole bunch of superfluous MFENCEs.

Reorgainze the code a little to avoid this.

[ mingo: capitalize instructions, tweak changelog and comments. ]

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tom.StDenis@amd.com
Cc: dave.hansen@intel.com
Link: http://lkml.kernel.org/r/20181203171043.626999883@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-12-17 18:54:29 +01:00
Peter Zijlstra fe0937b24f x86/mm/cpa: Fold cpa_flush_range() and cpa_flush_array() into a single cpa_flush() function
Note that the cache flush loop in cpa_flush_*() is identical when we
use __cpa_addr(); further observe that flush_tlb_kernel_range() is a
special case of to the cpa_flush_array() TLB invalidation code.

This then means the two functions are virtually identical. Fold these
two functions into a single cpa_flush() call.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tom.StDenis@amd.com
Cc: dave.hansen@intel.com
Link: http://lkml.kernel.org/r/20181203171043.559855600@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-12-17 18:54:28 +01:00
Peter Zijlstra 83b4e39146 x86/mm/cpa: Make cpa_data::numpages invariant
Make sure __change_page_attr_set_clr() doesn't modify cpa->numpages.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tom.StDenis@amd.com
Cc: dave.hansen@intel.com
Link: http://lkml.kernel.org/r/20181203171043.493000228@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-12-17 18:54:27 +01:00
Peter Zijlstra 935f583982 x86/mm/cpa: Optimize cpa_flush_array() TLB invalidation
Instead of punting and doing tlb_flush_all(), do the same as
flush_tlb_kernel_range() does and use single page invalidations.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tom.StDenis@amd.com
Cc: dave.hansen@intel.com
Link: http://lkml.kernel.org/r/20181203171043.430001980@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-12-17 18:54:26 +01:00
Peter Zijlstra 5fe26b7a8f x86/mm/cpa: Simplify the code after making cpa->vaddr invariant
Since cpa->vaddr is invariant, this means we can remove all
workarounds that deal with it changing.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tom.StDenis@amd.com
Cc: dave.hansen@intel.com
Link: http://lkml.kernel.org/r/20181203171043.366619025@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-12-17 18:54:25 +01:00
Peter Zijlstra 98bfc9b038 x86/mm/cpa: Make cpa_data::vaddr invariant
Currently __change_page_attr_set_clr() will modify cpa->vaddr when
!(CPA_ARRAY | CPA_PAGES_ARRAY), whereas in the array cases it will
increment cpa->curpage.

Change __cpa_addr() such that its @idx argument also works in the
!array case and use cpa->curpage increments for all cases.

NOTE: since cpa_data::numpages is 'unsigned long' so should cpa_data::curpage be.
NOTE: after this only cpa->numpages is still modified.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tom.StDenis@amd.com
Cc: dave.hansen@intel.com
Link: http://lkml.kernel.org/r/20181203171043.295174892@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-12-17 18:54:24 +01:00
Peter Zijlstra 16ebf031e8 x86/mm/cpa: Add __cpa_addr() helper
The code to compute the virtual address of a cpa_data is duplicated;
introduce a helper before more copies happen.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tom.StDenis@amd.com
Cc: dave.hansen@intel.com
Link: http://lkml.kernel.org/r/20181203171043.229119497@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-12-17 18:54:23 +01:00
Peter Zijlstra ecc729f1f4 x86/mm/cpa: Add ARRAY and PAGES_ARRAY selftests
The current pageattr-test code only uses the regular range interface,
add code that also tests the array and pages interface.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tom.StDenis@amd.com
Cc: dave.hansen@intel.com
Link: http://lkml.kernel.org/r/20181203171043.162771364@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-12-17 18:54:22 +01:00
Ingo Molnar 02117e42db Merge branch 'x86/urgent' into x86/mm, to pick up dependent fix
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-12-17 18:48:25 +01:00
Peter Zijlstra 721066dfd4 x86/mm/cpa: Fix cpa_flush_array() TLB invalidation
In commit:

  a7295fd53c ("x86/mm/cpa: Use flush_tlb_kernel_range()")

I misread the CAP array code and incorrectly used
tlb_flush_kernel_range(), resulting in missing TLB flushes and
consequent failures.

Instead do a full invalidate in this case -- for now.

Reported-by: StDenis, Tom <Tom.StDenis@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dave.hansen@intel.com
Fixes: a7295fd53c ("x86/mm/cpa: Use flush_tlb_kernel_range()")
Link: http://lkml.kernel.org/r/20181203171043.089868285@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-12-17 18:48:09 +01:00
Masami Hiramatsu 8162b3d1a7 kprobes/x86: Remove unneeded arch_within_kprobe_blacklist from x86
Remove x86 specific arch_within_kprobe_blacklist().

Since we have already added all blacklisted symbols to the
kprobe blacklist by arch_populate_kprobe_blacklist(),
we don't need arch_within_kprobe_blacklist() on x86
anymore.

Tested-by: Andrea Righi <righi.andrea@gmail.com>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: David S. Miller <davem@davemloft.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yonghong Song <yhs@fb.com>
Link: http://lkml.kernel.org/r/154503491354.26176.13903264647254766066.stgit@devbox
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-12-17 17:48:40 +01:00
Masami Hiramatsu fe6e656154 kprobes/x86: Show x86-64 specific blacklisted symbols correctly
Show x86-64 specific blacklisted symbols in debugfs.

Since x86-64 prohibits probing on symbols which are in
entry text, those should be shown.

Tested-by: Andrea Righi <righi.andrea@gmail.com>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: David S. Miller <davem@davemloft.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yonghong Song <yhs@fb.com>
Link: http://lkml.kernel.org/r/154503488425.26176.17136784384033608516.stgit@devbox
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-12-17 17:48:39 +01:00
Andrea Righi bf9445a33a kprobes/x86/xen: blacklist non-attachable xen interrupt functions
Blacklist symbols in Xen probe-prohibited areas, so that user can see
these prohibited symbols in debugfs.

See also: a50480cb6d.

Signed-off-by: Andrea Righi <righi.andrea@gmail.com>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2018-12-17 10:27:59 -05:00
Alistair Strachan cd01544a26 x86/vdso: Pass --eh-frame-hdr to the linker
Commit

  379d98ddf4 ("x86: vdso: Use $LD instead of $CC to link")

accidentally broke unwinding from userspace, because ld would strip the
.eh_frame sections when linking.

Originally, the compiler would implicitly add --eh-frame-hdr when
invoking the linker, but when this Makefile was converted from invoking
ld via the compiler, to invoking it directly (like vmlinux does),
the flag was missed. (The EH_FRAME section is important for the VDSO
shared libraries, but not for vmlinux.)

Fix the problem by explicitly specifying --eh-frame-hdr, which restores
parity with the old method.

See relevant bug reports for additional info:

  https://bugzilla.kernel.org/show_bug.cgi?id=201741
  https://bugzilla.redhat.com/show_bug.cgi?id=1659295

Fixes: 379d98ddf4 ("x86: vdso: Use $LD instead of $CC to link")
Reported-by: Florian Weimer <fweimer@redhat.com>
Reported-by: Carlos O'Donell <carlos@redhat.com>
Reported-by: "H. J. Lu" <hjl.tools@gmail.com>
Signed-off-by: Alistair Strachan <astrachan@google.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Laura Abbott <labbott@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Carlos O'Donell <carlos@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: kernel-team@android.com
Cc: Laura Abbott <labbott@redhat.com>
Cc: stable <stable@vger.kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: X86 ML <x86@kernel.org>
Link: https://lkml.kernel.org/r/20181214223637.35954-1-astrachan@google.com
2018-12-15 11:37:51 +01:00
Marc Orr b666a4b697 kvm: x86: Dynamically allocate guest_fpu
Previously, the guest_fpu field was embedded in the kvm_vcpu_arch
struct. Unfortunately, the field is quite large, (e.g., 4352 bytes on my
current setup). This bloats the kvm_vcpu_arch struct for x86 into an
order 3 memory allocation, which can become a problem on overcommitted
machines. Thus, this patch moves the fpu state outside of the
kvm_vcpu_arch struct.

With this patch applied, the kvm_vcpu_arch struct is reduced to 15168
bytes for vmx on my setup when building the kernel with kvmconfig.

Suggested-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Marc Orr <marcorr@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-14 18:00:08 +01:00
Marc Orr 240c35a378 kvm: x86: Use task structs fpu field for user
Previously, x86's instantiation of 'struct kvm_vcpu_arch' added an fpu
field to save/restore fpu-related architectural state, which will differ
from kvm's fpu state. However, this is redundant to the 'struct fpu'
field, called fpu, embedded in the task struct, via the thread field.
Thus, this patch removes the user_fpu field from the kvm_vcpu_arch
struct and replaces it with the task struct's fpu field.

This change is significant because the fpu struct is actually quite
large. For example, on the system used to develop this patch, this
change reduces the size of the vcpu_vmx struct from 23680 bytes down to
19520 bytes, when building the kernel with kvmconfig. This reduction in
the size of the vcpu_vmx struct moves us closer to being able to
allocate the struct at order 2, rather than order 3.

Suggested-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Marc Orr <marcorr@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-14 18:00:07 +01:00
Krish Sadhukhan 4e445aee96 KVM: nVMX: Move the checks for Guest Non-Register States to a separate helper function
.. to improve readability and maintainability, and to align the code as per
the layout of the checks in chapter "VM Entries" in Intel SDM vol 3C.

Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com>
Reviewed-by: Mark Kanda <mark.kanda@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-14 18:00:06 +01:00
Krish Sadhukhan 254b2f3b0f KVM: nVMX: Move the checks for Host Control Registers and MSRs to a separate helper function
.. to improve readability and maintainability, and to align the code as per
the layout of the checks in chapter "VM Entries" in Intel SDM vol 3C.

Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com>
Reviewed-by: Mark Kanda <mark.kanda@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-14 18:00:06 +01:00
Krish Sadhukhan 5fbf963400 KVM: nVMX: Move the checks for VM-Entry Control Fields to a separate helper function
.. to improve readability and maintainability, and to align the code as per
the layout of the checks in chapter "VM Entries" in Intel SDM vol 3C.

Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com>
Reviewed-by: Mark Kanda <mark.kanda@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-14 18:00:05 +01:00
Krish Sadhukhan 61446ba75e KVM: nVMX: Move the checks for VM-Exit Control Fields to a separate helper function
.. to improve readability and maintainability, and to align the code as per
the layout of the checks in chapter "VM Entries" in Intel SDM vol 3C.

Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com>
Reviewed-by: Mark Kanda <mark.kanda@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-14 18:00:04 +01:00
Sean Christopherson f9b245e182 KVM: nVMX: Remove param indirection from nested_vmx_check_msr_switch()
Passing the enum and doing an indirect lookup is silly when we can
simply pass the field directly.  Remove the "fast path" code in
nested_vmx_check_msr_switch_controls() as it's now nothing more than a
redundant check.

Remove the debug message rather than continue passing the enum for the
address field.  Having debug messages for the MSRs themselves is useful
as MSR legality is a huge space, whereas messing up a physical address
means the VMM is fundamentally broken.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-14 18:00:04 +01:00
Krish Sadhukhan 461b4ba4c7 KVM: nVMX: Move the checks for VM-Execution Control Fields to a separate helper function
.. to improve readability and maintainability, and to align the code as per
the layout of the checks in chapter "VM Entries" in Intel SDM vol 3C.

Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com>
Reviewed-by: Mark Kanda <mark.kanda@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-14 18:00:03 +01:00
Krish Sadhukhan 16322a3b5e KVM: nVMX: Prepend "nested_vmx_" to check_vmentry_{pre,post}reqs()
.. as they are used only in nested vmx context.

Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Reviewed-by: Liran Alon <liran.alon@oracle.com>
Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com>
Reviewed-by: Mark Kanda <mark.kanda@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-14 18:00:02 +01:00
Lan Tianyu 53963a70ac KVM/VMX: Check ept_pointer before flushing ept tlb
This patch is to initialize ept_pointer to INVALID_PAGE and check it
before flushing ept tlb. If ept_pointer is invalid, bypass the flush
request.

Signed-off-by: Lan Tianyu <Tianyu.Lan@microsoft.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-14 18:00:02 +01:00
Krish Sadhukhan a0d4f80344 KVM nVMX: MSRs should not be stored if VM-entry fails during or after loading guest state
According to section "VM-entry Failures During or After Loading Guest State"
in Intel SDM vol 3C,

	"No MSRs are saved into the VM-exit MSR-store area."

when bit 31 of the exit reason is set.

Reported-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Suggested-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-14 18:00:01 +01:00
Jim Mattson e53d88af63 kvm: x86: Don't modify MSR_PLATFORM_INFO on vCPU reset
If userspace has provided a different value for this MSR (e.g with the
turbo bits set), the userspace-provided value should survive a vCPU
reset. For backwards compatibility, MSR_PLATFORM_INFO is initialized
in kvm_arch_vcpu_setup.

Signed-off-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Drew Schmitt <dasch@google.com>
Cc: Abhiroop Dabral <adabral@paloaltonetworks.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-14 18:00:01 +01:00
Wei Huang 3d82c565a7 kvm: vmx: add cpu into VMX preemption timer bug list
This patch adds Intel "Xeon CPU E3-1220 V2", with CPUID.01H.EAX=0x000306A8,
into the list of known broken CPUs which fail to support VMX preemption
timer. This bug was found while running the APIC timer test of
kvm-unit-test on this specific CPU, even though the errata info can't be
located in the public domain for this CPU.

Signed-off-by: Wei Huang <wei@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-14 18:00:00 +01:00
Eduardo Habkost d7b09c827a kvm: x86: Report STIBP on GET_SUPPORTED_CPUID
Months ago, we have added code to allow direct access to MSR_IA32_SPEC_CTRL
to the guest, which makes STIBP available to guests.  This was implemented
by commits d28b387fb7 ("KVM/VMX: Allow direct access to
MSR_IA32_SPEC_CTRL") and b2ac58f905 ("KVM/SVM: Allow direct access to
MSR_IA32_SPEC_CTRL").

However, we never updated GET_SUPPORTED_CPUID to let userspace know that
STIBP can be enabled in CPUID.  Fix that by updating
kvm_cpuid_8000_0008_ebx_x86_features and kvm_cpuid_7_0_edx_x86_features.

Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-14 17:59:59 +01:00
Vitaly Kuznetsov 87a8d795b2 x86/hyper-v: Stop caring about EOI for direct stimers
Turns out we over-engineered Direct Mode for stimers a bit: unlike
traditional stimers where we may want to try to re-inject the message upon
EOI, Direct Mode stimers just set the irq in APIC and kvm_apic_set_irq()
fails only when APIC is disabled (see APIC_DM_FIXED case in
__apic_accept_irq()). Remove the redundant part.

Suggested-by: Roman Kagan <rkagan@virtuozzo.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-14 17:59:59 +01:00
Vitaly Kuznetsov 08a800ac25 x86/kvm/hyper-v: avoid open-coding stimer_mark_pending() in kvm_hv_notify_acked_sint()
stimers_pending optimization only helps us to avoid multiple
kvm_make_request() calls. This doesn't happen very often and these
calls are very cheap in the first place, remove open-coded version of
stimer_mark_pending() from kvm_hv_notify_acked_sint().

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Roman Kagan <rkagan@virtuozzo.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-14 17:59:58 +01:00
Vitaly Kuznetsov 8644f771e0 x86/kvm/hyper-v: direct mode for synthetic timers
Turns out Hyper-V on KVM (as of 2016) will only use synthetic timers
if direct mode is available. With direct mode we notify the guest by
asserting APIC irq instead of sending a SynIC message.

The implementation uses existing vec_bitmap for letting lapic code
know that we're interested in the particular IRQ's EOI request. We assume
that the same APIC irq won't be used by the guest for both direct mode
stimer and as sint source (especially with AutoEOI semantics). It is
unclear how things should be handled if that's not true.

Direct mode is also somewhat less expensive; in my testing
stimer_send_msg() takes not less than 1500 cpu cycles and
stimer_notify_direct() can usually be done in 300-400. WS2016 without
Hyper-V, however, always sticks to non-direct version.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Roman Kagan <rkagan@virtuozzo.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-14 17:59:57 +01:00
Vitaly Kuznetsov 6a058a1ead x86/kvm/hyper-v: use stimer config definition from hyperv-tlfs.h
As a preparation to implementing Direct Mode for Hyper-V synthetic
timers switch to using stimer config definition from hyperv-tlfs.h.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-14 17:59:57 +01:00
Vitaly Kuznetsov 0aa67255f5 x86/hyper-v: move synic/stimer control structures definitions to hyperv-tlfs.h
We implement Hyper-V SynIC and synthetic timers in KVM too so there's some
room for code sharing.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-14 17:59:56 +01:00
Vitaly Kuznetsov 2bc39970e9 x86/kvm/hyper-v: Introduce KVM_GET_SUPPORTED_HV_CPUID
With every new Hyper-V Enlightenment we implement we're forced to add a
KVM_CAP_HYPERV_* capability. While this approach works it is fairly
inconvenient: the majority of the enlightenments we do have corresponding
CPUID feature bit(s) and userspace has to know this anyways to be able to
expose the feature to the guest.

Add KVM_GET_SUPPORTED_HV_CPUID ioctl (backed by KVM_CAP_HYPERV_CPUID, "one
cap to rule them all!") returning all Hyper-V CPUID feature leaves.

Using the existing KVM_GET_SUPPORTED_CPUID doesn't seem to be possible:
Hyper-V CPUID feature leaves intersect with KVM's (e.g. 0x40000000,
0x40000001) and we would probably confuse userspace in case we decide to
return these twice.

KVM_CAP_HYPERV_CPUID's number is interim: we're intended to drop
KVM_CAP_HYPERV_STIMER_DIRECT and use its number instead.

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-14 17:59:54 +01:00
Vitaly Kuznetsov e2e871ab2f x86/kvm/hyper-v: Introduce nested_get_evmcs_version() helper
The upcoming KVM_GET_SUPPORTED_HV_CPUID ioctl will need to return
Enlightened VMCS version in HYPERV_CPUID_NESTED_FEATURES.EAX when
it was enabled.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-14 17:59:54 +01:00
Vitaly Kuznetsov 220d6586ec x86/hyper-v: Drop HV_X64_CONFIGURE_PROFILER definition
BIT(13) in HYPERV_CPUID_FEATURES.EBX is described as "ConfigureProfiler" in
TLFS v4.0 but starting 5.0 it is replaced with 'Reserved'. As we don't
currently us it in kernel it can just be dropped.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-14 17:59:53 +01:00
Vitaly Kuznetsov a4987defc1 x86/hyper-v: Do some housekeeping in hyperv-tlfs.h
hyperv-tlfs.h is a bit messy: CPUID feature bits are not always sorted,
it's hard to get which CPUID they belong to, some items are duplicated
(e.g. HV_X64_MSR_CRASH_CTL_NOTIFY/HV_CRASH_CTL_CRASH_NOTIFY).

Do some housekeeping work. While on it, replace all (1 << X) with BIT(X)
macro.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-14 17:59:53 +01:00
Vitaly Kuznetsov ec08449172 x86/hyper-v: Mark TLFS structures packed
The TLFS structures are used for hypervisor-guest communication and must
exactly meet the specification.

Compilers can add alignment padding to structures or reorder struct members
for randomization and optimization, which would break the hypervisor ABI.

Mark the structures as packed to prevent this. 'struct hv_vp_assist_page'
and 'struct hv_enlightened_vmcs' need to be properly padded to support the
change.

Suggested-by: Nadav Amit <nadav.amit@gmail.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Nadav Amit <nadav.amit@gmail.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-14 17:59:52 +01:00
Roman Kagan 7deec5e0df x86: kvm: hyperv: don't retry message delivery for periodic timers
The SynIC message delivery protocol allows the message originator to
request, should the message slot be busy, to be notified when it's free.

However, this is unnecessary and even undesirable for messages generated
by SynIC timers in periodic mode: if the period is short enough compared
to the time the guest spends in the timer interrupt handler, so the
timer ticks start piling up, the excessive interactions due to this
notification and retried message delivery only makes the things worse.

[This was observed, in particular, with Windows L2 guests setting
(temporarily) the periodic timer to 2 kHz, and spending hundreds of
microseconds in the timer interrupt handler due to several L2->L1 exits;
under some load in L0 this could exceed 500 us so the timer ticks
started to pile up and the guest livelocked.]

Relieve the situation somewhat by not retrying message delivery for
periodic SynIC timers.  This appears to remain within the "lazy" lost
ticks policy for SynIC timers as implemented in KVM.

Note that it doesn't solve the fundamental problem of livelocking the
guest with a periodic timer whose period is smaller than the time needed
to process a tick, but it makes it a bit less likely to be triggered.

Signed-off-by: Roman Kagan <rkagan@virtuozzo.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-14 17:59:51 +01:00
Roman Kagan 3a0e773172 x86: kvm: hyperv: simplify SynIC message delivery
SynIC message delivery is somewhat overengineered: it pretends to follow
the ordering rules when grabbing the message slot, using atomic
operations and all that, but does it incorrectly and unnecessarily.

The correct order would be to first set .msg_pending, then atomically
replace .message_type if it was zero, and then clear .msg_pending if
the previous step was successful.  But this all is done in vcpu context
so the whole update looks atomic to the guest (it's assumed to only
access the message page from this cpu), and therefore can be done in
whatever order is most convenient (and is also the reason why the
incorrect order didn't trigger any bugs so far).

While at this, also switch to kvm_vcpu_{read,write}_guest_page, and drop
the no longer needed synic_clear_sint_msg_pending.

Signed-off-by: Roman Kagan <rkagan@virtuozzo.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-14 17:59:51 +01:00
Peng Hao eb1ff0a913 kvm: x86: remove unnecessary recalculate_apic_map
In the previous code, the variable apic_sw_disabled influences
recalculate_apic_map. But in "KVM: x86: simplify kvm_apic_map"
(commit: 3b5a5ffa92),
the access to apic_sw_disabled in recalculate_apic_map has been
deleted.

Signed-off-by: Peng Hao <peng.hao2@zte.com.cn>
Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-14 17:59:50 +01:00
Peng Hao b2227ddec1 kvm: svm: remove unused struct definition
structure svm_init_data is never used. So remove it.

Signed-off-by: Peng Hao <peng.hao2@zte.com.cn>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-14 17:59:50 +01:00
Jim Mattson 84c8c5b8f8 kvm: vmx: Skip all SYSCALL MSRs in setup_msrs() when !EFER.SCE
Like IA32_STAR, IA32_LSTAR and IA32_FMASK only need to contain guest
values on VM-entry when the guest is in long mode and EFER.SCE is set.

Signed-off-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Peter Shier <pshier@google.com>
Reviewed-by: Marc Orr <marcorr@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-14 17:59:49 +01:00
Jim Mattson db31c8f5af kvm: vmx: Don't set hardware IA32_CSTAR MSR on VM-entry
SYSCALL raises #UD in compatibility mode on Intel CPUs, so it's
pointless to load the guest's IA32_CSTAR value into the hardware MSR.

IA32_CSTAR still provides 48 bits of storage on Intel CPUs that have
CPUID.80000001:EDX.LM[bit 29] set, so we cannot remove it from the
vmx_msr_index[] array.

Signed-off-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Peter Shier <pshier@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-14 17:59:48 +01:00
Jim Mattson 898a811f14 kvm: vmx: Document the need for MSR_STAR in i386 builds
Add a comment explaining why MSR_STAR must be included in
vmx_msr_index[] even for i386 builds.

The elided comment has not been relevant since move_msr_up() was
introduced in commit a75beee6e4 ("KVM: VMX: Avoid saving and
restoring msrs on lightweight vmexit").

Signed-off-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Peter Shier <pshier@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-14 17:59:48 +01:00
Jim Mattson 0023ef39dc kvm: vmx: Set IA32_TSC_AUX for legacy mode guests
RDTSCP is supported in legacy mode as well as long mode. The
IA32_TSC_AUX MSR should be set to the correct guest value before
entering any guest that supports RDTSCP.

Fixes: 4e47c7a6d7 ("KVM: VMX: Add instruction rdtscp support for guest")
Signed-off-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Peter Shier <pshier@google.com>
Reviewed-by: Marc Orr <marcorr@google.com>
Reviewed-by: Liran Alon <liran.alon@oracle.com>
Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-14 17:59:47 +01:00
Sean Christopherson 55d2375e58 KVM: nVMX: Move nested code to dedicated files
From a functional perspective, this is (supposed to be) a straight
copy-paste of code.  Code was moved piecemeal to nested.c as not all
code that could/should be moved was obviously nested-only.  The nested
code was then re-ordered as needed to compile, i.e. stats may not show
this is being a "pure" move despite there not being any intended changes
in functionality.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-14 17:59:46 +01:00
Sean Christopherson 7c97fcb3b6 KVM: VMX: Expose nested_vmx_allowed() to nested VMX as a non-inline
Exposing only the function allows @nested, i.e. the module param, to be
statically defined in vmx.c, ensuring we aren't unnecessarily checking
said variable in the nested code.  nested_vmx_allowed() is exposed due
to the need to verify nested support in vmx_{get,set}_nested_state().
The downside is that nested_vmx_allowed() likely won't be inlined in
vmx_{get,set}_nested_state(), but that should be a non-issue as they're
not a hot path.  Keeping vmx_{get,set}_nested_state() in vmx.c isn't a
viable option as they need access to several nested-only functions.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-14 17:59:45 +01:00
Sean Christopherson 97b7ead392 KVM: VMX: Expose various getters and setters to nested VMX
...as they're used directly by the nested code.  This will allow
moving the bulk of the nested code out of vmx.c without concurrent
changes to vmx.h.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-14 17:18:01 +01:00
Sean Christopherson cf3646eb3a KVM: VMX: Expose misc variables needed for nested VMX
Exposed vmx_msr_index, vmx_return and host_efer via vmx.h so that the
nested code can be moved out of vmx.c.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-14 17:18:01 +01:00
Sean Christopherson ff241486ac KVM: nVMX: Move "vmcs12 to shadow/evmcs sync" to helper function
...so that the function doesn't need to be created when moving the
nested code out of vmx.c.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-14 17:18:00 +01:00
Sean Christopherson 3e8eacccae KVM: nVMX: Call nested_vmx_setup_ctls_msrs() iff @nested is true
...so that it doesn't need access to @nested. The only case where the
provided struct isn't already zeroed is the call from vmx_create_vcpu()
as setup_vmcs_config() zeroes the struct in the other use cases.  This
will allow @nested to be statically defined in vmx.c, i.e. this removes
the last direct reference from nested code.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-14 17:17:59 +01:00
Sean Christopherson e4027cfafd KVM: nVMX: Set callbacks for nested functions during hardware setup
...in nested-specific code so that they can eventually be moved out of
vmx.c, e.g. into nested.c.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-14 17:17:58 +01:00
Sean Christopherson a3203381ca KVM: VMX: Move the hardware {un}setup functions to the bottom
...so that future patches can reference e.g. @kvm_vmx_exit_handlers
without having to simultaneously move a big chunk of code.  Speaking
from experience, resolving merge conflicts is an absolute nightmare
without pre-moving the code.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-14 17:17:58 +01:00
Sean Christopherson 5158917c7b KVM: x86: nVMX: Allow nested_enable_evmcs to be NULL
...so that it can conditionally set by the VMX code, i.e. iff @nested is
true.  This will in turn allow it to be moved out of vmx.c and into a
nested-specified file.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-14 17:17:57 +01:00
Sean Christopherson 944c346453 KVM: VMX: Move nested hardware/vcpu {un}setup to helper functions
Eventually this will allow us to move the nested VMX code out of vmx.c.
Note that this also effectively wraps @enable_shadow_vmcs with @nested
so that it too can be moved out of vmx.c.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-14 17:17:56 +01:00
Sean Christopherson 89b0c9f583 KVM: VMX: Move VMX instruction wrappers to a dedicated header file
VMX has a few hundred lines of code just to wrap various VMX specific
instructions, e.g. VMWREAD, INVVPID, etc...  Move them to a dedicated
header so it's easier to find/isolate the boilerplate.

With this change, more inlines can be moved from vmx.c to vmx.h.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-14 17:17:27 +01:00
Sean Christopherson 75edce8a45 KVM: VMX: Move eVMCS code to dedicated files
The header, evmcs.h, already exists and contains a fair amount of code,
but there are a few pieces in vmx.c that can be moved verbatim.  In
addition, move an array definition to evmcs.c to prepare for multiple
consumers of evmcs.h.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-14 14:00:06 +01:00
Sean Christopherson 8373d25d25 KVM: VMX: Add vmx.h to hold VMX definitions
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-14 14:00:01 +01:00
Sean Christopherson 609363cf81 KVM: nVMX: Move vmcs12 code to dedicated files
vmcs12 is the KVM-defined struct used to track a nested VMCS, e.g. a
VMCS created by L1 for L2.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-14 12:34:30 +01:00
Sean Christopherson cb1d474b32 KVM: VMX: Move VMCS definitions to dedicated file
This isn't intended to be a pure reflection of hardware, e.g. struct
loaded_vmcs and struct vmcs_host_state are KVM-defined constructs.
Similar to capabilities.h, this is a standalone file to avoid circular
dependencies between yet-to-be-created vmx.h and nested.h files.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-14 12:34:29 +01:00
Sean Christopherson 2c4fd91d26 KVM: VMX: Expose various module param vars via capabilities.h
Expose the variables associated with various module params that are
needed by the nested VMX code.  There is no ulterior logic for what
variables are/aren't exposed, this is purely "what's needed by the
nested code".

Note that @nested is intentionally not exposed.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-14 12:34:29 +01:00
Sean Christopherson 3077c19108 KVM: VMX: Move capabilities structs and helpers to dedicated file
Defining a separate capabilities.h as opposed to putting this code in
e.g. vmx.h avoids circular dependencies between (the yet-to-be-added)
vmx.h and nested.h.  The aforementioned circular dependencies are why
struct nested_vmx_msrs also resides in capabilities instead of e.g.
nested.h.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-14 12:34:28 +01:00
Sean Christopherson 7caaa71108 KVM: VMX: Pass vmx_capability struct to setup_vmcs_config()
...instead of referencing the global struct.  This will allow moving
setup_vmcs_config() to a separate file that may not have access to
the global variable.  Modify nested_vmx_setup_ctls_msrs() appropriately
since vmx_capability.ept may not be accurate when called by
vmx_check_processor_compat().

No functional change intended.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-14 12:34:27 +01:00
Sean Christopherson c73da3fcab KVM: VMX: Properly handle dynamic VM Entry/Exit controls
EFER and PERF_GLOBAL_CTRL MSRs have dedicated VM Entry/Exit controls
that KVM dynamically toggles based on whether or not the guest's value
for each MSRs differs from the host.  Handle the dynamic behavior by
adding a helper that clears the dynamic bits so the bits aren't set
when initializing the VMCS field outside of the dynamic toggling flow.
This makes the handling consistent with similar behavior for other
controls, e.g. pin, exec and sec_exec.  More importantly, it eliminates
two global bools that are stealthily modified by setup_vmcs_config.

Opportunistically clean up a comment and print related to errata for
IA32_PERF_GLOBAL_CTRL.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-14 12:34:26 +01:00
Sean Christopherson 71d9409e20 KVM: VMX: Move caching of MSR_IA32_XSS to hardware_setup()
MSR_IA32_XSS has no relation to the VMCS whatsoever, it doesn't belong
in setup_vmcs_config() and its reference to host_xss prevents moving
setup_vmcs_config() to a dedicated file.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-14 12:34:26 +01:00
Sean Christopherson 4cebd747d7 KVM: VMX: Drop the "vmx" prefix from vmx_evmcs.h
VMX specific files now reside in a dedicated subdirectory, i.e. the
file name prefix is redundant.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-14 12:34:25 +01:00
Sean Christopherson e0123119a5 KVM: VMX: rename vmx_shadow_fields.h to vmcs_shadow_fields.h
VMX specific files now reside in a dedicated subdirectory.  Drop the
"vmx" prefix, which is redundant, and add a "vmcs" prefix to clarify
that the file is referring to VMCS shadow fields.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-14 12:34:24 +01:00
Sean Christopherson a821bab2d1 KVM: VMX: Move VMX specific files to a "vmx" subdirectory
...to prepare for shattering vmx.c into multiple files without having
to prepend "vmx_" to all new files.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-14 12:34:24 +01:00
Sean Christopherson 3592cda6bc KVM: x86: Add requisite includes to hyperv.h
Until this point vmx.c has been the only consumer and included the
file after many others.  Prepare for multiple consumers, i.e. the
shattering of vmx.c

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-14 12:34:23 +01:00
Sean Christopherson 8ba2e525ec KVM: x86: Add requisite includes to kvm_cache_regs.h
Until this point vmx.c has been the only consumer and included the
file after many others.  Prepare for multiple consumers, i.e. the
shattering of vmx.c

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-14 12:34:22 +01:00
Sean Christopherson 199b118ab3 KVM: VMX: Alphabetize the includes in vmx.c
...to prepare for the creation of a "vmx" subdirectory that will contain
a variety of headers.  Clean things up now to avoid making a bigger mess
in the future.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-14 12:34:21 +01:00
Sean Christopherson dfae3c03b8 KVM: nVMX: Allocate and configure VM{READ,WRITE} bitmaps iff enable_shadow_vmcs
...and make enable_shadow_vmcs depend on nested.  Aside from the obvious
memory savings, this will allow moving the relevant code out of vmx.c in
the future, e.g. to a nested specific file.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-14 12:34:21 +01:00
Sean Christopherson 1b3ab5ad1b KVM: nVMX: Free the VMREAD/VMWRITE bitmaps if alloc_kvm_area() fails
Fixes: 34a1cd60d1 ("kvm: x86: vmx: move some vmx setting from vmx_init() to hardware_setup()")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-14 12:34:20 +01:00
Paolo Bonzini 2a31b9db15 kvm: introduce manual dirty log reprotect
There are two problems with KVM_GET_DIRTY_LOG.  First, and less important,
it can take kvm->mmu_lock for an extended period of time.  Second, its user
can actually see many false positives in some cases.  The latter is due
to a benign race like this:

  1. KVM_GET_DIRTY_LOG returns a set of dirty pages and write protects
     them.
  2. The guest modifies the pages, causing them to be marked ditry.
  3. Userspace actually copies the pages.
  4. KVM_GET_DIRTY_LOG returns those pages as dirty again, even though
     they were not written to since (3).

This is especially a problem for large guests, where the time between
(1) and (3) can be substantial.  This patch introduces a new
capability which, when enabled, makes KVM_GET_DIRTY_LOG not
write-protect the pages it returns.  Instead, userspace has to
explicitly clear the dirty log bits just before using the content
of the page.  The new KVM_CLEAR_DIRTY_LOG ioctl can also operate on a
64-page granularity rather than requiring to sync a full memslot;
this way, the mmu_lock is taken for small amounts of time, and
only a small amount of time will pass between write protection
of pages and the sending of their content.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-14 12:34:19 +01:00
Paolo Bonzini 8fe65a8299 kvm: rename last argument to kvm_get_dirty_log_protect
When manual dirty log reprotect will be enabled, kvm_get_dirty_log_protect's
pointer argument will always be false on exit, because no TLB flush is needed
until the manual re-protection operation.  Rename it from "is_dirty" to "flush",
which more accurately tells the caller what they have to do with it.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-14 12:34:18 +01:00
Paolo Bonzini e5d83c74a5 kvm: make KVM_CAP_ENABLE_CAP_VM architecture agnostic
The first such capability to be handled in virt/kvm/ will be manual
dirty page reprotection.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-14 12:34:18 +01:00
Paolo Bonzini bb22dc14a2 Merge branch 'khdr_fix' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest into HEAD
Merge topic branch from Shuah.
2018-12-14 12:33:31 +01:00
Christoph Hellwig 356da6d0cd dma-mapping: bypass indirect calls for dma-direct
Avoid expensive indirect calls in the fast path DMA mapping
operations by directly calling the dma_direct_* ops if we are using
the directly mapped DMA operations.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Tested-by: Jesper Dangaard Brouer <brouer@redhat.com>
Tested-by: Tony Luck <tony.luck@intel.com>
2018-12-13 21:06:18 +01:00
Christoph Hellwig 55897af630 dma-direct: merge swiotlb_dma_ops into the dma_direct code
While the dma-direct code is (relatively) clean and simple we actually
have to use the swiotlb ops for the mapping on many architectures due
to devices with addressing limits.  Instead of keeping two
implementations around this commit allows the dma-direct
implementation to call the swiotlb bounce buffering functions and
thus share the guts of the mapping implementation.  This also
simplified the dma-mapping setup on a few architectures where we
don't have to differenciate which implementation to use.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Tested-by: Jesper Dangaard Brouer <brouer@redhat.com>
Tested-by: Tony Luck <tony.luck@intel.com>
2018-12-13 21:06:17 +01:00
Christoph Hellwig 3731c3d477 dma-mapping: always build the direct mapping code
All architectures except for sparc64 use the dma-direct code in some
form, and even for sparc64 we had the discussion of a direct mapping
mode a while ago.  In preparation for directly calling the direct
mapping code don't bother having it optionally but always build the
code in.  This is a minor hardship for some powerpc and arm configs
that don't pull it in yet (although they should in a relase ot two),
and sparc64 which currently doesn't need it at all, but it will
reduce the ifdef mess we'd otherwise need significantly.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Tested-by: Jesper Dangaard Brouer <brouer@redhat.com>
Tested-by: Tony Luck <tony.luck@intel.com>
2018-12-13 21:06:11 +01:00
Maran Wilson 716ff017a3 KVM: x86: Allow Qemu/KVM to use PVH entry point
For certain applications it is desirable to rapidly boot a KVM virtual
machine. In cases where legacy hardware and software support within the
guest is not needed, Qemu should be able to boot directly into the
uncompressed Linux kernel binary without the need to run firmware.

There already exists an ABI to allow this for Xen PVH guests and the ABI
is supported by Linux and FreeBSD:

   https://xenbits.xen.org/docs/unstable/misc/pvh.html

This patch enables Qemu to use that same entry point for booting KVM
guests.

Signed-off-by: Maran Wilson <maran.wilson@oracle.com>
Suggested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Suggested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2018-12-13 13:41:49 -05:00
Maran Wilson a43fb7da53 xen/pvh: Move Xen code for getting mem map via hcall out of common file
We need to refactor PVH entry code so that support for other hypervisors
like Qemu/KVM can be added more easily.

The original design for PVH entry in Xen guests relies on being able to
obtain the memory map from the hypervisor using a hypercall. When we
extend the PVH entry ABI to support other hypervisors like Qemu/KVM,
a new mechanism will be added that allows the guest to get the memory
map without needing to use hypercalls.

For Xen guests, the hypercall approach will still be supported. In
preparation for adding support for other hypervisors, we can move the
code that uses hypercalls into the Xen specific file. This will allow us
to compile kernels in the future without CONFIG_XEN that are still capable
of being booted as a Qemu/KVM guest via the PVH entry point.

Signed-off-by: Maran Wilson <maran.wilson@oracle.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2018-12-13 13:41:49 -05:00
Maran Wilson 8cee3974b3 xen/pvh: Move Xen specific PVH VM initialization out of common file
We need to refactor PVH entry code so that support for other hypervisors
like Qemu/KVM can be added more easily.

This patch moves the small block of code used for initializing Xen PVH
virtual machines into the Xen specific file. This initialization is not
going to be needed for Qemu/KVM guests. Moving it out of the common file
is going to allow us to compile kernels in the future without CONFIG_XEN
that are still capable of being booted as a Qemu/KVM guest via the PVH
entry point.

Signed-off-by: Maran Wilson <maran.wilson@oracle.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2018-12-13 13:41:49 -05:00
Maran Wilson 4df7363e52 xen/pvh: Create a new file for Xen specific PVH code
We need to refactor PVH entry code so that support for other hypervisors
like Qemu/KVM can be added more easily.

The first step in that direction is to create a new file that will
eventually hold the Xen specific routines.

Signed-off-by: Maran Wilson <maran.wilson@oracle.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2018-12-13 13:41:49 -05:00
Maran Wilson fcd4747698 xen/pvh: Move PVH entry code out of Xen specific tree
Once hypervisors other than Xen start using the PVH entry point for
starting VMs, we would like the option of being able to compile PVH entry
capable kernels without enabling CONFIG_XEN and all the code that comes
along with that. To allow that, we are moving the PVH code out of Xen and
into files sitting at a higher level in the tree.

This patch is not introducing any code or functional changes, just moving
files from one location to another.

Signed-off-by: Maran Wilson <maran.wilson@oracle.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2018-12-13 13:41:49 -05:00
Maran Wilson 7733607fb3 xen/pvh: Split CONFIG_XEN_PVH into CONFIG_PVH and CONFIG_XEN_PVH
In order to pave the way for hypervisors other than Xen to use the PVH
entry point for VMs, we need to factor the PVH entry code into Xen specific
and hypervisor agnostic components. The first step in doing that, is to
create a new config option for PVH entry that can be enabled
independently from CONFIG_XEN.

Signed-off-by: Maran Wilson <maran.wilson@oracle.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2018-12-13 13:41:49 -05:00
Eric Biggers a033aed5a8 crypto: x86/chacha - yield the FPU occasionally
To improve responsiveness, yield the FPU (temporarily re-enabling
preemption) every 4 KiB encrypted/decrypted, rather than keeping
preemption disabled during the entire encryption/decryption operation.

Alternatively we could do this for every skcipher_walk step, but steps
may be small in some cases, and yielding the FPU is expensive on x86.

Suggested-by: Martin Willi <martin@strongswan.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-12-13 18:24:58 +08:00
Eric Biggers 7a507d6225 crypto: x86/chacha - add XChaCha12 support
Now that the x86_64 SIMD implementations of ChaCha20 and XChaCha20 have
been refactored to support varying the number of rounds, add support for
XChaCha12.  This is identical to XChaCha20 except for the number of
rounds, which is 12 instead of 20.  This can be used by Adiantum.

Reviewed-by: Martin Willi <martin@strongswan.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-12-13 18:24:58 +08:00
Eric Biggers 8b65f34c58 crypto: x86/chacha20 - refactor to allow varying number of rounds
In preparation for adding XChaCha12 support, rename/refactor the x86_64
SIMD implementations of ChaCha20 to support different numbers of rounds.

Reviewed-by: Martin Willi <martin@strongswan.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-12-13 18:24:58 +08:00
Eric Biggers 4af7826187 crypto: x86/chacha20 - add XChaCha20 support
Add an XChaCha20 implementation that is hooked up to the x86_64 SIMD
implementations of ChaCha20.  This can be used by Adiantum.

An SSSE3 implementation of single-block HChaCha20 is also added so that
XChaCha20 can use it rather than the generic implementation.  This
required refactoring the ChaCha permutation into its own function.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-12-13 18:24:57 +08:00
Eric Biggers 0f961f9f67 crypto: x86/nhpoly1305 - add AVX2 accelerated NHPoly1305
Add a 64-bit AVX2 implementation of NHPoly1305, an ε-almost-∆-universal
hash function used in the Adiantum encryption mode.  For now, only the
NH portion is actually AVX2-accelerated; the Poly1305 part is less
performance-critical so is just implemented in C.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-12-13 18:24:57 +08:00
Eric Biggers 012c82388c crypto: x86/nhpoly1305 - add SSE2 accelerated NHPoly1305
Add a 64-bit SSE2 implementation of NHPoly1305, an ε-almost-∆-universal
hash function used in the Adiantum encryption mode.  For now, only the
NH portion is actually SSE2-accelerated; the Poly1305 part is less
performance-critical so is just implemented in C.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-12-13 18:24:57 +08:00
Dan Williams 51c3fbd89d x86/mm: Fix decoy address handling vs 32-bit builds
A decoy address is used by set_mce_nospec() to update the cache attributes
for a page that may contain poison (multi-bit ECC error) while attempting
to minimize the possibility of triggering a speculative access to that
page.

When reserve_memtype() is handling a decoy address it needs to convert it
to its real physical alias. The conversion, AND'ing with __PHYSICAL_MASK,
is broken for a 32-bit physical mask and reserve_memtype() is passed the
last physical page. Gert reports triggering the:

    BUG_ON(start >= end);

...assertion when running a 32-bit non-PAE build on a platform that has
a driver resource at the top of physical memory:

    BIOS-e820: [mem 0x00000000fff00000-0x00000000ffffffff] reserved

Given that the decoy address scheme is only targeted at 64-bit builds and
assumes that the top of physical address space is free for use as a decoy
address range, simply bypass address sanitization in the 32-bit case.

Lastly, there was no need to crash the system when this failure occurred,
and no need to crash future systems if the assumptions of decoy addresses
are ever violated. Change the BUG_ON() to a WARN() with an error return.

Fixes: 510ee090ab ("x86/mm/pat: Prepare {reserve, free}_memtype() for...")
Reported-by: Gert Robben <t2@gert.gr>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Gert Robben <t2@gert.gr>
Cc: stable@vger.kernel.org
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: platform-driver-x86@vger.kernel.org
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/154454337985.789277.12133288391664677775.stgit@dwillia2-desk3.amr.corp.intel.com
2018-12-11 18:28:20 -08:00
Reinette Chatre 52eb74339a x86/resctrl: Fix rdt_find_domain() return value and checks
rdt_find_domain() returns an ERR_PTR() that is generated from a provided
domain id when the value is negative.

Care needs to be taken when creating an ERR_PTR() from this value
because a subsequent check using IS_ERR() expects the error to
be within the MAX_ERRNO range. Using an invalid domain id as an
ERR_PTR() does work at this time since this is currently always -1.
Using this undocumented assumption is fragile since future users of
rdt_find_domain() may not be aware of thus assumption.

Two related issues are addressed:

- Ensure that rdt_find_domain() always returns a valid error value by
forcing the error to be -ENODEV when a negative domain id is provided.

- In a few instances the return value of rdt_find_domain() is just
checked for NULL - fix these to include a check of ERR_PTR.

Fixes: d89b737901 ("x86/intel_rdt/cqm: Add mon_data")
Fixes: 521348b011 ("x86/intel_rdt: Introduce utility to obtain CDP peer")
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: fenghua.yu@intel.com
Cc: gavin.hindman@intel.com
Cc: jithu.joseph@intel.com
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/b88cd4ff6a75995bf8db9b0ea546908fe50f69f3.1544479852.git.reinette.chatre@intel.com
2018-12-11 22:09:28 +01:00
Reinette Chatre 80b71c340f x86/intel_rdt: Ensure a CPU remains online for the region's pseudo-locking sequence
The user triggers the creation of a pseudo-locked region when writing
the requested schemata to the schemata resctrl file. The pseudo-locking
of a region is required to be done on a CPU that is associated with the
cache on which the pseudo-locked region will reside. In order to run the
locking code on a specific CPU, the needed CPU has to be selected and
ensured to remain online during the entire locking sequence.

At this time, the cpu_hotplug_lock is not taken during the pseudo-lock
region creation and it is thus possible for a CPU to be selected to run
the pseudo-locking code and then that CPU to go offline before the
thread is able to run on it.

Fix this by ensuring that the cpu_hotplug_lock is taken while the CPU on
which code has to run needs to be controlled. Since the cpu_hotplug_lock
is always taken before rdtgroup_mutex the lock order is maintained.

Fixes: e0bdfe8e36 ("x86/intel_rdt: Support creation/removal of pseudo-locked region")
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: gavin.hindman@intel.com
Cc: jithu.joseph@intel.com
Cc: stable <stable@vger.kernel.org>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/b7b17432a80f95a1fa21a1698ba643014f58ad31.1544476425.git.reinette.chatre@intel.com
2018-12-11 21:59:01 +01:00
Robin Murphy a8a4c98fc9 x86/dma/amd-gart: Stop resizing dma_debug_entry pool
dma-debug is now capable of adding new entries to its pool on-demand if
the initial preallocation was insufficient, so the IOMMU_LEAK logic no
longer needs to explicitly change the pool size. This does lose it the
ability to save a couple of megabytes of RAM by reducing the pool size
below its default, but it seems unlikely that that is a realistic
concern these days (or indeed that anyone is actively debugging AGP
drivers' DMA usage any more). Getting rid of dma_debug_resize_entries()
will make room for further streamlining in the dma-debug code itself.

Removing the call reveals quite a lot of cruft which has been useless
for nearly a decade since commit 19c1a6f576 ("x86 gart: reimplement
IOMMU_LEAK feature by using DMA_API_DEBUG"), including the entire
'iommu=leak' parameter, which controlled nothing except whether
dma_debug_resize_entries() was called or not.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Qian Cai <cai@lca.pw>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-12-11 14:32:12 +01:00