The feature bit X86_FEATURE_MBA is detected via CPUID leaf 0x80000008
EBX Bit 06. This bit indicates the support of AMD's MBA feature.
This feature is supported by both Intel and AMD. But they are detected
in different CPUID leaves.
[ bp: s/cpuid/CPUID/g ]
Signed-off-by: Sherry Hurwitz <sherry.hurwitz@amd.com>
Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: "Chang S. Bae" <chang.seok.bae@intel.com>
Cc: David Miller <davem@davemloft.net>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Dmitry Safonov <dima@arista.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Kate Stewart <kstewart@linuxfoundation.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: <linux-doc@vger.kernel.org>
Cc: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Philippe Ombredanne <pombredanne@nexb.com>
Cc: Pu Wen <puwen@hygon.cn>
Cc: <qianyue.zj@alibaba-inc.com>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Reinette Chatre <reinette.chatre@intel.com>
Cc: Rian Hunter <rian@alum.mit.edu>
Cc: Sherry Hurwitz <sherry.hurwitz@amd.com>
Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Thomas Lendacky <Thomas.Lendacky@amd.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: <xiaochen.shen@intel.com>
Link: https://lkml.kernel.org/r/20181121202811.4492-10-babu.moger@amd.com
The resource control feature is supported by both Intel and AMD. So,
rename CONFIG_INTEL_RDT to the vendor-neutral CONFIG_RESCTRL.
Now CONFIG_RESCTRL will be used for both Intel and AMD to enable
Resource Control support. Update the texts in config and condition
accordingly.
[ bp: Simplify Kconfig text. ]
Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: "Chang S. Bae" <chang.seok.bae@intel.com>
Cc: David Miller <davem@davemloft.net>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Dmitry Safonov <dima@arista.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Kate Stewart <kstewart@linuxfoundation.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: <linux-doc@vger.kernel.org>
Cc: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Philippe Ombredanne <pombredanne@nexb.com>
Cc: Pu Wen <puwen@hygon.cn>
Cc: <qianyue.zj@alibaba-inc.com>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Reinette Chatre <reinette.chatre@intel.com>
Cc: Rian Hunter <rian@alum.mit.edu>
Cc: Sherry Hurwitz <sherry.hurwitz@amd.com>
Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Thomas Lendacky <Thomas.Lendacky@amd.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: <xiaochen.shen@intel.com>
Link: https://lkml.kernel.org/r/20181121202811.4492-9-babu.moger@amd.com
Bring all the functions that are different between the vendors into the
resource structure and initialize them dynamically. Add _intel suffix to
the Intel-specific functions.
cbm_validate() which does cache bitmask validation, differs between the
vendors as AMD allows non-contiguous masks. So, use separate functions
for Intel and AMD.
[ bp: Massage commit message and fixup rdt_resource members' vertical
alignment. ]
Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: "Chang S. Bae" <chang.seok.bae@intel.com>
Cc: David Miller <davem@davemloft.net>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Dmitry Safonov <dima@arista.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Kate Stewart <kstewart@linuxfoundation.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: <linux-doc@vger.kernel.org>
Cc: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Philippe Ombredanne <pombredanne@nexb.com>
Cc: Pu Wen <puwen@hygon.cn>
Cc: <qianyue.zj@alibaba-inc.com>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Reinette Chatre <reinette.chatre@intel.com>
Cc: Rian Hunter <rian@alum.mit.edu>
Cc: Sherry Hurwitz <sherry.hurwitz@amd.com>
Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Thomas Lendacky <Thomas.Lendacky@amd.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: <xiaochen.shen@intel.com>
Link: https://lkml.kernel.org/r/20181121202811.4492-7-babu.moger@amd.com
Initialize the resource functions that are different between the
vendors. Some features are initialized differently between the vendors.
Add _intel suffix to Intel-specific functions.
For example, the MBA feature varies significantly between Intel and AMD.
Separate the initialization of these resource functions. That way we can
easily add AMD's functions later.
Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: "Chang S. Bae" <chang.seok.bae@intel.com>
Cc: David Miller <davem@davemloft.net>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Dmitry Safonov <dima@arista.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Kate Stewart <kstewart@linuxfoundation.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: <linux-doc@vger.kernel.org>
Cc: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Philippe Ombredanne <pombredanne@nexb.com>
Cc: Pu Wen <puwen@hygon.cn>
Cc: <qianyue.zj@alibaba-inc.com>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Reinette Chatre <reinette.chatre@intel.com>
Cc: Rian Hunter <rian@alum.mit.edu>
Cc: Sherry Hurwitz <sherry.hurwitz@amd.com>
Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Thomas Lendacky <Thomas.Lendacky@amd.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: <xiaochen.shen@intel.com>
Link: https://lkml.kernel.org/r/20181121202811.4492-6-babu.moger@amd.com
Separate the call sequence for rdt_quirks and MBA feature. This is in
preparation to handle vendor differences in these call sequences. Rename
the functions to make the flow a bit more meaningful.
Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: "Chang S. Bae" <chang.seok.bae@intel.com>
Cc: David Miller <davem@davemloft.net>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Dmitry Safonov <dima@arista.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Kate Stewart <kstewart@linuxfoundation.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: <linux-doc@vger.kernel.org>
Cc: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Philippe Ombredanne <pombredanne@nexb.com>
Cc: Pu Wen <puwen@hygon.cn>
Cc: <qianyue.zj@alibaba-inc.com>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Reinette Chatre <reinette.chatre@intel.com>
Cc: Rian Hunter <rian@alum.mit.edu>
Cc: Sherry Hurwitz <sherry.hurwitz@amd.com>
Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Thomas Lendacky <Thomas.Lendacky@amd.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: <xiaochen.shen@intel.com>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20181121202811.4492-4-babu.moger@amd.com
New generation of AMD processors add support for RDT (or QOS) features.
Together, these features will be called RESCTRL. With more than one
vendors supporting these features, it seems more appropriate to rename
these files.
Create a new directory with the name 'resctrl' and move all the
intel_rdt files to the new directory. This way all the resctrl related
code resides inside one directory.
[ bp: Add SPDX identifier to the Makefile ]
Suggested-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: "Chang S. Bae" <chang.seok.bae@intel.com>
Cc: David Miller <davem@davemloft.net>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Dmitry Safonov <dima@arista.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Kate Stewart <kstewart@linuxfoundation.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: <linux-doc@vger.kernel.org>
Cc: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Philippe Ombredanne <pombredanne@nexb.com>
Cc: Pu Wen <puwen@hygon.cn>
Cc: <qianyue.zj@alibaba-inc.com>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Reinette Chatre <reinette.chatre@intel.com>
Cc: Rian Hunter <rian@alum.mit.edu>
Cc: Sherry Hurwitz <sherry.hurwitz@amd.com>
Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Thomas Lendacky <Thomas.Lendacky@amd.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: <xiaochen.shen@intel.com>
Link: https://lkml.kernel.org/r/20181121202811.4492-2-babu.moger@amd.com
show_regs() shows the CS in the CPU register instead of the value in
regs. This means that we'll probably print "CS: 0010" almost all
the time regardless of what was actually in CS when the kernel
malfunctioned. This gives a particularly confusing result if we
OOPSed due to an implicit supervisor access from user mode.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
Link: http://lkml.kernel.org/r/4e36812b6e1e95236a812021d35cbf22746b5af6.1542841400.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The sequence
fpu->initialized = 1; /* step A */
preempt_disable(); /* step B */
fpu__restore(fpu);
preempt_enable();
in __fpu__restore_sig() is racy in regard to a context switch.
For 32bit frames, __fpu__restore_sig() prepares the FPU state within
fpu->state. To ensure that a context switch (switch_fpu_prepare() in
particular) does not modify fpu->state it uses fpu__drop() which sets
fpu->initialized to 0.
After fpu->initialized is cleared, the CPU's FPU state is not saved
to fpu->state during a context switch. The new state is loaded via
fpu__restore(). It gets loaded into fpu->state from userland and
ensured it is sane. fpu->initialized is then set to 1 in order to avoid
fpu__initialize() doing anything (overwrite the new state) which is part
of fpu__restore().
A context switch between step A and B above would save CPU's current FPU
registers to fpu->state and overwrite the newly prepared state. This
looks like a tiny race window but the Kernel Test Robot reported this
back in 2016 while we had lazy FPU support. Borislav Petkov made the
link between that report and another patch that has been posted. Since
the removal of the lazy FPU support, this race goes unnoticed because
the warning has been removed.
Disable bottom halves around the restore sequence to avoid the race. BH
need to be disabled because BH is allowed to run (even with preemption
disabled) and might invoke kernel_fpu_begin() by doing IPsec.
[ bp: massage commit message a bit. ]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
Cc: kvm ML <kvm@vger.kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: stable@vger.kernel.org
Cc: x86-ml <x86@kernel.org>
Link: http://lkml.kernel.org/r/20181120102635.ddv3fvavxajjlfqk@linutronix.de
Link: https://lkml.kernel.org/r/20160226074940.GA28911@pd.tnic
Peter Anvin pointed out that commit:
ae7e1238e6 ("x86/boot: Add ACPI RSDP address to setup_header")
should be reverted as setup_header should only contain items set by the
legacy BIOS.
So revert said commit. Instead of fully reverting the dependent commit
of:
e7b66d16fe ("x86/acpi, x86/boot: Take RSDP address for boot params if available")
just remove the setup_header reference in order to replace it by
a boot_params in a followup patch.
Suggested-by: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: boris.ostrovsky@oracle.com
Cc: bp@alien8.de
Cc: daniel.kiper@oracle.com
Cc: sstabellini@kernel.org
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/20181120072529.5489-2-jgross@suse.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Currently, the code scanning the CPU equivalence table read from a
microcode container file assumes that it actually contains a terminating
zero entry.
Check also the size of this table to make sure that no reads past its
end happen, in case there's no terminating zero entry at the end of the
table.
[ bp: Adjust to new changes. ]
Signed-off-by: Maciej S. Szmigiero <mail@maciej.szmigiero.name>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: x86@kernel.org
Link: https://lkml.kernel.org/r/20181107170218.7596-16-bp@alien8.de
Convert the CPU equivalence table into a proper struct in preparation
for tracking also the size of this table.
[ bp: Have functions deal with struct equiv_cpu_table pointers only. Rediff. ]
Signed-off-by: Maciej S. Szmigiero <mail@maciej.szmigiero.name>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: x86@kernel.org
Link: https://lkml.kernel.org/r/20181107170218.7596-15-bp@alien8.de
Convert the late loading path to use the newly introduced microcode
container data checking functions as it was previously done for the
early loader.
[ bp: Keep header length addition in install_equiv_cpu_table() and rediff. ]
Signed-off-by: Maciej S. Szmigiero <mail@maciej.szmigiero.name>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: x86@kernel.org
Link: https://lkml.kernel.org/r/20181107170218.7596-14-bp@alien8.de
Make it size_t everywhere as this is what we get from cpio.
[ bp: Fix a smatch warning. ]
Originally-by: Maciej S. Szmigiero <mail@maciej.szmigiero.name>
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20181107170218.7596-13-bp@alien8.de
Now that they have the required functionality, use them to verify the
equivalence table and each patch, thus making parse_container() more
readable.
Originally-by: "Maciej S. Szmigiero" <mail@maciej.szmigiero.name>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: x86@kernel.org
Link: https://lkml.kernel.org/r/20181107170218.7596-12-bp@alien8.de
Add a verify_patch() function which tries to sanity-check many aspects
of a microcode patch supplied by an outside container before attempting
a load.
Prepend all sub-functions' names which verify an aspect of a microcode
patch with "__".
Call it in verify_and_add_patch() *before* looking at the microcode
header.
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: x86@kernel.org
Link: https://lkml.kernel.org/r/20181107170218.7596-7-bp@alien8.de
Rename the variable which contains the patch size read out from the
section header to sh_psize for better differentiation of all the "sizes"
in that function.
Also, improve the comment above it.
No functional changes.
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: x86@kernel.org
Link: https://lkml.kernel.org/r/20181107170218.7596-6-bp@alien8.de
Starting with family 0x15, the patch size verification is not needed
anymore. Thus get rid of the need to update this checking function with
each new family.
Keep the check for older families.
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: x86@kernel.org
Link: https://lkml.kernel.org/r/20181107170218.7596-5-bp@alien8.de
Add container and patch verification functions to the AMD microcode
update driver.
These functions check whether a passed buffer contains the relevant
structure, whether it isn't truncated and (for actual microcode patches)
whether the size of a patch is not too large for a particular CPU family.
By adding these checks as separate functions the actual microcode loading
code won't get interspersed with a lot of checks and so will be more
readable.
[ bp: Make all pr_err() calls into pr_debug() and drop the
verify_patch() bits. ]
Signed-off-by: Maciej S. Szmigiero <mail@maciej.szmigiero.name>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: x86-ml <x86@kernel.org>
Link: http://lkml.kernel.org/r/3014e96c82cd90761b4601bd2cfe59c4119e46a7.1529424596.git.mail@maciej.szmigiero.name
verify_patch_size() verifies whether the remaining size of the microcode
container file is large enough to contain a patch of the indicated size.
However, the section header length is not included in this indicated
size but it is present in the leftover file length so it should be
subtracted from the leftover file length before passing this value to
verify_patch_size().
[ bp: Split comment. ]
Signed-off-by: Maciej S. Szmigiero <mail@maciej.szmigiero.name>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: x86-ml <x86@kernel.org>
Link: http://lkml.kernel.org/r/6df43f4f6a28186a13a66e8d7e61143c5e1a2324.1529424596.git.mail@maciej.szmigiero.name
Add the proper includes and make smca_get_name() static.
Fix an actual bug too which the warning triggered:
arch/x86/kernel/cpu/mcheck/therm_throt.c:395:39: error: conflicting \
types for ‘smp_thermal_interrupt’
asmlinkage __visible void __irq_entry smp_thermal_interrupt(struct pt_regs *r)
^~~~~~~~~~~~~~~~~~~~~
In file included from arch/x86/kernel/cpu/mcheck/therm_throt.c:29:
./arch/x86/include/asm/traps.h:107:17: note: previous declaration of \
‘smp_thermal_interrupt’ was here
asmlinkage void smp_thermal_interrupt(void);
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Yi Wang <wang.yi59@zte.com.cn>
Cc: Michael Matz <matz@suse.de>
Cc: x86@kernel.org
Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1811081633160.1549@nanos.tec.linutronix.de
Distros are concerned about totally disabling the kexec_load syscall.
As a compromise, the kexec_load syscall will only be disabled when
CONFIG_KEXEC_VERIFY_SIG is configured and the system is booted with
secureboot enabled.
This patch defines the new arch specific function called
arch_ima_get_secureboot() to retrieve the secureboot state of the system.
Signed-off-by: Nayna Jain <nayna@linux.ibm.com>
Suggested-by: Seth Forshee <seth.forshee@canonical.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Peter Jones <pjones@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Dave Young <dyoung@redhat.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
Commit e61d98d8da ("x64, x2apic/intr-remap: Intel vt-d, IOMMU
code reorganization") moved dma_remapping.h from drivers/pci/ to
current place. It is entirely VT-d specific, but uses a generic
name. This merges dma_remapping.h with include/linux/intel-iommu.h
and removes dma_remapping.h as the result.
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Sohil Mehta <sohil.mehta@intel.com>
Suggested-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Liu, Yi L <yi.l.liu@intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
When running function tracing on a Linux guest running on VMware
Workstation, the guest would crash. This is due to tracing of the
sched_clock internal call of the VMware vmware_sched_clock(), which
causes an infinite recursion within the tracing code (clock calls must
not be traced).
Make vmware_sched_clock() not traced by ftrace.
Fixes: 80e9a4f21f ("x86/vmware: Add paravirt sched clock")
Reported-by: GwanYeong Kim <gy741.kim@gmail.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
CC: Alok Kataria <akataria@vmware.com>
CC: GwanYeong Kim <gy741.kim@gmail.com>
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: Ingo Molnar <mingo@kernel.org>
Cc: stable@vger.kernel.org
CC: Thomas Gleixner <tglx@linutronix.de>
CC: virtualization@lists.linux-foundation.org
CC: x86-ml <x86@kernel.org>
Link: http://lkml.kernel.org/r/20181109152207.4d3e7d70@gandalf.local.home
Add the PCI device IDs for family 17h model 30h, since they are needed
for accessing various registers via the data fabric/SMN interface.
Signed-off-by: Brian Woods <brian.woods@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
CC: Bjorn Helgaas <bhelgaas@google.com>
CC: Clemens Ladisch <clemens@ladisch.de>
CC: Guenter Roeck <linux@roeck-us.net>
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: Ingo Molnar <mingo@redhat.com>
CC: Jean Delvare <jdelvare@suse.com>
CC: Jia Zhang <qianyue.zj@alibaba-inc.com>
CC: <linux-hwmon@vger.kernel.org>
CC: <linux-pci@vger.kernel.org>
CC: Pu Wen <puwen@hygon.cn>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: x86-ml <x86@kernel.org>
Link: http://lkml.kernel.org/r/20181106200754.60722-4-brian.woods@amd.com
Add support for new processors which have multiple PCI root complexes
per data fabric/system management network interface. If there are (N)
multiple PCI roots per DF/SMN interface, then the PCI roots are
redundant (as far as SMN/DF access goes). For each DF/SMN interface:
map to the first available PCI root and skip the next N-1 PCI roots so
the following DF/SMN interface get mapped to a correct PCI root.
Ex:
DF/SMN 0 -> 60
40
20
00
DF/SMN 1 -> e0
c0
a0
80
Signed-off-by: Brian Woods <brian.woods@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
CC: Bjorn Helgaas <bhelgaas@google.com>
CC: Clemens Ladisch <clemens@ladisch.de>
CC: Guenter Roeck <linux@roeck-us.net>
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: Ingo Molnar <mingo@redhat.com>
CC: Jean Delvare <jdelvare@suse.com>
CC: Jia Zhang <qianyue.zj@alibaba-inc.com>
CC: <linux-hwmon@vger.kernel.org>
CC: <linux-pci@vger.kernel.org>
CC: Pu Wen <puwen@hygon.cn>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: x86-ml <x86@kernel.org>
Link: http://lkml.kernel.org/r/20181106200754.60722-3-brian.woods@amd.com
The threshold in tsc_read_refs() is constant which may favor slower CPUs
but may not be optimal for simple reading of reference on faster ones.
Hence make it proportional to tsc_khz when available to compensate for
this. The threshold guards against any disturbance like IRQs, NMIs, SMIs
or CPU stealing by host on guest systems so rename it accordingly and
fix comments as well.
Also on some systems there is noticeable DMI bus contention at some point
during boot keeping the readout failing (observed with about one in ~300
boots when testing). In that case retry also the second readout instead of
simply bailing out unrefined. Usually the next second the readout returns
fast just fine without any issues.
Signed-off-by: Daniel Vacek <neelx@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Link: https://lkml.kernel.org/r/1541437840-29293-1-git-send-email-neelx@redhat.com
vSMP dependency on pv_irq_ops has been removed some years ago, but the code
still deals with pv_irq_ops.
In short, "cap & ctl & (1 << 4)" is always returning 0, so all
PARAVIRT/PARAVIRT_XXL code related to that can be removed.
However, the rest of the code depends on CONFIG_PCI, so fix it accordingly.
Rename set_vsmp_pv_ops to set_vsmp_ctl as the original name does not make
sense anymore.
Signed-off-by: Eial Czerwacki <eial@scalemp.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Shai Fultheim <shai@scalemp.com>
Cc: Juergen Gross <jgross@suse.com>
Link: https://lkml.kernel.org/r/1541439114-28297-1-git-send-email-eial@scalemp.com
modify_ldt(2) leaves the old LDT mapped after switching over to the new
one. The old LDT gets freed and the pages can be re-used.
Leaving the mapping in place can have security implications. The mapping is
present in the userspace page tables and Meltdown-like attacks can read
these freed and possibly reused pages.
It's relatively simple to fix: unmap the old LDT and flush TLB before
freeing the old LDT memory.
This further allows to avoid flushing the TLB in map_ldt_struct() as the
slot is unmapped and flushed by unmap_ldt_struct() or has never been mapped
at all.
[ tglx: Massaged changelog and removed the needless line breaks ]
Fixes: f55f0501cb ("x86/pti: Put the LDT in its own PGD if PTI is on")
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: bp@alien8.de
Cc: hpa@zytor.com
Cc: dave.hansen@linux.intel.com
Cc: luto@kernel.org
Cc: peterz@infradead.org
Cc: boris.ostrovsky@oracle.com
Cc: jgross@suse.com
Cc: bhe@redhat.com
Cc: willy@infradead.org
Cc: linux-mm@kvack.org
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20181026122856.66224-3-kirill.shutemov@linux.intel.com
The NFIT machine check handler uses the physical address from the mce
structure, and compares it against information in the ACPI NFIT table
to determine whether that location lies on an NVDIMM. The mce->addr
field however may not always be valid, and this is indicated by the
MCI_STATUS_ADDRV bit in the status field.
Export mce_usable_address() which already performs validation for the
address, and use it in the NFIT handler.
Fixes: 6839a6d96f ("nfit: do an ARS scrub on hitting a latent media error")
Reported-by: Robert Elliott <elliott@hpe.com>
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
CC: Arnd Bergmann <arnd@arndb.de>
Cc: Dan Williams <dan.j.williams@intel.com>
CC: Dave Jiang <dave.jiang@intel.com>
CC: elliott@hpe.com
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: Ingo Molnar <mingo@redhat.com>
CC: Len Brown <lenb@kernel.org>
CC: linux-acpi@vger.kernel.org
CC: linux-edac <linux-edac@vger.kernel.org>
CC: linux-nvdimm@lists.01.org
CC: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
CC: "Rafael J. Wysocki" <rjw@rjwysocki.net>
CC: Ross Zwisler <zwisler@kernel.org>
CC: stable <stable@vger.kernel.org>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Tony Luck <tony.luck@intel.com>
CC: x86-ml <x86@kernel.org>
CC: Yazen Ghannam <yazen.ghannam@amd.com>
Link: http://lkml.kernel.org/r/20181026003729.8420-2-vishal.l.verma@intel.com
The MCE handler for nfit devices is called for memory errors on a
Non-Volatile DIMM and adds the error location to a 'badblocks' list.
This list is used by the various NVDIMM drivers to avoid consuming known
poison locations during IO.
The MCE handler gets called for both corrected and uncorrectable errors.
Until now, both kinds of errors have been added to the badblocks list.
However, corrected memory errors indicate that the problem has already
been fixed by hardware, and the resulting interrupt is merely a
notification to Linux.
As far as future accesses to that location are concerned, it is
perfectly fine to use, and thus doesn't need to be included in the above
badblocks list.
Add a check in the nfit MCE handler to filter out corrected mce events,
and only process uncorrectable errors.
Fixes: 6839a6d96f ("nfit: do an ARS scrub on hitting a latent media error")
Reported-by: Omar Avelar <omar.avelar@intel.com>
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
CC: Arnd Bergmann <arnd@arndb.de>
CC: Dan Williams <dan.j.williams@intel.com>
CC: Dave Jiang <dave.jiang@intel.com>
CC: elliott@hpe.com
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: Ingo Molnar <mingo@redhat.com>
CC: Len Brown <lenb@kernel.org>
CC: linux-acpi@vger.kernel.org
CC: linux-edac <linux-edac@vger.kernel.org>
CC: linux-nvdimm@lists.01.org
CC: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
CC: "Rafael J. Wysocki" <rjw@rjwysocki.net>
CC: Ross Zwisler <zwisler@kernel.org>
CC: stable <stable@vger.kernel.org>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Tony Luck <tony.luck@intel.com>
CC: x86-ml <x86@kernel.org>
CC: Yazen Ghannam <yazen.ghannam@amd.com>
Link: http://lkml.kernel.org/r/20181026003729.8420-1-vishal.l.verma@intel.com
... to actually explain what the function is trying to do.
Reported-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: <x86@kernel.org>
Link: http://lkml.kernel.org/r/20181101155314.30690-1-bp@alien8.de
get_scattered_cpuid_leaf() was added[1] to help KVM rebuild hardware-
defined leafs that are rearranged by Linux to avoid bloating the
x86_capability array. Eventually, the last consumer of the function was
removed[2], but the function itself was kept, perhaps even intentionally
as a form of documentation.
Remove get_scattered_cpuid_leaf() as it is currently not used by KVM.
Furthermore, simply rebuilding the "real" leaf does not resolve all of
KVM's woes when it comes to exposing a scattered CPUID feature, i.e.
keeping the function as documentation may be counter-productive in some
scenarios, e.g. when KVM needs to do more than simply expose the leaf.
[1] 47bdf3378d ("x86/cpuid: Provide get_scattered_cpuid_leaf()")
[2] b7b27aa011 ("KVM/x86: Update the reverse_cpuid list to include CPUID_7_EDX")
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: Ingo Molnar <mingo@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: x86-ml <x86@kernel.org>
Link: http://lkml.kernel.org/r/20181105185725.18679-1-sean.j.christopherson@intel.com
The result of in_compat_syscall() can be pictured as:
x86 platform:
---------------------------------------------------
| Arch\syscall | 64-bit | ia32 | x32 |
|-------------------------------------------------|
| x86_64 | false | true | true |
|-------------------------------------------------|
| i686 | | <true> | |
---------------------------------------------------
Other platforms:
-------------------------------------------
| Arch\syscall | 64-bit | compat |
|-----------------------------------------|
| 64-bit | false | true |
|-----------------------------------------|
| 32-bit(?) | | <false> |
-------------------------------------------
As seen, the result of in_compat_syscall() on generic 32-bit platform
differs from i686.
There is no reason for in_compat_syscall() == true on native i686. It also
easy to misread code if the result on native 32-bit platform differs
between arches.
Because of that non arch-specific code has many places with:
if (IS_ENABLED(CONFIG_COMPAT) && in_compat_syscall())
in different variations.
It looks-like the only non-x86 code which uses in_compat_syscall() not
under CONFIG_COMPAT guard is in amd/amdkfd. But according to the commit
a18069c132 ("amdkfd: Disable support for 32-bit user processes"), it
actually should be disabled on native i686.
Rename in_compat_syscall() to in_32bit_syscall() for x86-specific code
and make in_compat_syscall() false under !CONFIG_COMPAT.
A follow on patch will clean up generic users which were forced to check
IS_ENABLED(CONFIG_COMPAT) with in_compat_syscall().
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Cc: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: Stephen Boyd <sboyd@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: linux-efi@vger.kernel.org
Cc: netdev@vger.kernel.org
Link: https://lkml.kernel.org/r/20181012134253.23266-2-dima@arista.com
When a memblock allocation APIs are called with align = 0, the alignment
is implicitly set to SMP_CACHE_BYTES.
Implicit alignment is done deep in the memblock allocator and it can
come as a surprise. Not that such an alignment would be wrong even
when used incorrectly but it is better to be explicit for the sake of
clarity and the prinicple of the least surprise.
Replace all such uses of memblock APIs with the 'align' parameter
explicitly set to SMP_CACHE_BYTES and stop implicit alignment assignment
in the memblock internal allocation functions.
For the case when memblock APIs are used via helper functions, e.g. like
iommu_arena_new_node() in Alpha, the helper functions were detected with
Coccinelle's help and then manually examined and updated where
appropriate.
The direct memblock APIs users were updated using the semantic patch below:
@@
expression size, min_addr, max_addr, nid;
@@
(
|
- memblock_alloc_try_nid_raw(size, 0, min_addr, max_addr, nid)
+ memblock_alloc_try_nid_raw(size, SMP_CACHE_BYTES, min_addr, max_addr,
nid)
|
- memblock_alloc_try_nid_nopanic(size, 0, min_addr, max_addr, nid)
+ memblock_alloc_try_nid_nopanic(size, SMP_CACHE_BYTES, min_addr, max_addr,
nid)
|
- memblock_alloc_try_nid(size, 0, min_addr, max_addr, nid)
+ memblock_alloc_try_nid(size, SMP_CACHE_BYTES, min_addr, max_addr, nid)
|
- memblock_alloc(size, 0)
+ memblock_alloc(size, SMP_CACHE_BYTES)
|
- memblock_alloc_raw(size, 0)
+ memblock_alloc_raw(size, SMP_CACHE_BYTES)
|
- memblock_alloc_from(size, 0, min_addr)
+ memblock_alloc_from(size, SMP_CACHE_BYTES, min_addr)
|
- memblock_alloc_nopanic(size, 0)
+ memblock_alloc_nopanic(size, SMP_CACHE_BYTES)
|
- memblock_alloc_low(size, 0)
+ memblock_alloc_low(size, SMP_CACHE_BYTES)
|
- memblock_alloc_low_nopanic(size, 0)
+ memblock_alloc_low_nopanic(size, SMP_CACHE_BYTES)
|
- memblock_alloc_from_nopanic(size, 0, min_addr)
+ memblock_alloc_from_nopanic(size, SMP_CACHE_BYTES, min_addr)
|
- memblock_alloc_node(size, 0, nid)
+ memblock_alloc_node(size, SMP_CACHE_BYTES, nid)
)
[mhocko@suse.com: changelog update]
[akpm@linux-foundation.org: coding-style fixes]
[rppt@linux.ibm.com: fix missed uses of implicit alignment]
Link: http://lkml.kernel.org/r/20181016133656.GA10925@rapoport-lnx
Link: http://lkml.kernel.org/r/1538687224-17535-1-git-send-email-rppt@linux.vnet.ibm.com
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Suggested-by: Michal Hocko <mhocko@suse.com>
Acked-by: Paul Burton <paul.burton@mips.com> [MIPS]
Acked-by: Michael Ellerman <mpe@ellerman.id.au> [powerpc]
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Richard Weinberger <richard@nod.at>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Move remaining definitions and declarations from include/linux/bootmem.h
into include/linux/memblock.h and remove the redundant header.
The includes were replaced with the semantic patch below and then
semi-automated removal of duplicated '#include <linux/memblock.h>
@@
@@
- #include <linux/bootmem.h>
+ #include <linux/memblock.h>
[sfr@canb.auug.org.au: dma-direct: fix up for the removal of linux/bootmem.h]
Link: http://lkml.kernel.org/r/20181002185342.133d1680@canb.auug.org.au
[sfr@canb.auug.org.au: powerpc: fix up for removal of linux/bootmem.h]
Link: http://lkml.kernel.org/r/20181005161406.73ef8727@canb.auug.org.au
[sfr@canb.auug.org.au: x86/kaslr, ACPI/NUMA: fix for linux/bootmem.h removal]
Link: http://lkml.kernel.org/r/20181008190341.5e396491@canb.auug.org.au
Link: http://lkml.kernel.org/r/1536927045-23536-30-git-send-email-rppt@linux.vnet.ibm.com
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greentime Hu <green.hu@gmail.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Ley Foon Tan <lftan@altera.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Palmer Dabbelt <palmer@sifive.com>
Cc: Paul Burton <paul.burton@mips.com>
Cc: Richard Kuo <rkuo@codeaurora.org>
Cc: Richard Weinberger <richard@nod.at>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Serge Semin <fancer.lancer@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The alloc_bootmem(size) is a shortcut for allocation of SMP_CACHE_BYTES
aligned memory. When the align parameter of memblock_alloc() is 0, the
alignment is implicitly set to SMP_CACHE_BYTES and thus alloc_bootmem(size)
and memblock_alloc(size, 0) are equivalent.
The conversion is done using the following semantic patch:
@@
expression size;
@@
- alloc_bootmem(size)
+ memblock_alloc(size, 0)
Link: http://lkml.kernel.org/r/1536927045-23536-22-git-send-email-rppt@linux.vnet.ibm.com
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greentime Hu <green.hu@gmail.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Ley Foon Tan <lftan@altera.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Palmer Dabbelt <palmer@sifive.com>
Cc: Paul Burton <paul.burton@mips.com>
Cc: Richard Kuo <rkuo@codeaurora.org>
Cc: Richard Weinberger <richard@nod.at>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Serge Semin <fancer.lancer@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The alloc_bootmem_pages() function allocates PAGE_SIZE aligned memory.
memblock_alloc() with alignment set to PAGE_SIZE does exactly the same
thing.
The conversion is done using the following semantic patch:
@@
expression e;
@@
- alloc_bootmem_pages(e)
+ memblock_alloc(e, PAGE_SIZE)
Link: http://lkml.kernel.org/r/1536927045-23536-20-git-send-email-rppt@linux.vnet.ibm.com
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greentime Hu <green.hu@gmail.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Ley Foon Tan <lftan@altera.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Palmer Dabbelt <palmer@sifive.com>
Cc: Paul Burton <paul.burton@mips.com>
Cc: Richard Kuo <rkuo@codeaurora.org>
Cc: Richard Weinberger <richard@nod.at>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Serge Semin <fancer.lancer@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When __alloc_bootmem_nopanic() is used with explicit lower limit for the
allocation it attempts to allocate memory at or above that limit and falls
back to allocation with no limit set.
The memblock_alloc_from_nopanic() does exactly the same thing and can be
used as a replacement for __alloc_bootmem_nopanic() is such cases.
Link: http://lkml.kernel.org/r/1536927045-23536-14-git-send-email-rppt@linux.vnet.ibm.com
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greentime Hu <green.hu@gmail.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Ley Foon Tan <lftan@altera.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Palmer Dabbelt <palmer@sifive.com>
Cc: Paul Burton <paul.burton@mips.com>
Cc: Richard Kuo <rkuo@codeaurora.org>
Cc: Richard Weinberger <richard@nod.at>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Serge Semin <fancer.lancer@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The __alloc_bootmem_node_nopanic() attempts to allocate memory for a
specified node. If the allocation fails it then retries to allocate memory
from any node. Upon success, the allocated memory is set to 0.
The memblock_alloc_try_nid_nopanic() does exactly the same thing and can be
used instead.
Link: http://lkml.kernel.org/r/1536927045-23536-11-git-send-email-rppt@linux.vnet.ibm.com
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greentime Hu <green.hu@gmail.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Ley Foon Tan <lftan@altera.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Palmer Dabbelt <palmer@sifive.com>
Cc: Paul Burton <paul.burton@mips.com>
Cc: Richard Kuo <rkuo@codeaurora.org>
Cc: Richard Weinberger <richard@nod.at>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Serge Semin <fancer.lancer@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The functions are equivalent, just the later does not require nobootmem
translation layer.
Link: http://lkml.kernel.org/r/1536927045-23536-10-git-send-email-rppt@linux.vnet.ibm.com
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greentime Hu <green.hu@gmail.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Ley Foon Tan <lftan@altera.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Palmer Dabbelt <palmer@sifive.com>
Cc: Paul Burton <paul.burton@mips.com>
Cc: Richard Kuo <rkuo@codeaurora.org>
Cc: Richard Weinberger <richard@nod.at>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Serge Semin <fancer.lancer@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Building with -Wformat-nonliteral gives:
arch/x86/kernel/traps.c:334:2: warning: format not a string literal and no format arguments [-Wformat-nonliteral]
panic(message);
handle_stack_overflow() can only be called from two places (kernel/traps.c
and via inline asm in mm/fault.c), in both cases with a string not
containing format specifiers, so we might as well silence this warning
using "%s" as a format string.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20181026222004.14193-1-linux@rasmusvillemoes.dk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
"sizeof(x)" is the canonical coding style used in arch/x86 most of the time.
Fix the few places that didn't follow the convention.
(Also do some whitespace cleanups in a few places while at it.)
[ mingo: Rewrote the changelog. ]
Signed-off-by: Jordan Borgner <mail@jordan-borgner.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20181028125828.7rgammkgzep2wpam@JordanDesktop
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Merge updates from Andrew Morton:
- a few misc things
- ocfs2 updates
- most of MM
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (132 commits)
hugetlbfs: dirty pages as they are added to pagecache
mm: export add_swap_extent()
mm: split SWP_FILE into SWP_ACTIVATED and SWP_FS
tools/testing/selftests/vm/map_fixed_noreplace.c: add test for MAP_FIXED_NOREPLACE
mm: thp: relocate flush_cache_range() in migrate_misplaced_transhuge_page()
mm: thp: fix mmu_notifier in migrate_misplaced_transhuge_page()
mm: thp: fix MADV_DONTNEED vs migrate_misplaced_transhuge_page race condition
mm/kasan/quarantine.c: make quarantine_lock a raw_spinlock_t
mm/gup: cache dev_pagemap while pinning pages
Revert "x86/e820: put !E820_TYPE_RAM regions into memblock.reserved"
mm: return zero_resv_unavail optimization
mm: zero remaining unavailable struct pages
tools/testing/selftests/vm/gup_benchmark.c: add MAP_HUGETLB option
tools/testing/selftests/vm/gup_benchmark.c: add MAP_SHARED option
tools/testing/selftests/vm/gup_benchmark.c: allow user specified file
tools/testing/selftests/vm/gup_benchmark.c: fix 'write' flag usage
mm/gup_benchmark.c: add additional pinning methods
mm/gup_benchmark.c: time put_page()
mm: don't raise MEMCG_OOM event due to failed high-order allocation
mm/page-writeback.c: fix range_cyclic writeback vs writepages deadlock
...
commit 124049decb ("x86/e820: put !E820_TYPE_RAM regions into
memblock.reserved") breaks movable_node kernel option because it changed
the memory gap range to reserved memblock. So, the node is marked as
Normal zone even if the SRAT has Hot pluggable affinity.
=====================================================================
kernel: BIOS-e820: [mem 0x0000180000000000-0x0000180fffffffff] usable
kernel: BIOS-e820: [mem 0x00001c0000000000-0x00001c0fffffffff] usable
...
kernel: reserved[0x12]#011[0x0000181000000000-0x00001bffffffffff], 0x000003f000000000 bytes flags: 0x0
...
kernel: ACPI: SRAT: Node 2 PXM 6 [mem 0x180000000000-0x1bffffffffff] hotplug
kernel: ACPI: SRAT: Node 3 PXM 7 [mem 0x1c0000000000-0x1fffffffffff] hotplug
...
kernel: Movable zone start for each node
kernel: Node 3: 0x00001c0000000000
kernel: Early memory node ranges
...
=====================================================================
The original issue is fixed by the former patches, so let's revert commit
124049decb ("x86/e820: put !E820_TYPE_RAM regions into
memblock.reserved").
Link: http://lkml.kernel.org/r/20181002143821.5112-4-msys.mizuma@gmail.com
Signed-off-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
Reviewed-by: Pavel Tatashin <pavel.tatashin@microsoft.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Oscar Salvador <osalvador@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- Sync dtc with upstream version v1.4.7-14-gc86da84d30e4
- Work to get rid of direct accesses to struct device_node name and
type pointers in preparation for removing them. New helpers for
parsing DT cpu nodes and conversions to use the helpers. printk
conversions to %pOFn for printing DT node names. Most went thru
subystem trees, so this is the remainder.
- Fixes to DT child node lookups to actually be restricted to child
nodes instead of treewide.
- Refactoring of dtb targets out of arch code. This makes the support
more uniform and enables building all dtbs on c6x, microblaze, and
powerpc.
- Various DT binding updates for Renesas r8a7744 SoC
- Vendor prefixes for Facebook, OLPC
- Restructuring of some ARM binding docs moving some peripheral bindings
out of board/SoC binding files
- New "secure-chosen" binding for secure world settings on ARM
- Dual licensing of 2 DT IRQ binding headers
-----BEGIN PGP SIGNATURE-----
iQJEBAABCgAuFiEEktVUI4SxYhzZyEuo+vtdtY28YcMFAlvTKWYQHHJvYmhAa2Vy
bmVsLm9yZwAKCRD6+121jbxhw8J5EACMAnrTxWQmXfQXOZEVxztcFavH6LP8mh2e
7FZIZ38jzHXXvl81tAg1nBhzFUU/qtvqW8NDCZ9OBxKvp6PFDNhWu241ZodSB1Kw
MZWy2A9QC+qbHYCC+SB5gOT0+Py3v7LNCBa5/TxhbFd35THJM8X0FP7gmcCGX593
9Ml1rqawT4mK5XmCpczT0cXxyC4TgVtpfDWZH2KgJTR/kwXVQlOQOGZ8a1y/wrt7
8TLIe7Qy4SFRzjhwbSta1PUehyYfe4uTSsXIJ84kMvNMxinLXQtvd7t9TfsK8p/R
WjYUneJskVjtxVrMQfdV4MxyFL1YEt2mYcr0PMKIWxMCgGDAZsHPoUZmjyh/PrCI
uiZtEHn3fXpUZAV/xEHHNirJxYyQfHGiksAT+lPrUXYYLCcZ3ZmqiTEYhGoQAfH5
CQPMuxA6yXxp6bov6zJwZSTZtkXciju8aQRhUhlxIfHTqezmGYeql/bnWd+InNuR
upANLZBh6D2jTWzDyobconkCCLlVkSqDoqOx725mMl6hIcdH9d2jVX7hwRf077VI
5i3CyPSJOkSOLSdB8bAPYfBoaDtH2bthxieUrkkSbIjbwHO1H6a2lxPeG/zah0a3
ePMGhi7J84UM4VpJEi000cP+bhPumJtJrG7zxP7ldXdfAF436sQ6KRptlcpLpj5i
IwMhUQNH+g==
=335v
-----END PGP SIGNATURE-----
Merge tag 'devicetree-for-4.20' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux
Pull Devicetree updates from Rob Herring:
"A bit bigger than normal as I've been busy this cycle.
There's a few things with dependencies and a few things subsystem
maintainers didn't pick up, so I'm taking them thru my tree.
The fixes from Johan didn't get into linux-next, but they've been
waiting for some time now and they are what's left of what subsystem
maintainers didn't pick up.
Summary:
- Sync dtc with upstream version v1.4.7-14-gc86da84d30e4
- Work to get rid of direct accesses to struct device_node name and
type pointers in preparation for removing them. New helpers for
parsing DT cpu nodes and conversions to use the helpers. printk
conversions to %pOFn for printing DT node names. Most went thru
subystem trees, so this is the remainder.
- Fixes to DT child node lookups to actually be restricted to child
nodes instead of treewide.
- Refactoring of dtb targets out of arch code. This makes the support
more uniform and enables building all dtbs on c6x, microblaze, and
powerpc.
- Various DT binding updates for Renesas r8a7744 SoC
- Vendor prefixes for Facebook, OLPC
- Restructuring of some ARM binding docs moving some peripheral
bindings out of board/SoC binding files
- New "secure-chosen" binding for secure world settings on ARM
- Dual licensing of 2 DT IRQ binding headers"
* tag 'devicetree-for-4.20' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: (78 commits)
ARM: dt: relicense two DT binding IRQ headers
power: supply: twl4030-charger: fix OF sibling-node lookup
NFC: nfcmrvl_uart: fix OF child-node lookup
net: stmmac: dwmac-sun8i: fix OF child-node lookup
net: bcmgenet: fix OF child-node lookup
drm/msm: fix OF child-node lookup
drm/mediatek: fix OF sibling-node lookup
of: Add missing exports of node name compare functions
dt-bindings: Add OLPC vendor prefix
dt-bindings: misc: bk4: Add device tree binding for Liebherr's BK4 SPI bus
dt-bindings: thermal: samsung: Add SPDX license identifier
dt-bindings: clock: samsung: Add SPDX license identifiers
dt-bindings: timer: ostm: Add R7S9210 support
dt-bindings: phy: rcar-gen2: Add r8a7744 support
dt-bindings: can: rcar_can: Add r8a7744 support
dt-bindings: timer: renesas, cmt: Document r8a7744 CMT support
dt-bindings: watchdog: renesas-wdt: Document r8a7744 support
dt-bindings: thermal: rcar: Add device tree support for r8a7744
Documentation: dt: Add binding for /secure-chosen/stdout-path
dt-bindings: arm: zte: Move sysctrl bindings to their own doc
...
Pull siginfo updates from Eric Biederman:
"I have been slowly sorting out siginfo and this is the culmination of
that work.
The primary result is in several ways the signal infrastructure has
been made less error prone. The code has been updated so that manually
specifying SEND_SIG_FORCED is never necessary. The conversion to the
new siginfo sending functions is now complete, which makes it
difficult to send a signal without filling in the proper siginfo
fields.
At the tail end of the patchset comes the optimization of decreasing
the size of struct siginfo in the kernel from 128 bytes to about 48
bytes on 64bit. The fundamental observation that enables this is by
definition none of the known ways to use struct siginfo uses the extra
bytes.
This comes at the cost of a small user space observable difference.
For the rare case of siginfo being injected into the kernel only what
can be copied into kernel_siginfo is delivered to the destination, the
rest of the bytes are set to 0. For cases where the signal and the
si_code are known this is safe, because we know those bytes are not
used. For cases where the signal and si_code combination is unknown
the bits that won't fit into struct kernel_siginfo are tested to
verify they are zero, and the send fails if they are not.
I made an extensive search through userspace code and I could not find
anything that would break because of the above change. If it turns out
I did break something it will take just the revert of a single change
to restore kernel_siginfo to the same size as userspace siginfo.
Testing did reveal dependencies on preferring the signo passed to
sigqueueinfo over si->signo, so bit the bullet and added the
complexity necessary to handle that case.
Testing also revealed bad things can happen if a negative signal
number is passed into the system calls. Something no sane application
will do but something a malicious program or a fuzzer might do. So I
have fixed the code that performs the bounds checks to ensure negative
signal numbers are handled"
* 'siginfo-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (80 commits)
signal: Guard against negative signal numbers in copy_siginfo_from_user32
signal: Guard against negative signal numbers in copy_siginfo_from_user
signal: In sigqueueinfo prefer sig not si_signo
signal: Use a smaller struct siginfo in the kernel
signal: Distinguish between kernel_siginfo and siginfo
signal: Introduce copy_siginfo_from_user and use it's return value
signal: Remove the need for __ARCH_SI_PREABLE_SIZE and SI_PAD_SIZE
signal: Fail sigqueueinfo if si_signo != sig
signal/sparc: Move EMT_TAGOVF into the generic siginfo.h
signal/unicore32: Use force_sig_fault where appropriate
signal/unicore32: Generate siginfo in ucs32_notify_die
signal/unicore32: Use send_sig_fault where appropriate
signal/arc: Use force_sig_fault where appropriate
signal/arc: Push siginfo generation into unhandled_exception
signal/ia64: Use force_sig_fault where appropriate
signal/ia64: Use the force_sig(SIGSEGV,...) in ia64_rt_sigreturn
signal/ia64: Use the generic force_sigsegv in setup_frame
signal/arm/kvm: Use send_sig_mceerr
signal/arm: Use send_sig_fault where appropriate
signal/arm: Use force_sig_fault where appropriate
...
Pull x86 vdso updates from Ingo Molnar:
"Two main changes:
- Cleanups, simplifications and CLOCK_TAI support (Thomas Gleixner)
- Improve code generation (Andy Lutomirski)"
* 'x86-vdso-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/vdso: Rearrange do_hres() to improve code generation
x86/vdso: Document vgtod_ts better
x86/vdso: Remove "memory" clobbers in the vDSO syscall fallbacks
x66/vdso: Add CLOCK_TAI support
x86/vdso: Move cycle_last handling into the caller
x86/vdso: Simplify the invalid vclock case
x86/vdso: Replace the clockid switch case
x86/vdso: Collapse coarse functions
x86/vdso: Collapse high resolution functions
x86/vdso: Introduce and use vgtod_ts
x86/vdso: Use unsigned int consistently for vsyscall_gtod_data:: Seq
x86/vdso: Enforce 64bit clocksource
x86/time: Implement clocksource_arch_init()
clocksource: Provide clocksource_arch_init()
Pull x86 pti updates from Ingo Molnar:
"The main changes:
- Make the IBPB barrier more strict and add STIBP support (Jiri
Kosina)
- Micro-optimize and clean up the entry code (Andy Lutomirski)
- ... plus misc other fixes"
* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/speculation: Propagate information about RSB filling mitigation to sysfs
x86/speculation: Enable cross-hyperthread spectre v2 STIBP mitigation
x86/speculation: Apply IBPB more strictly to avoid cross-process data leak
x86/speculation: Add RETPOLINE_AMD support to the inline asm CALL_NOSPEC variant
x86/CPU: Fix unused variable warning when !CONFIG_IA32_EMULATION
x86/pti/64: Remove the SYSCALL64 entry trampoline
x86/entry/64: Use the TSS sp2 slot for SYSCALL/SYSRET scratch space
x86/entry/64: Document idtentry
Pull x86 paravirt updates from Ingo Molnar:
"Two main changes:
- Remove no longer used parts of the paravirt infrastructure and put
large quantities of paravirt ops under a new config option
PARAVIRT_XXL=y, which is selected by XEN_PV only. (Joergen Gross)
- Enable PV spinlocks on Hyperv (Yi Sun)"
* 'x86-paravirt-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/hyperv: Enable PV qspinlock for Hyper-V
x86/hyperv: Add GUEST_IDLE_MSR support
x86/paravirt: Clean up native_patch()
x86/paravirt: Prevent redefinition of SAVE_FLAGS macro
x86/xen: Make xen_reservation_lock static
x86/paravirt: Remove unneeded mmu related paravirt ops bits
x86/paravirt: Move the Xen-only pv_mmu_ops under the PARAVIRT_XXL umbrella
x86/paravirt: Move the pv_irq_ops under the PARAVIRT_XXL umbrella
x86/paravirt: Move the Xen-only pv_cpu_ops under the PARAVIRT_XXL umbrella
x86/paravirt: Move items in pv_info under PARAVIRT_XXL umbrella
x86/paravirt: Introduce new config option PARAVIRT_XXL
x86/paravirt: Remove unused paravirt bits
x86/paravirt: Use a single ops structure
x86/paravirt: Remove clobbers from struct paravirt_patch_site
x86/paravirt: Remove clobbers parameter from paravirt patch functions
x86/paravirt: Make paravirt_patch_call() and paravirt_patch_jmp() static
x86/xen: Add SPDX identifier in arch/x86/xen files
x86/xen: Link platform-pci-unplug.o only if CONFIG_XEN_PVHVM
x86/xen: Move pv specific parts of arch/x86/xen/mmu.c to mmu_pv.c
x86/xen: Move pv irq related functions under CONFIG_XEN_PV umbrella
Pull x86 mm updates from Ingo Molnar:
"Lots of changes in this cycle:
- Lots of CPA (change page attribute) optimizations and related
cleanups (Thomas Gleixner, Peter Zijstra)
- Make lazy TLB mode even lazier (Rik van Riel)
- Fault handler cleanups and improvements (Dave Hansen)
- kdump, vmcore: Enable kdumping encrypted memory with AMD SME
enabled (Lianbo Jiang)
- Clean up VM layout documentation (Baoquan He, Ingo Molnar)
- ... plus misc other fixes and enhancements"
* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (51 commits)
x86/stackprotector: Remove the call to boot_init_stack_canary() from cpu_startup_entry()
x86/mm: Kill stray kernel fault handling comment
x86/mm: Do not warn about PCI BIOS W+X mappings
resource: Clean it up a bit
resource: Fix find_next_iomem_res() iteration issue
resource: Include resource end in walk_*() interfaces
x86/kexec: Correct KEXEC_BACKUP_SRC_END off-by-one error
x86/mm: Remove spurious fault pkey check
x86/mm/vsyscall: Consider vsyscall page part of user address space
x86/mm: Add vsyscall address helper
x86/mm: Fix exception table comments
x86/mm: Add clarifying comments for user addr space
x86/mm: Break out user address space handling
x86/mm: Break out kernel address space handling
x86/mm: Clarify hardware vs. software "error_code"
x86/mm/tlb: Make lazy TLB mode lazier
x86/mm/tlb: Add freed_tables element to flush_tlb_info
x86/mm/tlb: Add freed_tables argument to flush_tlb_mm_range
smp,cpumask: introduce on_each_cpu_cond_mask
smp: use __cpumask_set_cpu in on_each_cpu_cond
...
Pull x86 grub2 updates from Ingo Molnar:
"This extends the x86 boot protocol to include an address for the RSDP
table - utilized by Xen currently.
Matching Grub2 patches are pending as well. (Juergen Gross)"
* 'x86-grub2-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/acpi, x86/boot: Take RSDP address for boot params if available
x86/boot: Add ACPI RSDP address to setup_header
x86/xen: Fix boot loader version reported for PVH guests
Pull x86 cpu updates from Ingo Molnar:
"The main changes in this cycle were:
- Add support for the "Dhyana" x86 CPUs by Hygon: these are licensed
based on the AMD Zen architecture, and are built and sold in China,
for domestic datacenter use. The code is pretty close to AMD
support, mostly with a few quirks and enumeration differences. (Pu
Wen)
- Enable CPUID support on Cyrix 6x86/6x86L processors"
* 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
tools/cpupower: Add Hygon Dhyana support
cpufreq: Add Hygon Dhyana support
ACPI: Add Hygon Dhyana support
x86/xen: Add Hygon Dhyana support to Xen
x86/kvm: Add Hygon Dhyana support to KVM
x86/mce: Add Hygon Dhyana support to the MCA infrastructure
x86/bugs: Add Hygon Dhyana to the respective mitigation machinery
x86/apic: Add Hygon Dhyana support
x86/pci, x86/amd_nb: Add Hygon Dhyana support to PCI and northbridge
x86/amd_nb: Check vendor in AMD-only functions
x86/alternative: Init ideal_nops for Hygon Dhyana
x86/events: Add Hygon Dhyana support to PMU infrastructure
x86/smpboot: Do not use BSP INIT delay and MWAIT to idle on Dhyana
x86/cpu/mtrr: Support TOP_MEM2 and get MTRR number
x86/cpu: Get cache info and setup cache cpumap for Hygon Dhyana
x86/cpu: Create Hygon Dhyana architecture support file
x86/CPU: Change query logic so CPUID is enabled before testing
x86/CPU: Use correct macros for Cyrix calls
Pull x86 boot updates from Ingo Molnar:
"Two cleanups and a bugfix for a rare boot option combination"
* 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/boot/KASLR: Remove return value from handle_mem_options()
x86/corruption-check: Use pr_*() instead of printk()
x86/corruption-check: Fix panic in memory_corruption_check() when boot option without value is provided
Pull x86 asm updates from Ingo Molnar:
"The main changes in this cycle were the fsgsbase related preparatory
patches from Chang S. Bae - but there's also an optimized
memcpy_flushcache() and a cleanup for the __cmpxchg_double() assembly
glue"
* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/fsgsbase/64: Clean up various details
x86/segments: Introduce the 'CPUNODE' naming to better document the segment limit CPU/node NR trick
x86/vdso: Initialize the CPU/node NR segment descriptor earlier
x86/vdso: Introduce helper functions for CPU and node number
x86/segments/64: Rename the GDT PER_CPU entry to CPU_NUMBER
x86/fsgsbase/64: Factor out FS/GS segment loading from __switch_to()
x86/fsgsbase/64: Convert the ELF core dump code to the new FSGSBASE helpers
x86/fsgsbase/64: Make ptrace use the new FS/GS base helpers
x86/fsgsbase/64: Introduce FS/GS base helper functions
x86/fsgsbase/64: Fix ptrace() to read the FS/GS base accurately
x86/asm: Use CC_SET()/CC_OUT() in __cmpxchg_double()
x86/asm: Optimize memcpy_flushcache()
Pull x86 apic updates from Ingo Molnar:
"Improve the spreading of managed IRQs at allocation time"
* 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irq/matrix: Spread managed interrupts on allocation
irq/matrix: Split out the CPU selection code into a helper
Pull RAS updates from Ingo Molnar:
"Misc smaller fixes and cleanups"
* 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mcelog: Remove one mce_helper definition
x86/mce: Add macros for the corrected error count bit field
x86/mce: Use BIT_ULL(x) for bit mask definitions
x86/mce-inject: Reset injection struct after injection
Pull perf updates from Ingo Molnar:
"The main updates in this cycle were:
- Lots of perf tooling changes too voluminous to list (big perf trace
and perf stat improvements, lots of libtraceevent reorganization,
etc.), so I'll list the authors and refer to the changelog for
details:
Benjamin Peterson, Jérémie Galarneau, Kim Phillips, Peter
Zijlstra, Ravi Bangoria, Sangwon Hong, Sean V Kelley, Steven
Rostedt, Thomas Gleixner, Ding Xiang, Eduardo Habkost, Thomas
Richter, Andi Kleen, Sanskriti Sharma, Adrian Hunter, Tzvetomir
Stoyanov, Arnaldo Carvalho de Melo, Jiri Olsa.
... with the bulk of the changes written by Jiri Olsa, Tzvetomir
Stoyanov and Arnaldo Carvalho de Melo.
- Continued intel_rdt work with a focus on playing well with perf
events. This also imported some non-perf RDT work due to
dependencies. (Reinette Chatre)
- Implement counter freezing for Arch Perfmon v4 (Skylake and newer).
This allows to speed up the PMI handler by avoiding unnecessary MSR
writes and make it more accurate. (Andi Kleen)
- kprobes cleanups and simplification (Masami Hiramatsu)
- Intel Goldmont PMU updates (Kan Liang)
- ... plus misc other fixes and updates"
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (155 commits)
kprobes/x86: Use preempt_enable() in optimized_callback()
x86/intel_rdt: Prevent pseudo-locking from using stale pointers
kprobes, x86/ptrace.h: Make regs_get_kernel_stack_nth() not fault on bad stack
perf/x86/intel: Export mem events only if there's PEBS support
x86/cpu: Drop pointless static qualifier in punit_dev_state_show()
x86/intel_rdt: Fix initial allocation to consider CDP
x86/intel_rdt: CBM overlap should also check for overlap with CDP peer
x86/intel_rdt: Introduce utility to obtain CDP peer
tools lib traceevent, perf tools: Move struct tep_handler definition in a local header file
tools lib traceevent: Separate out tep_strerror() for strerror_r() issues
perf python: More portable way to make CFLAGS work with clang
perf python: Make clang_has_option() work on Python 3
perf tools: Free temporary 'sys' string in read_event_files()
perf tools: Avoid double free in read_event_file()
perf tools: Free 'printk' string in parse_ftrace_printk()
perf tools: Cleanup trace-event-info 'tdata' leak
perf strbuf: Match va_{add,copy} with va_end
perf test: S390 does not support watchpoints in test 22
perf auxtrace: Include missing asm/bitsperlong.h to get BITS_PER_LONG
tools include: Adopt linux/bits.h
...
Pull locking and misc x86 updates from Ingo Molnar:
"Lots of changes in this cycle - in part because locking/core attracted
a number of related x86 low level work which was easier to handle in a
single tree:
- Linux Kernel Memory Consistency Model updates (Alan Stern, Paul E.
McKenney, Andrea Parri)
- lockdep scalability improvements and micro-optimizations (Waiman
Long)
- rwsem improvements (Waiman Long)
- spinlock micro-optimization (Matthew Wilcox)
- qspinlocks: Provide a liveness guarantee (more fairness) on x86.
(Peter Zijlstra)
- Add support for relative references in jump tables on arm64, x86
and s390 to optimize jump labels (Ard Biesheuvel, Heiko Carstens)
- Be a lot less permissive on weird (kernel address) uaccess faults
on x86: BUG() when uaccess helpers fault on kernel addresses (Jann
Horn)
- macrofy x86 asm statements to un-confuse the GCC inliner. (Nadav
Amit)
- ... and a handful of other smaller changes as well"
* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (57 commits)
locking/lockdep: Make global debug_locks* variables read-mostly
locking/lockdep: Fix debug_locks off performance problem
locking/pvqspinlock: Extend node size when pvqspinlock is configured
locking/qspinlock_stat: Count instances of nested lock slowpaths
locking/qspinlock, x86: Provide liveness guarantee
x86/asm: 'Simplify' GEN_*_RMWcc() macros
locking/qspinlock: Rework some comments
locking/qspinlock: Re-order code
locking/lockdep: Remove duplicated 'lock_class_ops' percpu array
x86/defconfig: Enable CONFIG_USB_XHCI_HCD=y
futex: Replace spin_is_locked() with lockdep
locking/lockdep: Make class->ops a percpu counter and move it under CONFIG_DEBUG_LOCKDEP=y
x86/jump-labels: Macrofy inline assembly code to work around GCC inlining bugs
x86/cpufeature: Macrofy inline assembly code to work around GCC inlining bugs
x86/extable: Macrofy inline assembly code to work around GCC inlining bugs
x86/paravirt: Work around GCC inlining bugs when compiling paravirt ops
x86/bug: Macrofy the BUG table section handling, to work around GCC inlining bugs
x86/alternatives: Macrofy lock prefixes to work around GCC inlining bugs
x86/refcount: Work around GCC inlining bug
x86/objtool: Use asm macros to work around GCC inlining bugs
...
- Backport hibernation bug fixes from x86-64 to x86-32 and
consolidate hibernation handling on x86 to allow 32-bit
systems to work in all of the cases in which 64-bit ones
work (Zhimin Gu, Chen Yu).
- Fix hibernation documentation (Vladimir D. Seleznev).
- Update the menu cpuidle governor to fix a couple of issues
with it, make it more efficient in some cases and clean it
up (Rafael Wysocki).
- Rework the cpuidle polling state implementation to make it
more efficient (Rafael Wysocki).
- Clean up the cpuidle core somewhat (Fieah Lim).
- Fix the cpufreq conservative governor to take policy limits
into account properly in some cases (Rafael Wysocki).
- Add support for retrieving guaranteed performance information
to the ACPI CPPC library and make the intel_pstate driver use
it to expose the CPU base frequency via sysfs on systems with
the hardware-managed P-states (HWP) feature enabled (Srinivas
Pandruvada).
- Fix clang warning in the CPPC cpufreq driver (Nathan Chancellor).
- Get rid of device_node.name printing from cpufreq (Rob Herring).
- Remove unnecessary unlikely() from the cpufreq core (Igor Stoppa).
- Add support for the r8a7744 SoC to the cpufreq-dt driver (Biju Das).
- Update the dt-platdev cpufreq driver to allow RK3399 to have
separate tunables per cluster (Dmitry Torokhov).
- Fix the dma_alloc_coherent() usage in the tegra186 cpufreq driver
(Christoph Hellwig).
- Make the imx6q cpufreq driver read OCOTP through nvmem for
imx6ul/imx6ull (Anson Huang).
- Fix several bugs in the operating performance points (OPP)
framework and make it more stable (Viresh Kumar, Dave Gerlach).
- Update the devfreq subsystem to take changes in the APIs used
by into account, fix some issues with it and make it stop
print device_node.name directly (Bjorn Andersson, Enric Balletbo
i Serra, Matthias Kaehlcke, Rob Herring, Vincent Donnefort, zhong
jiang).
- Prepare the generic power domains (genpd) framework for dealing
with domains containing CPUs (Ulf Hansson).
- Prevent sysfs attributes representing low-power S0 residency
counters from being exposed if low-power S0 support is not
indicated in ACPI FADT (Rajneesh Bhardwaj).
- Get rid of custom CPU features macros for Intel CPUs from the
intel_idle and RAPL drivers (Andy Shevchenko).
- Update the tasks freezer to list tasks that refused to freeze
and caused a system transition to a sleep state to be aborted
(Todd Brandt).
- Update the pm-graph set of tools to v5.2 (Todd Brandt).
- Fix some issues in the cpupower utility (Anders Roxell, Prarit
Bhargava).
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABCAAGBQJbyaznAAoJEILEb/54YlRxUkoP/iOroh5pMW7PDa1g8sG26bfN
ICln5Tt9lv1Euk3QALc5r05kLjyObfoMoDwvH2oiM0TgwSw6G64tm/ansTsvbPpc
DCk53d0/gSqv5B1dZxV6OUYoXP0Z5hD+nW+1dg6EiGr1h24kesdEXdSB09bfTUY3
N4zUurWDUD92havuV3PakI/d/aOdxlwt9drwxv/cx4/gSYS0q5KtB2/N8YdWrk8Q
1UNwZkQLO8I0URfp9bwvwG3VhgKn0SKpLHlajq9KzWDPRgCl32oB0tY+3fOHW9Q+
djgMRA7xlAzAcCCL0vYJnEja6uMenvx3hZa1m68ZWFr0C25LQ5V87IEyZ3znvJQu
IlcY9jMbYkX8dZz1M8LZA+nOtyYM5GxvgylaQvHRn8fi0jzYJWfJbAKdyvEX94qz
UWtY35ihXFVBkhJuSxDPzluhMwxtd5uux1zO09/KlpUg8nnhxRx5l7AF7k7YyRk9
TZ5dVa6kp8CdmBZK6E9FNHstfvECL64oc9Ig3CB/bRXYBm60hN9pLXO2abJKV7dU
FUe4kmWUNus5QKOzfGuPKJokw34/vxBW2CVrOeRUNcuaRhlUwuboijeLPf23XLI/
fYDI4EiMxAZvcEZ5h0KKDS0MaLv4uy0LbAdrWx8Eg7pNeFUiovDgovYUF7HOmn6M
BzesklDaXWUSPWxlnASg
=WJgu
-----END PGP SIGNATURE-----
Merge tag 'pm-4.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management updates from Rafael Wysocki:
"These make hibernation on 32-bit x86 systems work in all of the cases
in which it works on 64-bit x86 ones, update the menu cpuidle governor
and the "polling" state to make them more efficient, add more hardware
support to cpufreq drivers and fix issues with some of them, fix a bug
in the conservative cpufreq governor, fix the operating performance
points (OPP) framework and make it more stable, update the devfreq
subsystem to take changes in the APIs used by into account and clean
up some things all over.
Specifics:
- Backport hibernation bug fixes from x86-64 to x86-32 and
consolidate hibernation handling on x86 to allow 32-bit systems to
work in all of the cases in which 64-bit ones work (Zhimin Gu, Chen
Yu).
- Fix hibernation documentation (Vladimir D. Seleznev).
- Update the menu cpuidle governor to fix a couple of issues with it,
make it more efficient in some cases and clean it up (Rafael
Wysocki).
- Rework the cpuidle polling state implementation to make it more
efficient (Rafael Wysocki).
- Clean up the cpuidle core somewhat (Fieah Lim).
- Fix the cpufreq conservative governor to take policy limits into
account properly in some cases (Rafael Wysocki).
- Add support for retrieving guaranteed performance information to
the ACPI CPPC library and make the intel_pstate driver use it to
expose the CPU base frequency via sysfs on systems with the
hardware-managed P-states (HWP) feature enabled (Srinivas
Pandruvada).
- Fix clang warning in the CPPC cpufreq driver (Nathan Chancellor).
- Get rid of device_node.name printing from cpufreq (Rob Herring).
- Remove unnecessary unlikely() from the cpufreq core (Igor Stoppa).
- Add support for the r8a7744 SoC to the cpufreq-dt driver (Biju
Das).
- Update the dt-platdev cpufreq driver to allow RK3399 to have
separate tunables per cluster (Dmitry Torokhov).
- Fix the dma_alloc_coherent() usage in the tegra186 cpufreq driver
(Christoph Hellwig).
- Make the imx6q cpufreq driver read OCOTP through nvmem for
imx6ul/imx6ull (Anson Huang).
- Fix several bugs in the operating performance points (OPP)
framework and make it more stable (Viresh Kumar, Dave Gerlach).
- Update the devfreq subsystem to take changes in the APIs used by
into account, fix some issues with it and make it stop print
device_node.name directly (Bjorn Andersson, Enric Balletbo i Serra,
Matthias Kaehlcke, Rob Herring, Vincent Donnefort, zhong jiang).
- Prepare the generic power domains (genpd) framework for dealing
with domains containing CPUs (Ulf Hansson).
- Prevent sysfs attributes representing low-power S0 residency
counters from being exposed if low-power S0 support is not
indicated in ACPI FADT (Rajneesh Bhardwaj).
- Get rid of custom CPU features macros for Intel CPUs from the
intel_idle and RAPL drivers (Andy Shevchenko).
- Update the tasks freezer to list tasks that refused to freeze and
caused a system transition to a sleep state to be aborted (Todd
Brandt).
- Update the pm-graph set of tools to v5.2 (Todd Brandt).
- Fix some issues in the cpupower utility (Anders Roxell, Prarit
Bhargava)"
* tag 'pm-4.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (73 commits)
PM / Domains: Document flags for genpd
PM / Domains: Deal with multiple states but no governor in genpd
PM / Domains: Don't treat zero found compatible idle states as an error
cpuidle: menu: Avoid computations when result will be discarded
cpuidle: menu: Drop redundant comparison
cpufreq: tegra186: don't pass GFP_DMA32 to dma_alloc_coherent()
cpufreq: conservative: Take limits changes into account properly
Documentation: intel_pstate: Add base_frequency information
cpufreq: intel_pstate: Add base_frequency attribute
ACPI / CPPC: Add support for guaranteed performance
cpuidle: menu: Simplify checks related to the polling state
PM / tools: sleepgraph and bootgraph: upgrade to v5.2
PM / tools: sleepgraph: first batch of v5.2 changes
cpupower: Fix coredump on VMWare
cpupower: Fix AMD Family 0x17 msr_pstate size
cpufreq: imx6q: read OCOTP through nvmem for imx6ul/imx6ull
cpufreq: dt-platdev: allow RK3399 to have separate tunables per cluster
cpuidle: poll_state: Revise loop termination condition
cpuidle: menu: Move the latency_req == 0 special case check
cpuidle: menu: Avoid computations for very close timers
...
- mostly more consolidation of the direct mapping code, including
converting over hexagon, and merging the coherent and non-coherent
code into a single dma_map_ops instance (me)
- cleanups for the dma_configure/dma_unconfigure callchains (me)
- better handling of dma_masks in odd setups (me, Alexander Duyck)
- better debugging of passing vmalloc address to the DMA API
(Stephen Boyd)
- CMA command line parsing fix (He Zhe)
-----BEGIN PGP SIGNATURE-----
iQI/BAABCgApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAlvNg6YLHGhjaEBsc3Qu
ZGUACgkQD55TZVIEUYMm/Q/9FFVOH73Nc3rT40N2HdaPbzV2hXmI1//hEJcImDP5
mLGq8XqieGuo8Pmu9+xp1tC2UnfUkhK4FjhQbWM+qKER/RNYES2BD50xVFmt6ICS
9d8IaRcs+ceggljfdwszkkucJspBsYNxpiKjjao0OsHn6UDatu6elZs/yvb2nXci
HCJUvs9vYm9MkAtVXEtOQtij3YRaJ/9xYY4h5Dy5vBtHPp+kjUMF0mWAwA2+Ec1V
8iqKjUY3c8nr8Kf6WE9tzJ0wrMFijc4HJlE3W1ud8YsKdfCkCf8XiIuS6PgTzOeK
0cn9h8dVrV1ZXJ/D/9JZDivmYvIsoKWAYVQHNzAiq7PI3uOJY1ggCxyZpWtTHZhM
ATHF0sJGpIenkSWybYpKee8e8RsS7L9dUgu6bYpK5pVkirNYnR9IOGVJNmS63L7Q
B0uUtqjBKDG2yNGZGY9zqBQFgxiPO0wxFLeKyHbIsC0b7FBti3rXGAimch5WiBuL
zlDV0zEfMH0BW6gNPrjfFur84duKtGZ/0DBSxQ0E1Mvk8B1LBr78MgZt8OfJEuoe
dx1FYU70u8PYi+hjmn386YnNNMTjd1GT5XW7AWedM2wCjRYmNy0yMGmm9cACMneN
5eBv/SYr7X1zKNL7w7H6KQVZilTJcBoj3f/lmjL7i22m9FXYQpcUP61L8wHNM8H2
iJo=
=AVSD
-----END PGP SIGNATURE-----
Merge tag 'dma-mapping-4.20' of git://git.infradead.org/users/hch/dma-mapping
Pull dma mapping updates from Christoph Hellwig:
"First batch of dma-mapping changes for 4.20.
There will be a second PR as some big changes were only applied just
before the end of the merge window, and I want to give them a few more
days in linux-next.
Summary:
- mostly more consolidation of the direct mapping code, including
converting over hexagon, and merging the coherent and non-coherent
code into a single dma_map_ops instance (me)
- cleanups for the dma_configure/dma_unconfigure callchains (me)
- better handling of dma_masks in odd setups (me, Alexander Duyck)
- better debugging of passing vmalloc address to the DMA API (Stephen
Boyd)
- CMA command line parsing fix (He Zhe)"
* tag 'dma-mapping-4.20' of git://git.infradead.org/users/hch/dma-mapping: (27 commits)
dma-direct: respect DMA_ATTR_NO_WARN
dma-mapping: translate __GFP_NOFAIL to DMA_ATTR_NO_WARN
dma-direct: document the zone selection logic
dma-debug: Check for drivers mapping invalid addresses in dma_map_single()
dma-direct: fix return value of dma_direct_supported
dma-mapping: move dma_default_get_required_mask under ifdef
dma-direct: always allow dma mask <= physiscal memory size
dma-direct: implement complete bus_dma_mask handling
dma-direct: refine dma_direct_alloc zone selection
dma-direct: add an explicit dma_direct_get_required_mask
dma-mapping: make the get_required_mask method available unconditionally
unicore32: remove swiotlb support
Revert "dma-mapping: clear dev->dma_ops in arch_teardown_dma_ops"
dma-mapping: support non-coherent devices in dma_common_get_sgtable
dma-mapping: consolidate the dma mmap implementations
dma-mapping: merge direct and noncoherent ops
dma-mapping: move the dma_coherent flag to struct device
MIPS: don't select DMA_MAYBE_COHERENT from DMA_PERDEV_COHERENT
dma-mapping: add the missing ARCH_HAS_SYNC_DMA_FOR_CPU_ALL declaration
dma-mapping: fix panic caused by passing empty cma command line argument
...
The following commit:
a19b2e3d78 ("kprobes/x86: Remove IRQ disabling from ftrace-based/optimized kprobes”)
removed local_irq_save/restore() from optimized_callback(), the handler
might be interrupted by the rescheduling interrupt and might be
rescheduled - so we must not use the preempt_enable_no_resched() macro.
Use preempt_enable() instead, to not lose preemption events.
[ mingo: Improved the changelog. ]
Reported-by: Nadav Amit <namit@vmware.com>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: <stable@vger.kernel.org>
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dwmw@amazon.co.uk
Fixes: a19b2e3d78 ("kprobes/x86: Remove IRQ disabling from ftrace-based/optimized kprobes”)
Link: http://lkml.kernel.org/r/154002887331.7627.10194920925792947001.stgit@devbox
Signed-off-by: Ingo Molnar <mingo@kernel.org>
When the last CPU in an rdt_domain goes offline, its rdt_domain struct gets
freed. Current pseudo-locking code is unaware of this scenario and tries to
dereference the freed structure in a few places.
Add checks to prevent pseudo-locking code from doing this.
While further work is needed to seamlessly restore resource groups (not
just pseudo-locking) to their configuration when the domain is brought back
online, the immediate issue of invalid pointers is addressed here.
Fixes: f4e80d67a5 ("x86/intel_rdt: Resctrl files reflect pseudo-locked information")
Fixes: 443810fe61 ("x86/intel_rdt: Create debugfs files for pseudo-locking testing")
Fixes: 746e08590b ("x86/intel_rdt: Create character device exposing pseudo-locked region")
Fixes: 33dc3e410a ("x86/intel_rdt: Make CPU information accessible for pseudo-locked regions")
Signed-off-by: Jithu Joseph <jithu.joseph@intel.com>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: fenghua.yu@intel.com
Cc: tony.luck@intel.com
Cc: gavin.hindman@intel.com
Cc: hpa@zytor.com
Link: https://lkml.kernel.org/r/231f742dbb7b00a31cc104416860e27dba6b072d.1539384145.git.reinette.chatre@intel.com
We already build the swiotlb code for 32-bit kernels with PAE support,
but the code to actually use swiotlb has only been enabled for 64-bit
kernels for an unknown reason.
Before Linux v4.18 we paper over this fact because the networking code,
the SCSI layer and some random block drivers implemented their own
bounce buffering scheme.
[ mingo: Changelog fixes. ]
Fixes: 21e07dba9f ("scsi: reduce use of block bounce buffers")
Fixes: ab74cfebaf ("net: remove the PCI_DMA_BUS_IS_PHYS check in illegal_highdma")
Reported-by: Matthew Whitehead <tedheadster@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Matthew Whitehead <tedheadster@gmail.com>
Cc: konrad.wilk@oracle.com
Cc: iommu@lists.linux-foundation.org
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20181014075208.2715-1-hch@lst.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
* acpi-pm:
ACPI / PM: LPIT: Register sysfs attributes based on FADT
* pm-sleep:
x86-32, hibernate: Adjust in_suspend after resumed on 32bit system
x86-32, hibernate: Set up temporary text mapping for 32bit system
x86-32, hibernate: Switch to relocated restore code during resume on 32bit system
x86-32, hibernate: Switch to original page table after resumed
x86-32, hibernate: Use the page size macro instead of constant value
x86-32, hibernate: Use temp_pgt as the temporary page table
x86, hibernate: Rename temp_level4_pgt to temp_pgt
x86-32, hibernate: Enable CONFIG_ARCH_HIBERNATION_HEADER on 32bit system
x86, hibernate: Extract the common code of 64/32 bit system
x86-32/asm/power: Create stack frames in hibernate_asm_32.S
PM / hibernate: Check the success of generating md5 digest before hibernation
x86, hibernate: Fix nosave_regions setup for hibernation
PM / sleep: Show freezing tasks that caused a suspend abort
PM / hibernate: Documentation: fix image_size default value
Commit
5de97c9f6d ("x86/mce: Factor out and deprecate the /dev/mcelog driver")
moved the old interface into one file including mce_helper definition as
static and "extern". Remove one.
Fixes: 5de97c9f6d ("x86/mce: Factor out and deprecate the /dev/mcelog driver")
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: Ingo Molnar <mingo@redhat.com>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Tony Luck <tony.luck@intel.com>
CC: linux-edac <linux-edac@vger.kernel.org>
CC: x86-ml <x86@kernel.org>
Link: http://lkml.kernel.org/r/20181017170554.18841-3-bigeasy@linutronix.de
Commit:
c5bedc6847 ("x86/fpu: Get rid of PF_USED_MATH usage, convert it to fpu->fpstate_active")
introduced the 'fpu' variable at top of __restore_xstate_sig(),
which now shadows the other definition:
arch/x86/kernel/fpu/signal.c:318:28: warning: symbol 'fpu' shadows an earlier one
arch/x86/kernel/fpu/signal.c:271:20: originally declared here
Remove the shadowed definition of 'fpu', as the two definitions are the same.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: c5bedc6847 ("x86/fpu: Get rid of PF_USED_MATH usage, convert it to fpu->fpstate_active")
Link: http://lkml.kernel.org/r/20181016202525.29437-3-bigeasy@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Clang warns that the declaration of jiffies in include/linux/jiffies.h
doesn't match the definition in arch/x86/time/kernel.c:
arch/x86/kernel/time.c:29:42: warning: section does not match previous declaration [-Wsection]
__visible volatile unsigned long jiffies __cacheline_aligned = INITIAL_JIFFIES;
^
./include/linux/cache.h:49:4: note: expanded from macro '__cacheline_aligned'
__section__(".data..cacheline_aligned")))
^
./include/linux/jiffies.h:81:31: note: previous attribute is here
extern unsigned long volatile __cacheline_aligned_in_smp __jiffy_arch_data jiffies;
^
./arch/x86/include/asm/cache.h:20:2: note: expanded from macro '__cacheline_aligned_in_smp'
__page_aligned_data
^
./include/linux/linkage.h:39:29: note: expanded from macro '__page_aligned_data'
#define __page_aligned_data __section(.data..page_aligned) __aligned(PAGE_SIZE)
^
./include/linux/compiler_attributes.h:233:56: note: expanded from macro '__section'
#define __section(S) __attribute__((__section__(#S)))
^
1 warning generated.
The declaration was changed in commit 7c30f352c8 ("jiffies.h: declare
jiffies and jiffies_64 with ____cacheline_aligned_in_smp") but wasn't
updated here. Make them match so Clang no longer warns.
Fixes: 7c30f352c8 ("jiffies.h: declare jiffies and jiffies_64 with ____cacheline_aligned_in_smp")
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20181013005311.28617-1-natechancellor@gmail.com
Looking at the asm for native_sched_clock() I noticed we don't inline
enough. Mostly caused by sharing code with cyc2ns_read_begin(), which
we didn't used to do. So mark all that __force_inline to make it DTRT.
Fixes: 59eaef78bf ("x86/tsc: Remodel cyc2ns to use seqcount_latch()")
Reported-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: hpa@zytor.com
Cc: eric.dumazet@gmail.com
Cc: bp@alien8.de
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20181011104019.695196158@infradead.org
In case the RSDP address in struct boot_params is specified don't try
to find the table by searching, but take the address directly as set
by the boot loader.
Signed-off-by: Juergen Gross <jgross@suse.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Jia Zhang <qianyue.zj@alibaba-inc.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: boris.ostrovsky@oracle.com
Cc: linux-kernel@vger.kernel.org
Cc: linux-pm@vger.kernel.org
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/20181010061456.22238-4-jgross@suse.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Xen PVH guests receive the address of the RSDP table from Xen. In order
to support booting a Xen PVH guest via Grub2 using the standard x86
boot entry we need a way for Grub2 to pass the RSDP address to the
kernel.
For this purpose expand the struct setup_header to hold the physical
address of the RSDP address. Being zero means it isn't specified and
has to be located the legacy way (searching through low memory or
EBDA).
While documenting the new setup_header layout and protocol version
2.14 add the missing documentation of protocol version 2.13.
There are Grub2 versions in several distros with a downstream patch
violating the boot protocol by writing past the end of setup_header.
This requires another update of the boot protocol to enable the kernel
to distinguish between a specified RSDP address and one filled with
garbage by such a broken Grub2.
From protocol 2.14 on Grub2 will write the version it is supporting
(but never a higher value than found to be supported by the kernel)
ored with 0x8000 to the version field of setup_header. This enables
the kernel to know up to which field Grub2 has written information
to. All fields after that are supposed to be clobbered.
Signed-off-by: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: boris.ostrovsky@oracle.com
Cc: bp@alien8.de
Cc: corbet@lwn.net
Cc: linux-doc@vger.kernel.org
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/20181010061456.22238-3-jgross@suse.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Use the new tlb_get_unmap_shift() to determine the stride of the
INVLPG loop.
Cc: Nick Piggin <npiggin@gmail.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Implement the required wait and kick callbacks to support PV spinlocks in
Hyper-V guests.
[ tglx: Document the requirement for disabling interrupts in the wait()
callback. Remove goto and unnecessary includes. Add prototype
for hv_vcpu_is_preempted(). Adapted to pending paravirt changes. ]
Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Juergen Gross <jgross@suse.com>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Michael Kelley (EOSG) <Michael.H.Kelley@microsoft.com>
Cc: chao.p.peng@intel.com
Cc: chao.gao@intel.com
Cc: isaku.yamahata@intel.com
Cc: tianyu.lan@microsoft.com
Link: https://lkml.kernel.org/r/1538987374-51217-3-git-send-email-yi.y.sun@linux.intel.com
When a new resource group is created it is initialized with a default
allocation that considers which portions of cache are currently
available for sharing across all resource groups or which portions of
cache are currently unused.
If a CDP allocation forms part of a resource group that is in exclusive
mode then it should be ensured that no new allocation overlaps with any
resource that shares the underlying hardware. The current initial
allocation does not take this sharing of hardware into account and
a new allocation in a resource that shares the same
hardware would affect the exclusive resource group.
Fix this by considering the allocation of a peer RDT domain - a RDT
domain sharing the same hardware - as part of the test to determine
which portion of cache is in use and available for use.
Fixes: 95f0b77efa ("x86/intel_rdt: Initialize new resource group with sane defaults")
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Fenghua Yu <fenghua.yu@intel.com>
Cc: tony.luck@intel.com
Cc: jithu.joseph@intel.com
Cc: gavin.hindman@intel.com
Cc: dave.hansen@intel.com
Cc: hpa@zytor.com
Link: https://lkml.kernel.org/r/b1f7ec08b1695be067de416a4128466d49684317.1538603665.git.reinette.chatre@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The CBM overlap test is used to manage the allocations of RDT resources
where overlap is possible between resource groups. When a resource group
is in exclusive mode then there should be no overlap between resource
groups.
The current overlap test only considers overlap between the same
resources, for example, that usage of a RDT_RESOURCE_L2DATA resource
in one resource group does not overlap with usage of a RDT_RESOURCE_L2DATA
resource in another resource group. The problem with this is that it
allows overlap between a RDT_RESOURCE_L2DATA resource in one resource
group with a RDT_RESOURCE_L2CODE resource in another resource group -
even if both resource groups are in exclusive mode. This is a problem
because even though these appear to be different resources they end up
sharing the same underlying hardware and thus does not fulfill the
user's request for exclusive use of hardware resources.
Fix this by including the CDP peer (if there is one) in every CBM
overlap test. This does not impact the overlap between resources
within the same exclusive resource group that is allowed.
Fixes: 49f7b4efa1 ("x86/intel_rdt: Enable setting of exclusive mode")
Reported-by: Jithu Joseph <jithu.joseph@intel.com>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Jithu Joseph <jithu.joseph@intel.com>
Acked-by: Fenghua Yu <fenghua.yu@intel.com>
Cc: tony.luck@intel.com
Cc: gavin.hindman@intel.com
Cc: dave.hansen@intel.com
Cc: hpa@zytor.com
Link: https://lkml.kernel.org/r/e538b7f56f7ca15963dce2e00ac3be8edb8a68e1.1538603665.git.reinette.chatre@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Introduce a utility that, when provided with a RDT resource and an
instance of this RDT resource (a RDT domain), would return pointers to
the RDT resource and RDT domain that share the same hardware. This is
specific to the CDP resources that share the same hardware.
For example, if a pointer to the RDT_RESOURCE_L2DATA resource (struct
rdt_resource) and a pointer to an instance of this resource (struct
rdt_domain) is provided, then it will return a pointer to the
RDT_RESOURCE_L2CODE resource as well as the specific instance that
shares the same hardware as the provided rdt_domain.
This utility is created in support of the "exclusive" resource group
mode where overlap of resource allocation between resource groups need
to be avoided. The overlap test need to consider not just the matching
resources, but also the resources that share the same hardware.
Temporarily mark it as unused in support of patch testing to avoid
compile warnings until it is used.
Fixes: 49f7b4efa1 ("x86/intel_rdt: Enable setting of exclusive mode")
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Jithu Joseph <jithu.joseph@intel.com>
Acked-by: Fenghua Yu <fenghua.yu@intel.com>
Cc: tony.luck@intel.com
Cc: gavin.hindman@intel.com
Cc: dave.hansen@intel.com
Cc: hpa@zytor.com
Link: https://lkml.kernel.org/r/9b4bc4d59ba2e903b6a3eb17e16ef41a8e7b7c3e.1538603665.git.reinette.chatre@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
While the DOC at the beginning of lib/bitmap.c explicitly states that
"The number of valid bits in a given bitmap does _not_ need to be an
exact multiple of BITS_PER_LONG.", some of the bitmap operations do
indeed access BITS_PER_LONG portions of the provided bitmap no matter
the size of the provided bitmap. For example, if bitmap_intersects()
is provided with an 8 bit bitmap the operation will access
BITS_PER_LONG bits from the provided bitmap. While the operation
ensures that these extra bits do not affect the result, the memory
is still accessed.
The capacity bitmasks (CBMs) are typically stored in u32 since they
can never exceed 32 bits. A few instances exist where a bitmap_*
operation is performed on a CBM by simply pointing the bitmap operation
to the stored u32 value.
The consequence of this pattern is that some bitmap_* operations will
access out-of-bounds memory when interacting with the provided CBM. This
is confirmed with a KASAN test that reports:
BUG: KASAN: stack-out-of-bounds in __bitmap_intersects+0xa2/0x100
and
BUG: KASAN: stack-out-of-bounds in __bitmap_weight+0x58/0x90
Fix this by moving any CBM provided to a bitmap operation needing
BITS_PER_LONG to an 'unsigned long' variable.
[ tglx: Changed related function arguments to unsigned long and got rid
of the _cbm extra step ]
Fixes: 72d5050566 ("x86/intel_rdt: Add utilities to test pseudo-locked region possibility")
Fixes: 49f7b4efa1 ("x86/intel_rdt: Enable setting of exclusive mode")
Fixes: d9b48c86eb ("x86/intel_rdt: Display resource groups' allocations' size in bytes")
Fixes: 95f0b77efa ("x86/intel_rdt: Initialize new resource group with sane defaults")
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: fenghua.yu@intel.com
Cc: tony.luck@intel.com
Cc: gavin.hindman@intel.com
Cc: jithu.joseph@intel.com
Cc: dave.hansen@intel.com
Cc: hpa@zytor.com
Link: https://lkml.kernel.org/r/69a428613a53f10e80594679ac726246020ff94f.1538686926.git.reinette.chatre@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
We have a special segment descriptor entry in the GDT, whose sole purpose is to
encode the CPU and node numbers in its limit (size) field. There are user-space
instructions that allow the reading of the limit field, which gives us a really
fast way to read the CPU and node IDs from the vDSO for example.
But the naming of related functionality does not make this clear, at all:
VDSO_CPU_SIZE
VDSO_CPU_MASK
__CPU_NUMBER_SEG
GDT_ENTRY_CPU_NUMBER
vdso_encode_cpu_node
vdso_read_cpu_node
There's a number of problems:
- The 'VDSO_CPU_SIZE' doesn't really make it clear that these are number
of bits, nor does it make it clear which 'CPU' this refers to, i.e.
that this is about a GDT entry whose limit encodes the CPU and node number.
- Furthermore, the 'CPU_NUMBER' naming is actively misleading as well,
because the segment limit encodes not just the CPU number but the
node ID as well ...
So use a better nomenclature all around: name everything related to this trick
as 'CPUNODE', to make it clear that this is something special, and add
_BITS to make it clear that these are number of bits, and propagate this to
every affected name:
VDSO_CPU_SIZE => VDSO_CPUNODE_BITS
VDSO_CPU_MASK => VDSO_CPUNODE_MASK
__CPU_NUMBER_SEG => __CPUNODE_SEG
GDT_ENTRY_CPU_NUMBER => GDT_ENTRY_CPUNODE
vdso_encode_cpu_node => vdso_encode_cpunode
vdso_read_cpu_node => vdso_read_cpunode
This, beyond being less confusing, also makes it easier to grep for all related
functionality:
$ git grep -i cpunode arch/x86
Also, while at it, fix "return is not a function" style sloppiness in vdso_encode_cpunode().
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Chang S. Bae <chang.seok.bae@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Markus T Metzger <markus.t.metzger@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Shankar <ravi.v.shankar@intel.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/1537312139-5580-2-git-send-email-chang.seok.bae@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Currently the CPU/node NR segment descriptor (GDT_ENTRY_CPU_NUMBER) is
initialized relatively late during CPU init, from the vCPU code, which
has a number of disadvantages, such as hotplug CPU notifiers and SMP
cross-calls.
Instead just initialize it much earlier, directly in cpu_init().
This reduces complexity and increases robustness.
[ mingo: Wrote new changelog. ]
Suggested-by: H. Peter Anvin <hpa@zytor.com>
Suggested-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Markus T Metzger <markus.t.metzger@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Shankar <ravi.v.shankar@intel.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1537312139-5580-9-git-send-email-chang.seok.bae@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Instead of open coding the calls to load_seg_legacy(), introduce
x86_fsgsbase_load() to load FS/GS segments.
This makes it more explicit that this is part of FSGSBASE functionality,
and the new helper can be updated when FSGSBASE instructions are enabled.
[ mingo: Wrote new changelog. ]
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Markus T Metzger <markus.t.metzger@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Shankar <ravi.v.shankar@intel.com>
Cc: Rik van Riel <riel@surriel.com>
Link: http://lkml.kernel.org/r/1537312139-5580-6-git-send-email-chang.seok.bae@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Use the new FS/GS base helper functions in <asm/fsgsbase.h> in the platform
specific ptrace implementation of the following APIs:
PTRACE_ARCH_PRCTL,
PTRACE_SETREG,
PTRACE_GETREG,
etc.
The fsgsbase code is more abstracted out this way and the FS/GS-update
mechanism will be easier to change this way.
[ mingo: Wrote new changelog. ]
Based-on-code-from: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Markus T Metzger <markus.t.metzger@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Shankar <ravi.v.shankar@intel.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1537312139-5580-4-git-send-email-chang.seok.bae@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Introduce FS/GS base access functionality via <asm/fsgsbase.h>,
not yet used by anything directly.
Factor out task_seg_base() from x86/ptrace.c and rename it to
x86_fsgsbase_read_task() to make it part of the new helpers.
This will allow us to enhance FSGSBASE support and eventually enable
the FSBASE/GSBASE instructions.
An "inactive" GS base refers to a base saved at kernel entry
and being part of an inactive, non-running/stopped user-task.
(The typical ptrace model.)
Here are the new functions:
x86_fsbase_read_task()
x86_gsbase_read_task()
x86_fsbase_write_task()
x86_gsbase_write_task()
x86_fsbase_read_cpu()
x86_fsbase_write_cpu()
x86_gsbase_read_cpu_inactive()
x86_gsbase_write_cpu_inactive()
As an advantage of the unified namespace we can now see all FS/GSBASE
API use in the kernel via the following 'git grep' pattern:
$ git grep x86_.*sbase
[ mingo: Wrote new changelog. ]
Based-on-code-from: Andy Lutomirski <luto@kernel.org>
Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Markus T Metzger <markus.t.metzger@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Shankar <ravi.v.shankar@intel.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1537312139-5580-3-git-send-email-chang.seok.bae@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
On 64-bit kernels ptrace can read the FS/GS base using the register access
APIs (PTRACE_PEEKUSER, etc.) or PTRACE_ARCH_PRCTL.
Make both of these mechanisms return the actual FS/GS base.
This will improve debuggability by providing the correct information
to ptracer such as GDB.
[ chang: Rebased and revised patch description. ]
[ mingo: Revised the changelog some more. ]
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Markus T Metzger <markus.t.metzger@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Shankar <ravi.v.shankar@intel.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1537312139-5580-2-git-send-email-chang.seok.bae@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
As described in:
77b0bf55bc67: ("kbuild/Makefile: Prepare for using macros in inline assembly code to work around asm() related GCC inlining bugs")
GCC's inlining heuristics are broken with common asm() patterns used in
kernel code, resulting in the effective disabling of inlining.
The workaround is to set an assembly macro and call it from the inline
assembly block - which is also a minor cleanup for the jump-label code.
As a result the code size is slightly increased, but inlining decisions
are better:
text data bss dec hex filename
18163528 10226300 2957312 31347140 1de51c4 ./vmlinux before
18163608 10227348 2957312 31348268 1de562c ./vmlinux after (+1128)
And functions such as intel_pstate_adjust_policy_max(),
kvm_cpu_accept_dm_intr(), kvm_register_readl() are inlined.
Tested-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Nadav Amit <namit@vmware.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Kate Stewart <kstewart@linuxfoundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Philippe Ombredanne <pombredanne@nexb.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20181005202718.229565-4-namit@vmware.com
Link: https://lore.kernel.org/lkml/20181003213100.189959-11-namit@vmware.com/T/#u
Signed-off-by: Ingo Molnar <mingo@kernel.org>
As described in:
77b0bf55bc67: ("kbuild/Makefile: Prepare for using macros in inline assembly code to work around asm() related GCC inlining bugs")
GCC's inlining heuristics are broken with common asm() patterns used in
kernel code, resulting in the effective disabling of inlining.
The workaround is to set an assembly macro and call it from the inline
assembly block - which is pretty pointless indirection in the static_cpu_has()
case, but is worth it to improve overall inlining quality.
The patch slightly increases the kernel size:
text data bss dec hex filename
18162879 10226256 2957312 31346447 1de4f0f ./vmlinux before
18163528 10226300 2957312 31347140 1de51c4 ./vmlinux after (+693)
And enables the inlining of function such as free_ldt_pgtables().
Tested-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Nadav Amit <namit@vmware.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20181005202718.229565-3-namit@vmware.com
Link: https://lore.kernel.org/lkml/20181003213100.189959-10-namit@vmware.com/T/#u
Signed-off-by: Ingo Molnar <mingo@kernel.org>
As described in:
77b0bf55bc67: ("kbuild/Makefile: Prepare for using macros in inline assembly code to work around asm() related GCC inlining bugs")
GCC's inlining heuristics are broken with common asm() patterns used in
kernel code, resulting in the effective disabling of inlining.
The workaround is to set an assembly macro and call it from the inline
assembly block - which is also a minor cleanup for the exception table
code.
Text size goes up a bit:
text data bss dec hex filename
18162555 10226288 2957312 31346155 1de4deb ./vmlinux before
18162879 10226256 2957312 31346447 1de4f0f ./vmlinux after (+292)
But this allows the inlining of functions such as nested_vmx_exit_reflected(),
set_segment_reg(), __copy_xstate_to_user() which is a net benefit.
Tested-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Nadav Amit <namit@vmware.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20181005202718.229565-2-namit@vmware.com
Link: https://lore.kernel.org/lkml/20181003213100.189959-9-namit@vmware.com/T/#u
Signed-off-by: Ingo Molnar <mingo@kernel.org>
All VDSO clock sources are TSC based and use CLOCKSOURCE_MASK(64). There is
no point in masking with all FF. Get rid of it and enforce the mask in the
sanity checker.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Matt Rickard <matt@softrans.com.au>
Cc: Stephen Boyd <sboyd@kernel.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: devel@linuxdriverproject.org
Cc: virtualization@lists.linux-foundation.org
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Juergen Gross <jgross@suse.com>
Link: https://lkml.kernel.org/r/20180917130707.151963007@linutronix.de
Runtime validate the VCLOCK_MODE in clocksource::archdata and disable
VCLOCK if invalid, which disables the VDSO but keeps the system running.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Matt Rickard <matt@softrans.com.au>
Cc: Stephen Boyd <sboyd@kernel.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: devel@linuxdriverproject.org
Cc: virtualization@lists.linux-foundation.org
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Juergen Gross <jgross@suse.com>
Link: https://lkml.kernel.org/r/20180917130707.069167446@linutronix.de
As described in:
77b0bf55bc67: ("kbuild/Makefile: Prepare for using macros in inline assembly code to work around asm() related GCC inlining bugs")
GCC's inlining heuristics are broken with common asm() patterns used in
kernel code, resulting in the effective disabling of inlining.
The workaround is to set an assembly macro and call it from the inline
assembly block. As a result GCC considers the inline assembly block as
a single instruction. (Which it isn't, but that's the best we can get.)
In this patch we wrap the paravirt call section tricks in a macro,
to hide it from GCC.
The effect of the patch is a more aggressive inlining, which also
causes a size increase of kernel.
text data bss dec hex filename
18147336 10226688 2957312 31331336 1de1408 ./vmlinux before
18162555 10226288 2957312 31346155 1de4deb ./vmlinux after (+14819)
The number of static text symbols (non-inlined functions) goes down:
Before: 40053
After: 39942 (-111)
[ mingo: Rewrote the changelog. ]
Tested-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Nadav Amit <namit@vmware.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alok Kataria <akataria@vmware.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: virtualization@lists.linux-foundation.org
Link: http://lkml.kernel.org/r/20181003213100.189959-8-namit@vmware.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
As described in:
77b0bf55bc67: ("kbuild/Makefile: Prepare for using macros in inline assembly code to work around asm() related GCC inlining bugs")
GCC's inlining heuristics are broken with common asm() patterns used in
kernel code, resulting in the effective disabling of inlining.
The workaround is to set an assembly macro and call it from the inline
assembly block. As a result GCC considers the inline assembly block as
a single instruction. (Which it isn't, but that's the best we can get.)
This patch increases the kernel size:
text data bss dec hex filename
18146889 10225380 2957312 31329581 1de0d2d ./vmlinux before
18147336 10226688 2957312 31331336 1de1408 ./vmlinux after (+1755)
But enables more aggressive inlining (and probably better branch decisions).
The number of static text symbols in vmlinux is much lower:
Before: 40218
After: 40053 (-165)
The assembly code gets harder to read due to the extra macro layer.
[ mingo: Rewrote the changelog. ]
Tested-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Nadav Amit <namit@vmware.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20181003213100.189959-7-namit@vmware.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
As described in:
77b0bf55bc67: ("kbuild/Makefile: Prepare for using macros in inline assembly code to work around asm() related GCC inlining bugs")
GCC's inlining heuristics are broken with common asm() patterns used in
kernel code, resulting in the effective disabling of inlining.
The workaround is to set an assembly macro and call it from the inline
assembly block - i.e. to macrify the affected block.
As a result GCC considers the inline assembly block as a single instruction.
This patch handles the LOCK prefix, allowing more aggresive inlining:
text data bss dec hex filename
18140140 10225284 2957312 31322736 1ddf270 ./vmlinux before
18146889 10225380 2957312 31329581 1de0d2d ./vmlinux after (+6845)
This is the reduction in non-inlined functions:
Before: 40286
After: 40218 (-68)
Tested-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Nadav Amit <namit@vmware.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20181003213100.189959-6-namit@vmware.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
As described in:
77b0bf55bc67: ("kbuild/Makefile: Prepare for using macros in inline assembly code to work around asm() related GCC inlining bugs")
GCC's inlining heuristics are broken with common asm() patterns used in
kernel code, resulting in the effective disabling of inlining.
The workaround is to set an assembly macro and call it from the inline
assembly block. As a result GCC considers the inline assembly block as
a single instruction. (Which it isn't, but that's the best we can get.)
This patch allows GCC to inline simple functions such as __get_seccomp_filter().
To no-one's surprise the result is that GCC performs more aggressive (read: correct)
inlining decisions in these senarios, which reduces the kernel size and presumably
also speeds it up:
text data bss dec hex filename
18140970 10225412 2957312 31323694 1ddf62e ./vmlinux before
18140140 10225284 2957312 31322736 1ddf270 ./vmlinux after (-958)
16 fewer static text symbols:
Before: 40302
After: 40286 (-16)
these got inlined instead.
Functions such as kref_get(), free_user(), fuse_file_get() now get inlined. Hurray!
[ mingo: Rewrote the changelog. ]
Tested-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Nadav Amit <namit@vmware.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jan Beulich <JBeulich@suse.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20181003213100.189959-5-namit@vmware.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
As described in:
77b0bf55bc67: ("kbuild/Makefile: Prepare for using macros in inline assembly code to work around asm() related GCC inlining bugs")
GCC's inlining heuristics are broken with common asm() patterns used in
kernel code, resulting in the effective disabling of inlining.
In the case of objtool the resulting borkage can be significant, since all the
annotations of objtool are discarded during linkage and never inlined,
yet GCC bogusly considers most functions affected by objtool annotations
as 'too large'.
The workaround is to set an assembly macro and call it from the inline
assembly block. As a result GCC considers the inline assembly block as
a single instruction. (Which it isn't, but that's the best we can get.)
This increases the kernel size slightly:
text data bss dec hex filename
18140829 10224724 2957312 31322865 1ddf2f1 ./vmlinux before
18140970 10225412 2957312 31323694 1ddf62e ./vmlinux after (+829)
The number of static text symbols (i.e. non-inlined functions) is reduced:
Before: 40321
After: 40302 (-19)
[ mingo: Rewrote the changelog. ]
Tested-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Nadav Amit <namit@vmware.com>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Christopher Li <sparse@chrisli.org>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-sparse@vger.kernel.org
Link: http://lkml.kernel.org/r/20181003213100.189959-4-namit@vmware.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Using macros in inline assembly allows us to work around bugs
in GCC's inlining decisions.
Compile macros.S and use it to assemble all C files.
Currently only x86 will use it.
Background:
The inlining pass of GCC doesn't include an assembler, so it's not aware
of basic properties of the generated code, such as its size in bytes,
or that there are such things as discontiuous blocks of code and data
due to the newfangled linker feature called 'sections' ...
Instead GCC uses a lazy and fragile heuristic: it does a linear count of
certain syntactic and whitespace elements in inlined assembly block source
code, such as a count of new-lines and semicolons (!), as a poor substitute
for "code size and complexity".
Unsurprisingly this heuristic falls over and breaks its neck whith certain
common types of kernel code that use inline assembly, such as the frequent
practice of putting useful information into alternative sections.
As a result of this fresh, 20+ years old GCC bug, GCC's inlining decisions
are effectively disabled for inlined functions that make use of such asm()
blocks, because GCC thinks those sections of code are "large" - when in
reality they are often result in just a very low number of machine
instructions.
This absolute lack of inlining provess when GCC comes across such asm()
blocks both increases generated kernel code size and causes performance
overhead, which is particularly noticeable on paravirt kernels, which make
frequent use of these inlining facilities in attempt to stay out of the
way when running on baremetal hardware.
Instead of fixing the compiler we use a workaround: we set an assembly macro
and call it from the inlined assembly block. As a result GCC considers the
inline assembly block as a single instruction. (Which it often isn't but I digress.)
This uglifies and bloats the source code - for example just the refcount
related changes have this impact:
Makefile | 9 +++++++--
arch/x86/Makefile | 7 +++++++
arch/x86/kernel/macros.S | 7 +++++++
scripts/Kbuild.include | 4 +++-
scripts/mod/Makefile | 2 ++
5 files changed, 26 insertions(+), 3 deletions(-)
Yay readability and maintainability, it's not like assembly code is hard to read
and maintain ...
We also hope that GCC will eventually get fixed, but we are not holding
our breath for that. Yet we are optimistic, it might still happen, any decade now.
[ mingo: Wrote new changelog describing the background. ]
Tested-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Nadav Amit <namit@vmware.com>
Acked-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michal Marek <michal.lkml@markovi.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kbuild@vger.kernel.org
Link: http://lkml.kernel.org/r/20181003213100.189959-3-namit@vmware.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
In resctrl filesystem, mount options exist to enable L3/L2 CDP and MBA
Software Controller features if the platform supports them:
mount -t resctrl resctrl [-o cdp[,cdpl2][,mba_MBps]] /sys/fs/resctrl
But currently only "cdp" option is displayed in /proc/mounts. "cdpl2" and
"mba_MBps" options are not shown even when they are active.
Before:
# mount -t resctrl resctrl -o cdp,mba_MBps /sys/fs/resctrl
# grep resctrl /proc/mounts
/sys/fs/resctrl /sys/fs/resctrl resctrl rw,relatime,cdp 0 0
After:
# mount -t resctrl resctrl -o cdp,mba_MBps /sys/fs/resctrl
# grep resctrl /proc/mounts
/sys/fs/resctrl /sys/fs/resctrl resctrl rw,relatime,cdp,mba_MBps 0 0
Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: "H Peter Anvin" <hpa@zytor.com>
Cc: "Tony Luck" <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/1536796118-60135-1-git-send-email-fenghua.yu@intel.com
Switch to bitmap_zalloc() to show clearly what is allocated. Besides that
it returns a pointer of bitmap type instead of opaque void *.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Fenghua Yu <fenghua.yu@intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Link: https://lkml.kernel.org/r/20180830115039.63430-1-andriy.shevchenko@linux.intel.com
Clang warns when multiple pairs of parentheses are used for a single
conditional statement.
arch/x86/kernel/cpu/amd.c:925:14: warning: equality comparison with
extraneous parentheses [-Wparentheses-equality]
if ((c->x86 == 6)) {
~~~~~~~^~~~
arch/x86/kernel/cpu/amd.c:925:14: note: remove extraneous parentheses
around the comparison to silence this warning
if ((c->x86 == 6)) {
~ ^ ~
arch/x86/kernel/cpu/amd.c:925:14: note: use '=' to turn this equality
comparison into an assignment
if ((c->x86 == 6)) {
^~
=
1 warning generated.
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20181002224511.14929-1-natechancellor@gmail.com
Link: https://github.com/ClangBuiltLinux/linux/issues/187
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The recent rework of the TSC calibration code introduced a regression on UV
systems as it added a call to tsc_early_init() which initializes the TSC
ADJUST values before acpi_boot_table_init(). In the case of UV systems,
that is a necessary step that calls uv_system_init(). This informs
tsc_sanitize_first_cpu() that the kernel runs on a platform with async TSC
resets as documented in commit 341102c3ef ("x86/tsc: Add option that TSC
on Socket 0 being non-zero is valid")
Fix it by skipping the early tsc initialization on UV systems and let TSC
init tests take place later in tsc_init().
Fixes: cf7a63ef4e ("x86/tsc: Calibrate tsc only once")
Suggested-by: Hedi Berriche <hedi.berriche@hpe.com>
Signed-off-by: Mike Travis <mike.travis@hpe.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Russ Anderson <rja@hpe.com>
Reviewed-by: Dimitri Sivanich <sivanich@hpe.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Russ Anderson <russ.anderson@hpe.com>
Cc: Dimitri Sivanich <dimitri.sivanich@hpe.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Kate Stewart <kstewart@linuxfoundation.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Philippe Ombredanne <pombredanne@nexb.com>
Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Len Brown <len.brown@intel.com>
Cc: Dou Liyang <douly.fnst@cn.fujitsu.com>
Cc: Xiaoming Gao <gxm.linux.kernel@gmail.com>
Cc: Rajvi Jingar <rajvi.jingar@intel.com>
Link: https://lkml.kernel.org/r/20181002180144.923579706@stormcage.americas.sgi.com
The "pciserial" earlyprintk variant helps much on many modern x86
platforms, but unfortunately there are still some platforms with PCI
UART devices which have the wrong PCI class code. In that case, the
current class code check does not allow for them to be used for logging.
Add a sub-option "force" which overrides the class code check and thus
the use of such device can be enforced.
[ bp: massage formulations. ]
Suggested-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Feng Tang <feng.tang@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: "Stuart R . Anderson" <stuart.r.anderson@intel.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Feng Tang <feng.tang@intel.com>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: H Peter Anvin <hpa@linux.intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kai-Heng Feng <kai.heng.feng@canonical.com>
Cc: Kate Stewart <kstewart@linuxfoundation.org>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Philippe Ombredanne <pombredanne@nexb.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Thymo van Beers <thymovanbeers@gmail.com>
Cc: alan@linux.intel.com
Cc: linux-doc@vger.kernel.org
Link: http://lkml.kernel.org/r/20181002164921.25833-1-feng.tang@intel.com
Going primarily by:
https://en.wikipedia.org/wiki/List_of_Intel_Atom_microprocessors
with additional information gleaned from other related pages; notably:
- Bonnell shrink was called Saltwell
- Moorefield is the Merriefield refresh which makes it Airmont
The general naming scheme is: FAM6_ATOM_UARCH_SOCTYPE
for i in `git grep -l FAM6_ATOM` ; do
sed -i -e 's/ATOM_PINEVIEW/ATOM_BONNELL/g' \
-e 's/ATOM_LINCROFT/ATOM_BONNELL_MID/' \
-e 's/ATOM_PENWELL/ATOM_SALTWELL_MID/g' \
-e 's/ATOM_CLOVERVIEW/ATOM_SALTWELL_TABLET/g' \
-e 's/ATOM_CEDARVIEW/ATOM_SALTWELL/g' \
-e 's/ATOM_SILVERMONT1/ATOM_SILVERMONT/g' \
-e 's/ATOM_SILVERMONT2/ATOM_SILVERMONT_X/g' \
-e 's/ATOM_MERRIFIELD/ATOM_SILVERMONT_MID/g' \
-e 's/ATOM_MOOREFIELD/ATOM_AIRMONT_MID/g' \
-e 's/ATOM_DENVERTON/ATOM_GOLDMONT_X/g' \
-e 's/ATOM_GEMINI_LAKE/ATOM_GOLDMONT_PLUS/g' ${i}
done
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: dave.hansen@linux.intel.com
Cc: len.brown@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The success of a cache pseudo-locked region is measured using
performance monitoring events that are programmed directly at the time
the user requests a measurement.
Modifying the performance event registers directly is not appropriate
since it circumvents the in-kernel perf infrastructure that exists to
manage these resources and provide resource arbitration to the
performance monitoring hardware.
The cache pseudo-locking measurements are modified to use the in-kernel
perf infrastructure. Performance events are created and validated with
the appropriate perf API. The performance counters are still read as
directly as possible to avoid the additional cache hits. This is
done safely by first ensuring with the perf API that the counters have
been programmed correctly and only accessing the counters in an
interrupt disabled section where they are not able to be moved.
As part of the transition to the in-kernel perf infrastructure the L2
and L3 measurements are split into two separate measurements that can
be triggered independently. This separation prevents additional cache
misses incurred during the extra testing code used to decide if a
L2 and/or L3 measurement should be made.
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: fenghua.yu@intel.com
Cc: tony.luck@intel.com
Cc: peterz@infradead.org
Cc: acme@kernel.org
Cc: gavin.hindman@intel.com
Cc: jithu.joseph@intel.com
Cc: dave.hansen@intel.com
Cc: hpa@zytor.com
Link: https://lkml.kernel.org/r/fc24e728b446404f42c78573c506e98cd0599873.1537468643.git.reinette.chatre@intel.com
A perf event has many attributes that are maintained in a separate
structure that should be provided when a new perf_event is created.
In preparation for the transition to perf_events the required attribute
structures are created for all the events that may be used in the
measurements. Most attributes for all the events are identical. The
actual configuration, what specifies what needs to be measured, is what
will be different between the events used. This configuration needs to
be done with X86_CONFIG that cannot be used as part of the designated
initializers used here, this will be introduced later.
Although they do look identical at this time the attribute structures
needs to be maintained separately since a perf_event will maintain a
pointer to its unique attributes.
In support of patch testing the new structs are given the unused attribute
until their use in later patches.
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: fenghua.yu@intel.com
Cc: tony.luck@intel.com
Cc: acme@kernel.org
Cc: gavin.hindman@intel.com
Cc: jithu.joseph@intel.com
Cc: dave.hansen@intel.com
Cc: hpa@zytor.com
Link: https://lkml.kernel.org/r/1822f6164e221a497648d108913d056ab675d5d0.1537377064.git.reinette.chatre@intel.com
Local register variables were used in an effort to improve the
accuracy of the measurement of cache residency of a pseudo-locked
region. This was done to ensure that only the cache residency of
the memory is measured and not the cache residency of the variables
used to perform the measurement.
While local register variables do accomplish the goal they do require
significant care since different architectures have different registers
available. Local register variables also cannot be used with valuable
developer tools like KASAN.
Significant testing has shown that similar accuracy in measurement
results can be obtained by replacing local register variables with
regular local variables.
Make use of local variables in the critical code but do so with
READ_ONCE() to prevent the compiler from merging or refetching reads.
Ensure these variables are initialized before the measurement starts,
and ensure it is only the local variables that are accessed during
the measurement.
With the removal of the local register variables and using READ_ONCE()
there is no longer a motivation for using a direct wrmsr call (that
avoids the additional tracing code that may clobber the local register
variables).
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: fenghua.yu@intel.com
Cc: tony.luck@intel.com
Cc: acme@kernel.org
Cc: gavin.hindman@intel.com
Cc: jithu.joseph@intel.com
Cc: dave.hansen@intel.com
Cc: hpa@zytor.com
Link: https://lkml.kernel.org/r/f430f57347414e0691765d92b144758ab93d8407.1537377064.git.reinette.chatre@intel.com
Use the for_each_of_cpu_node iterator to iterate over cpu nodes. This
has the side effect of defaulting to iterating using "cpu" node names in
preference to the deprecated (for FDT) device_type == "cpu".
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Rob Herring <robh@kernel.org>
The Hygon Dhyana CPU has a topology extensions bit in CPUID. With
this bit, the kernel can get the cache information. So add support in
cpuid4_cache_lookup_regs() to get the correct cache size.
The Hygon Dhyana CPU also discovers num_cache_leaves via CPUID leaf
0x8000001d, so add support to it in find_num_cache_leaves().
Also add cacheinfo_hygon_init_llc_id() and init_hygon_cacheinfo()
functions to initialize Dhyana cache info. Setup cache cpumap in the
same way as AMD does.
Signed-off-by: Pu Wen <puwen@hygon.cn>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: bp@alien8.de
Cc: tglx@linutronix.de
Cc: mingo@redhat.com
Cc: hpa@zytor.com
Cc: x86@kernel.org
Cc: thomas.lendacky@amd.com
Link: https://lkml.kernel.org/r/2a686b2ac0e2f5a1f2f5f101124d9dd44f949731.1537533369.git.puwen@hygon.cn
In preparation of switching x86 to use place-relative references for
the code, target and key members of struct jump_entry, replace direct
references to the struct members with invocations of the new accessors.
This will allow us to make the switch by modifying the accessors only.
This incorporates a cleanup of __jump_label_transform() proposed by
Peter.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-s390@vger.kernel.org
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Jessica Yu <jeyu@kernel.org>
Link: https://lkml.kernel.org/r/20180919065144.25010-6-ard.biesheuvel@linaro.org
Add support for R_X86_64_PC64 relocations, which operate on 64-bit
quantities holding a relative symbol reference. Also remove the
definition of R_X86_64_NUM: given that it is currently unused, it
is unclear what the new value should be.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-s390@vger.kernel.org
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Jessica Yu <jeyu@kernel.org>
Link: https://lkml.kernel.org/r/20180919065144.25010-5-ard.biesheuvel@linaro.org
Add x86 architecture support for a new processor: Hygon Dhyana Family
18h. Carve out initialization code needed by Dhyana into a separate
compilation unit.
To identify Hygon Dhyana CPU, add a new vendor type X86_VENDOR_HYGON.
Since Dhyana uses AMD functionality to a large degree, select
CPU_SUP_AMD which provides that functionality.
[ bp: drop explicit license statement as it has an SPDX tag already. ]
Signed-off-by: Pu Wen <puwen@hygon.cn>
Reviewed-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: tglx@linutronix.de
Cc: mingo@redhat.com
Cc: hpa@zytor.com
Cc: x86@kernel.org
Cc: thomas.lendacky@amd.com
Link: https://lkml.kernel.org/r/1a882065223bacbde5726f3beaa70cebd8dcd814.1537533369.git.puwen@hygon.cn
If spectrev2 mitigation has been enabled, RSB is filled on context switch
in order to protect from various classes of spectrev2 attacks.
If this mitigation is enabled, say so in sysfs for spectrev2.
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: "WoodhouseDavid" <dwmw@amazon.co.uk>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: "SchauflerCasey" <casey.schaufler@intel.com>
Link: https://lkml.kernel.org/r/nycvar.YFH.7.76.1809251438580.15880@cbobk.fhfr.pm
STIBP is a feature provided by certain Intel ucodes / CPUs. This feature
(once enabled) prevents cross-hyperthread control of decisions made by
indirect branch predictors.
Enable this feature if
- the CPU is vulnerable to spectre v2
- the CPU supports SMT and has SMT siblings online
- spectre_v2 mitigation autoselection is enabled (default)
After some previous discussion, this leaves STIBP on all the time, as wrmsr
on crossing kernel boundary is a no-no. This could perhaps later be a bit
more optimized (like disabling it in NOHZ, experiment with disabling it in
idle, etc) if needed.
Note that the synchronization of the mask manipulation via newly added
spec_ctrl_mutex is currently not strictly needed, as the only updater is
already being serialized by cpu_add_remove_lock, but let's make this a
little bit more future-proof.
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: "WoodhouseDavid" <dwmw@amazon.co.uk>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: "SchauflerCasey" <casey.schaufler@intel.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/nycvar.YFH.7.76.1809251438240.15880@cbobk.fhfr.pm
Presently we check first if CPUID is enabled. If it is not already
enabled, then we next call identify_cpu_without_cpuid() and clear
X86_FEATURE_CPUID.
Unfortunately, identify_cpu_without_cpuid() is the function where CPUID
becomes _enabled_ on Cyrix 6x86/6x86L CPUs.
Reverse the calling sequence so that CPUID is first enabled, and then
check a second time to see if the feature has now been activated.
[ bp: Massage commit message and remove trailing whitespace. ]
Suggested-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Matthew Whitehead <tedheadster@gmail.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Andy Lutomirski <luto@amacapital.net>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20180921212041.13096-3-tedheadster@gmail.com
There are comments in processor-cyrix.h advising you to _not_ make calls
using the deprecated macros in this style:
setCx86_old(CX86_CCR4, getCx86_old(CX86_CCR4) | 0x80);
This is because it expands the macro into a non-functioning calling
sequence. The calling order must be:
outb(CX86_CCR2, 0x22);
inb(0x23);
From the comments:
* When using the old macros a line like
* setCx86(CX86_CCR2, getCx86(CX86_CCR2) | 0x88);
* gets expanded to:
* do {
* outb((CX86_CCR2), 0x22);
* outb((({
* outb((CX86_CCR2), 0x22);
* inb(0x23);
* }) | 0x88), 0x23);
* } while (0);
The new macros fix this problem, so use them instead.
Signed-off-by: Matthew Whitehead <tedheadster@gmail.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Andy Lutomirski <luto@amacapital.net>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jia Zhang <qianyue.zj@alibaba-inc.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Philippe Ombredanne <pombredanne@nexb.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20180921212041.13096-2-tedheadster@gmail.com
Update the DO_ERROR macro to take si_code and si_addr values for a siginfo,
removing the need for the fill_trap_info function.
Update do_trap to also take the sicode and si_addr values for a sigininfo
and modify the code to call force_sig when a sicode is not passed in
and to call force_sig_fault when all of the information is present.
Making this a more obvious, simpler and less error prone construction.
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
The function "force_sig(sig, tsk)" is equivalent to "
force_sig_info(sig, SEND_SIG_PRIV, tsk)". Using the siginfo variants can
be error prone so use the simpler old fashioned force_sig variant,
and with luck the force_sig_info variant can go away.
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Instead of generating the siginfo in x86 specific code use the new
helper function force_sig_bnderr to separate the concerns of
collecting the information and generating a proper siginfo.
Making the code easier to understand and maintain.
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
The function do_trap_no_signal embodies almost all of the work of the
function do_trap. The exceptions are setting of thread.error_code and
thread.trap_nr in the case when the signal will be sent, and reporting
which signal will be sent with show_signal.
Filling in struct siginfo and then calling do_trap is problematic as
filling in struct siginfo is an fiddly process that can through
inattention has resulted in fields not initialized and the wrong
fields being filled in.
To avoid this error prone situation I am replacing force_sig_info with
a set of functions that take as arguments the information needed to
send a specific kind of signal.
The function do_trap is called in the context of several different
kinds of signals today. Having a solid do_trap_no_signal that
can be reused allows call sites that send different kinds of
signals to reuse all of the code in do_trap_no_signal.
Modify do_trap_no_signal to have a single exit there signals
where be sent (aka returning -1) to allow more of the signal
sending path to be moved to from do_trap to do_trap_no_signal.
Move setting thread.trap_nr and thread.error_code into do_trap_no_signal
so the code does not need to be duplicated.
Make the type of the string that is passed into do_trap_no_signal to
const. The only user of that str is die and it already takes a const
string, so this just makes it explicit that the string won't change.
All of this prepares the way for using do_trap_no_signal outside
of do_trap.
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Clear the MCE struct which is used for collecting the injection details
after injection.
Also, populate it with more details from the machine.
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20180905081954.10391-1-bp@alien8.de
We met a kernel panic when enabling earlycon, which is due to the fixmap
address of earlycon is not statically setup.
Currently the static fixmap setup in head_64.S only covers 2M virtual
address space, while it actually could be in 4M space with different
kernel configurations, e.g. when VSYSCALL emulation is disabled.
So increase the static space to 4M for now by defining FIXMAP_PMD_NUM to 2,
and add a build time check to ensure that the fixmap is covered by the
initial static page tables.
Fixes: 1ad83c858c ("x86_64,vsyscall: Make vsyscall emulation configurable")
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Feng Tang <feng.tang@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: kernel test robot <rong.a.chen@intel.com>
Reviewed-by: Juergen Gross <jgross@suse.com> (Xen parts)
Cc: H Peter Anvin <hpa@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andy Lutomirsky <luto@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20180920025828.23699-1-feng.tang@intel.com
All the cache maintainance is already stubbed out when not enabled,
but merging the two allows us to nicely handle the case where
cache maintainance is required for some devices, but not others.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Paul Burton <paul.burton@mips.com> # MIPS parts
The code for conditionally printing unhanded signals is duplicated twice
in arch/x86/kernel/traps.c. Factor it out into it's own subroutine
called show_signal to make the code clearer and easier to maintain.
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
This separates the logic of generating the signal from the logic of
gathering the information about the bounds violation.
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
The function fill_sigtrap_info now only has one caller so remove
it and put it's contents in it's caller.
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Replace user_single_step_siginfo with user_single_step_report
that allocates siginfo structure on the stack and sends it.
This allows tracehook_report_syscall_exit to become a simple
if statement that calls user_single_step_report or ptrace_report_syscall
depending on the value of step.
Update the default helper function now called user_single_step_report
to explicitly set si_code to SI_USER and to set si_uid and si_pid to 0.
The default helper has always been doing this (using memset) but it
was far from obvious.
The powerpc helper can now just call force_sig_fault.
The x86 helper can now just call send_sigtrap.
Unfortunately the default implementation of user_single_step_report
can not use force_sig_fault as it does not use a SIGTRAP si_code.
So it has to carefully setup the siginfo and use use force_sig_info.
The net result is code that is easier to understand and simpler
to maintain.
Ref: 85ec7fd9f8 ("ptrace: introduce user_single_step_siginfo() helper")
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
The first argument to WARN_ONCE() is a condition.
Fixes: 5800dc5c19 ("x86/paravirt: Fix spectre-v2 mitigations for paravirt guests")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Juergen Gross <jgross@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Alok Kataria <akataria@vmware.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: virtualization@lists.linux-foundation.org
Cc: kernel-janitors@vger.kernel.org
Link: https://lkml.kernel.org/r/20180919103553.GD9238@mwanda
In order to determine a sane default cache allocation for a new CAT/CDP
resource group, all resource groups are checked to determine which cache
portions are available to share. At this time all possible CLOSIDs
that can be supported by the resource is checked. This is problematic
if the resource supports more CLOSIDs than another CAT/CDP resource. In
this case, the number of CLOSIDs that could be allocated are fewer than
the number of CLOSIDs that can be supported by the resource.
Limit the check of closids to that what is supported by the system based
on the minimum across all resources.
Fixes: 95f0b77ef ("x86/intel_rdt: Initialize new resource group with sane defaults")
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: "H Peter Anvin" <hpa@zytor.com>
Cc: "Tony Luck" <tony.luck@intel.com>
Cc: "Xiaochen Shen" <xiaochen.shen@intel.com>
Cc: "Chen Yu" <yu.c.chen@intel.com>
Link: https://lkml.kernel.org/r/1537048707-76280-10-git-send-email-fenghua.yu@intel.com
It is possible for a resource group to consist out of MBA as well as
CAT/CDP resources. The "exclusive" resource mode only applies to the
CAT/CDP resources since MBA allocations cannot be specified to overlap
or not. When a user requests a resource group to become "exclusive" then it
can only be successful if there are CAT/CDP resources in the group
and none of their CBMs associated with the group's CLOSID overlaps with
any other resource group.
Fix the "exclusive" mode setting by failing if there isn't any CAT/CDP
resource in the group and ensuring that the CBM checking is only done on
CAT/CDP resources.
Fixes: 49f7b4efa ("x86/intel_rdt: Enable setting of exclusive mode")
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: "H Peter Anvin" <hpa@zytor.com>
Cc: "Tony Luck" <tony.luck@intel.com>
Cc: "Xiaochen Shen" <xiaochen.shen@intel.com>
Cc: "Chen Yu" <yu.c.chen@intel.com>
Link: https://lkml.kernel.org/r/1537048707-76280-9-git-send-email-fenghua.yu@intel.com
A loop is used to check if a CAT resource's CBM of one CLOSID
overlaps with the CBM of another CLOSID of the same resource. The loop
is run over all CLOSIDs supported by the resource.
The problem with running the loop over all CLOSIDs supported by the
resource is that its number of supported CLOSIDs may be more than the
number of supported CLOSIDs on the system, which is the minimum number of
CLOSIDs supported across all resources.
Fix the loop to only consider the number of system supported CLOSIDs,
not all that are supported by the resource.
Fixes: 49f7b4efa ("x86/intel_rdt: Enable setting of exclusive mode")
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: "H Peter Anvin" <hpa@zytor.com>
Cc: "Tony Luck" <tony.luck@intel.com>
Cc: "Xiaochen Shen" <xiaochen.shen@intel.com>
Cc: "Chen Yu" <yu.c.chen@intel.com>
Link: https://lkml.kernel.org/r/1537048707-76280-8-git-send-email-fenghua.yu@intel.com
A system supporting pseudo-locking may have MBA as well as CAT
resources of which only the CAT resources could support cache
pseudo-locking. When the schemata to be pseudo-locked is provided it
should be checked that that schemata does not attempt to pseudo-lock a
MBA resource.
Fixes: e0bdfe8e3 ("x86/intel_rdt: Support creation/removal of pseudo-locked region")
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: "H Peter Anvin" <hpa@zytor.com>
Cc: "Tony Luck" <tony.luck@intel.com>
Cc: "Xiaochen Shen" <xiaochen.shen@intel.com>
Cc: "Chen Yu" <yu.c.chen@intel.com>
Link: https://lkml.kernel.org/r/1537048707-76280-7-git-send-email-fenghua.yu@intel.com
When a new resource group is created, it is initialized with sane
defaults that currently assume the resource being initialized is a CAT
resource. This code path is also followed by a MBA resource that is not
allocated the same as a CAT resource and as a result we encounter the
following unchecked MSR access error:
unchecked MSR access error: WRMSR to 0xd51 (tried to write 0x0000
000000000064) at rIP: 0xffffffffae059994 (native_write_msr+0x4/0x20)
Call Trace:
mba_wrmsr+0x41/0x80
update_domains+0x125/0x130
rdtgroup_mkdir+0x270/0x500
Fix the above by ensuring the initial allocation is only attempted on a
CAT resource.
Fixes: 95f0b77ef ("x86/intel_rdt: Initialize new resource group with sane defaults")
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: "H Peter Anvin" <hpa@zytor.com>
Cc: "Tony Luck" <tony.luck@intel.com>
Cc: "Xiaochen Shen" <xiaochen.shen@intel.com>
Cc: "Chen Yu" <yu.c.chen@intel.com>
Link: https://lkml.kernel.org/r/1537048707-76280-6-git-send-email-fenghua.yu@intel.com
When multiple resources are managed by RDT, the number of CLOSIDs used
is the minimum of the CLOSIDs supported by each resource. In the function
rdt_bit_usage_show(), the annotated bitmask is created to depict how the
CAT supporting caches are being used. During this annotated bitmask
creation, each resource group is queried for its mode that is used as a
label in the annotated bitmask.
The maximum number of resource groups is currently assumed to be the
number of CLOSIDs supported by the resource for which the information is
being displayed. This is incorrect since the number of active CLOSIDs is
the minimum across all resources.
If information for a cache instance with more CLOSIDs than another is
being generated we thus encounter a warning like:
invalid mode for closid 8
WARNING: CPU: 88 PID: 1791 at [SNIP]/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
:827 rdt_bit_usage_show+0x221/0x2b0
Fix this by ensuring that only the number of supported CLOSIDs are
considered.
Fixes: e651901187 ("x86/intel_rdt: Introduce "bit_usage" to display cache allocations details")
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: "H Peter Anvin" <hpa@zytor.com>
Cc: "Tony Luck" <tony.luck@intel.com>
Cc: "Xiaochen Shen" <xiaochen.shen@intel.com>
Cc: "Chen Yu" <yu.c.chen@intel.com>
Link: https://lkml.kernel.org/r/1537048707-76280-5-git-send-email-fenghua.yu@intel.com
The number of CLOSIDs supported by a system is the minimum number of
CLOSIDs supported by any of its resources. Care should be taken when
iterating over the CLOSIDs of a resource since it may be that the number
of CLOSIDs supported on the system is less than the number of CLOSIDs
supported by the resource.
Introduce a helper function that can be used to query the number of
CLOSIDs that is supported by all resources, irrespective of how many
CLOSIDs are supported by a particular resource.
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: "H Peter Anvin" <hpa@zytor.com>
Cc: "Tony Luck" <tony.luck@intel.com>
Cc: "Xiaochen Shen" <xiaochen.shen@intel.com>
Cc: "Chen Yu" <yu.c.chen@intel.com>
Link: https://lkml.kernel.org/r/1537048707-76280-4-git-send-email-fenghua.yu@intel.com
Chen Yu reported a divide-by-zero error when accessing the 'size'
resctrl file when a MBA resource is enabled.
divide error: 0000 [#1] SMP PTI
CPU: 93 PID: 1929 Comm: cat Not tainted 4.19.0-rc2-debug-rdt+ #25
RIP: 0010:rdtgroup_cbm_to_size+0x7e/0xa0
Call Trace:
rdtgroup_size_show+0x11a/0x1d0
seq_read+0xd8/0x3b0
Quoting Chen Yu's report: This is because for MB resource, the
r->cache.cbm_len is zero, thus calculating size in rdtgroup_cbm_to_size()
will trigger the exception.
Fix this issue in the 'size' file by getting correct memory bandwidth value
which is in MBps when MBA software controller is enabled or in percentage
when MBA software controller is disabled.
Fixes: d9b48c86eb ("x86/intel_rdt: Display resource groups' allocations in bytes")
Reported-by: Chen Yu <yu.c.chen@intel.com>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Chen Yu <yu.c.chen@intel.com>
Cc: "H Peter Anvin" <hpa@zytor.com>
Cc: "Tony Luck" <tony.luck@intel.com>
Cc: "Xiaochen Shen" <xiaochen.shen@intel.com>
Link: https://lkml.kernel.org/r/20180904174614.26682-1-yu.c.chen@intel.com
Link: https://lkml.kernel.org/r/1537048707-76280-3-git-send-email-fenghua.yu@intel.com
Each resource is associated with a parsing callback to parse the data
provided from user space when writing schemata file.
The 'data' parameter in the callbacks is defined as a void pointer which
is error prone due to lack of type check.
parse_bw() processes the 'data' parameter as a string while its caller
actually passes the parameter as a pointer to struct rdt_cbm_parse_data.
Thus, parse_bw() takes wrong data and causes failure of parsing MBA
throttle value.
To fix the issue, the 'data' parameter in all parsing callbacks is defined
and handled as a pointer to struct rdt_parse_data (renamed from struct
rdt_cbm_parse_data).
Fixes: 7604df6e16 ("x86/intel_rdt: Support flexible data to parsing callbacks")
Fixes: 9ab9aa15c3 ("x86/intel_rdt: Ensure requested schemata respects mode")
Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: "H Peter Anvin" <hpa@zytor.com>
Cc: "Tony Luck" <tony.luck@intel.com>
Cc: "Chen Yu" <yu.c.chen@intel.com>
Link: https://lkml.kernel.org/r/1537048707-76280-2-git-send-email-fenghua.yu@intel.com
Linux spreads out the non managed interrupt across the possible target CPUs
to avoid vector space exhaustion.
Managed interrupts are treated differently, as for them the vectors are
reserved (with guarantee) when the interrupt descriptors are initialized.
When the interrupt is requested a real vector is assigned. The assignment
logic uses the first CPU in the affinity mask for assignment. If the
interrupt has more than one CPU in the affinity mask, which happens when a
multi queue device has less queues than CPUs, then doing the same search as
for non managed interrupts makes sense as it puts the interrupt on the
least interrupt plagued CPU. For single CPU affine vectors that's obviously
a NOOP.
Restructre the matrix allocation code so it does the 'best CPU' search, add
the sanity check for an empty affinity mask and adapt the call site in the
x86 vector management code.
[ tglx: Added the empty mask check to the core and improved change log ]
Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: hpa@zytor.com
Link: https://lkml.kernel.org/r/20180908175838.14450-2-dou_liyang@163.com
The recent removal of the memblock dependency from kvmclock caused a SEV
guest regression because the wall_clock and hv_clock_boot variables are
no longer mapped decrypted when SEV is active.
Use the __bss_decrypted attribute to put the static wall_clock and
hv_clock_boot in the .bss..decrypted section so that they are mapped
decrypted during boot.
In the preparatory stage of CPU hotplug, the per-cpu pvclock data pointer
assigns either an element of the static array or dynamically allocated
memory for the pvclock data pointer. The static array are now mapped
decrypted but the dynamically allocated memory is not mapped decrypted.
However, when SEV is active this memory range must be mapped decrypted.
Add a function which is called after the page allocator is up, and
allocate memory for the pvclock data pointers for the all possible cpus.
Map this memory range as decrypted when SEV is active.
Fixes: 368a540e02 ("x86/kvmclock: Remove memblock dependency")
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: kvm@vger.kernel.org
Link: https://lkml.kernel.org/r/1536932759-12905-3-git-send-email-brijesh.singh@amd.com
kvmclock defines few static variables which are shared with the
hypervisor during the kvmclock initialization.
When SEV is active, memory is encrypted with a guest-specific key, and
if the guest OS wants to share the memory region with the hypervisor
then it must clear the C-bit before sharing it.
Currently, we use kernel_physical_mapping_init() to split large pages
before clearing the C-bit on shared pages. But it fails when called from
the kvmclock initialization (mainly because the memblock allocator is
not ready that early during boot).
Add a __bss_decrypted section attribute which can be used when defining
such shared variable. The so-defined variables will be placed in the
.bss..decrypted section. This section will be mapped with C=0 early
during boot.
The .bss..decrypted section has a big chunk of memory that may be unused
when memory encryption is not active, free it when memory encryption is
not active.
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Radim Krčmář<rkrcmar@redhat.com>
Cc: kvm@vger.kernel.org
Link: https://lkml.kernel.org/r/1536932759-12905-2-git-send-email-brijesh.singh@amd.com
Get rid of local @cpu variable which is unused in the
!CONFIG_IA32_EMULATION case.
Signed-off-by: zhong jiang <zhongjiang@huawei.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
Cc: x86-ml <x86@kernel.org>
Link: http://lkml.kernel.org/r/1536806985-24197-1-git-send-email-zhongjiang@huawei.com
[ Clean up commit message. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Fix build warning in apm_32.c when CONFIG_PROC_FS is not enabled:
../arch/x86/kernel/apm_32.c:1643:12: warning: 'proc_apm_show' defined but not used [-Wunused-function]
static int proc_apm_show(struct seq_file *m, void *v)
Fixes: 3f3942aca6 ("proc: introduce proc_create_single{,_data}")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Jiri Kosina <jikos@kernel.org>
Link: https://lkml.kernel.org/r/be39ac12-44c2-4715-247f-4dcc3c525b8b@infradead.org
The SYSCALL64 trampoline has a couple of nice properties:
- The usual sequence of SWAPGS followed by two GS-relative accesses to
set up RSP is somewhat slow because the GS-relative accesses need
to wait for SWAPGS to finish. The trampoline approach allows
RIP-relative accesses to set up RSP, which avoids the stall.
- The trampoline avoids any percpu access before CR3 is set up,
which means that no percpu memory needs to be mapped in the user
page tables. This prevents using Meltdown to read any percpu memory
outside the cpu_entry_area and prevents using timing leaks
to directly locate the percpu areas.
The downsides of using a trampoline may outweigh the upsides, however.
It adds an extra non-contiguous I$ cache line to system calls, and it
forces an indirect jump to transfer control back to the normal kernel
text after CR3 is set up. The latter is because x86 lacks a 64-bit
direct jump instruction that could jump from the trampoline to the entry
text. With retpolines enabled, the indirect jump is extremely slow.
Change the code to map the percpu TSS into the user page tables to allow
the non-trampoline SYSCALL64 path to work under PTI. This does not add a
new direct information leak, since the TSS is readable by Meltdown from the
cpu_entry_area alias regardless. It does allow a timing attack to locate
the percpu area, but KASLR is more or less a lost cause against local
attack on CPUs vulnerable to Meltdown regardless. As far as I'm concerned,
on current hardware, KASLR is only useful to mitigate remote attacks that
try to attack the kernel without first gaining RCE against a vulnerable
user process.
On Skylake, with CONFIG_RETPOLINE=y and KPTI on, this reduces syscall
overhead from ~237ns to ~228ns.
There is a possible alternative approach: Move the trampoline within 2G of
the entry text and make a separate copy for each CPU. This would allow a
direct jump to rejoin the normal entry path. There are pro's and con's for
this approach:
+ It avoids a pipeline stall
- It executes from an extra page and read from another extra page during
the syscall. The latter is because it needs to use a relative
addressing mode to find sp1 -- it's the same *cacheline*, but accessed
using an alias, so it's an extra TLB entry.
- Slightly more memory. This would be one page per CPU for a simple
implementation and 64-ish bytes per CPU or one page per node for a more
complex implementation.
- More code complexity.
The current approach is chosen for simplicity and because the alternative
does not provide a significant benefit, which makes it worth.
[ tglx: Added the alternative discussion to the changelog ]
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lkml.kernel.org/r/8c7c6e483612c3e4e10ca89495dc160b1aa66878.1536015544.git.luto@kernel.org
For userspace to tell the difference between an random signal
and an exception, the exception must include siginfo information.
Using SEND_SIG_FORCED for SIGSEGV is thus wrong, and it will result in
userspace seeing si_code == SI_USER (like a random signal) instead of
si_code == SI_KERNEL or a more specific si_code as all exceptions
deliver.
Therefore replace force_sig_info(SIGSEGV, SEND_SIG_FORCE, current)
with force_sig(SIG_SEGV, current) which gets this right and is shorter
and easier to type.
Fixes: 791eca1010 ("uretprobes/x86: Hijack return address")
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
When CONFIG_PARAVIRT_SPINLOCKS=n, it generates a warning:
arch/x86/kernel/paravirt_patch_64.c: In function ‘native_patch’:
arch/x86/kernel/paravirt_patch_64.c:89:1: warning: label ‘patch_site’ defined but not used [-Wunused-label]
patch_site:
... but those labels can simply be removed by directly calling the
respective functions there.
Get rid of local variables too, while at it. Also, simplify function
flow for better readability.
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: virtualization@lists.linux-foundation.org
Link: http://lkml.kernel.org/r/20180911091510.GA12094@zn.tnic
Signed-off-by: Ingo Molnar <mingo@kernel.org>
memory_corruption_check[{_period|_size}]()'s handlers do not check input
argument before passing it to kstrtoul() or simple_strtoull(). The argument
would be a NULL pointer if each of the kernel parameters, without its
value, is set in command line and thus cause the following panic.
PANIC: early exception 0xe3 IP 10:ffffffff73587c22 error 0 cr2 0x0
[ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 4.18-rc8+ #2
[ 0.000000] RIP: 0010:kstrtoull+0x2/0x10
...
[ 0.000000] Call Trace
[ 0.000000] ? set_corruption_check+0x21/0x49
[ 0.000000] ? do_early_param+0x4d/0x82
[ 0.000000] ? parse_args+0x212/0x330
[ 0.000000] ? rdinit_setup+0x26/0x26
[ 0.000000] ? parse_early_options+0x20/0x23
[ 0.000000] ? rdinit_setup+0x26/0x26
[ 0.000000] ? parse_early_param+0x2d/0x39
[ 0.000000] ? setup_arch+0x2f7/0xbf4
[ 0.000000] ? start_kernel+0x5e/0x4c2
[ 0.000000] ? load_ucode_bsp+0x113/0x12f
[ 0.000000] ? secondary_startup_64+0xa5/0xb0
This patch adds checks to prevent the panic.
Signed-off-by: He Zhe <zhe.he@windriver.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: gregkh@linuxfoundation.org
Cc: kstewart@linuxfoundation.org
Cc: pombredanne@nexb.com
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/1534260823-87917-1-git-send-email-zhe.he@windriver.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
activate_managed() returns EINVAL instead of -EINVAL in case of
error. While this is unlikely to happen, the positive return value would
cause further malfunction at the call site.
Fixes: 2db1f959d9 ("x86/vector: Handle managed interrupts proper")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
In the non-trampoline SYSCALL64 path, a percpu variable is used to
temporarily store the user RSP value.
Instead of a separate variable, use the otherwise unused sp2 slot in the
TSS. This will improve cache locality, as the sp1 slot is already used in
the same code to find the kernel stack. It will also simplify a future
change to make the non-trampoline path work in PTI mode.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lkml.kernel.org/r/08e769a0023dbad4bac6f34f3631dbaf8ad59f4f.1536015544.git.luto@kernel.org
The idtentry macro is complicated and magical. Document what it
does to help future readers and to allow future patches to adjust
the code and docs at the same time.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lkml.kernel.org/r/6e56c3ad94879e41afe345750bc28ccc0e820ea8.1536015544.git.luto@kernel.org
When the kernel.print-fatal-signals sysctl has been enabled, a simple
userspace crash will cause the kernel to write a crash dump that contains,
among other things, the kernel gsbase into dmesg.
As suggested by Andy, limit output to pt_regs, FS_BASE and KERNEL_GS_BASE
in this case.
This also moves the bitness-specific logic from show_regs() into
process_{32,64}.c.
Fixes: 45807a1df9 ("vdso: print fatal signals")
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20180831194151.123586-1-jannh@google.com
Loops per jiffy is calculated by multiplying tsc_khz with 1e3 and then
dividing it by HZ.
Both tsc_khz and the temporary variable holding the multiplication result
are of type unsigned long, so on 32bit the result is truncated to the lower
32bit.
Use u64 as type for the temporary variable and cast tsc_khz to it before
multiplying.
[ tglx: Massaged changelog and removed pointless braces ]
Fixes: cf7a63ef4e ("x86/tsc: Calibrate tsc only once")
Signed-off-by: Chuanhua Lei <chuanhua.lei@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: yixin.zhu@linux.intel.com
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Len Brown <len.brown@intel.com>
Cc: Pavel Tatashin <pasha.tatashin@microsoft.com>
Cc: Rajvi Jingar <rajvi.jingar@intel.com>
Cc: Dou Liyang <douly.fnst@cn.fujitsu.com>
Link: https://lkml.kernel.org/r/1536228203-18701-1-git-send-email-chuanhua.lei@linux.intel.com
This is preparation for looking at trap number and fault address in the
handlers for uaccess errors. No functional change.
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Kees Cook <keescook@chromium.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: kernel-hardening@lists.openwall.com
Cc: linux-kernel@vger.kernel.org
Cc: dvyukov@google.com
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: "Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Cc: Borislav Petkov <bp@alien8.de>
Link: https://lkml.kernel.org/r/20180828201421.157735-6-jannh@google.com
This removes the call into exception fixup that was added in commit
c28f896634 ("[PATCH] kprobes: fix broken fault handling for x86_64").
On X86, kprobe_fault_handler() is called from two places:
do_general_protection() (for #GP) and kprobes_fault() (for #PF). In both
paths, the fixup_exception() call in the kprobe fault handler is redundant.
In case of #GP, fixup_exception() is called immediately before
kprobe_fault_handler() is invoked, so no need to try that again. This
assumes that the kprobe's fault handler isn't going to do something crazy
like changing RIP so that it suddenly points to an instruction that does
userspace access.
For #PF on a kernel address from kernel space, after the kprobe fault
handler has run, no_context() is invoked, which calls fixup_exception().
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Kees Cook <keescook@chromium.org>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: kernel-hardening@lists.openwall.com
Cc: linux-kernel@vger.kernel.org
Cc: dvyukov@google.com
Cc: "Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Cc: Borislav Petkov <bp@alien8.de>
Link: https://lkml.kernel.org/r/20180828201421.157735-4-jannh@google.com
The opaque plumbing of #GP from do_general_protection() through
notify_die() into kprobe_exceptions_notify() makes it hard to understand
what's going on.
Suggested-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Kees Cook <keescook@chromium.org>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: kernel-hardening@lists.openwall.com
Cc: dvyukov@google.com
Cc: "Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Cc: Borislav Petkov <bp@alien8.de>
Link: https://lkml.kernel.org/r/20180828201421.157735-3-jannh@google.com
Handle the case where microcode gets loaded on the BSP's hyperthread
sibling first and the boot_cpu_data's microcode revision doesn't get
updated because of early exit due to the siblings sharing a microcode
engine.
For that, simply write the updated revision on all CPUs unconditionally.
Signed-off-by: Filippo Sironi <sironi@amazon.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: prarit@redhat.com
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/1533050970-14385-1-git-send-email-sironi@amazon.de
When preparing an MCE record for logging, boot_cpu_data.microcode is used
to read out the microcode revision on the box.
However, on systems where late microcode update has happened, the microcode
revision output in a MCE log record is wrong because
boot_cpu_data.microcode is not updated when the microcode gets updated.
But, the microcode revision saved in boot_cpu_data's microcode member
should be kept up-to-date, regardless, for consistency.
Make it so.
Fixes: fa94d0c6e0 ("x86/MCE: Save microcode revision in machine check records")
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: sironi@amazon.de
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/20180731112739.32338-1-prarit@redhat.com
The microcode revision is already readable for non-root users via
/proc/cpuinfo. Thus, there's no reason to keep the same information
readable by root only in /sys/devices/system/cpu/cpuX/microcode/.
Make .../processor_flags world-readable too, while at it.
Reported-by: Tim Burgess <timb@dug.com>
Signed-off-by: Jacek Tomaka <jacek.tomaka@poczta.fm>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20180825035039.14409-1-jacekt@dugeo.com
show_opcodes() is used both for dumping kernel instructions and for dumping
user instructions. If userspace causes #PF by jumping to a kernel address,
show_opcodes() can be reached with regs->ip controlled by the user,
pointing to kernel code. Make sure that userspace can't trick us into
dumping kernel memory into dmesg.
Fixes: 7cccf0725c ("x86/dumpstack: Add a show_ip() function")
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: security@kernel.org
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20180828154901.112726-1-jannh@google.com
text_poke() and text_poke_bp() must be called with text_mutex held.
Put proper lockdep anotation in place instead of just mentioning the
requirement in a comment.
Reported-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Link: https://lkml.kernel.org/r/nycvar.YFH.7.76.1808280853520.25787@cbobk.fhfr.pm
Reset the KASAN shadow state of the task stack before rewinding RSP.
Without this, a kernel oops will leave parts of the stack poisoned, and
code running under do_exit() can trip over such poisoned regions and cause
nonsensical false-positive KASAN reports about stack-out-of-bounds bugs.
This does not wipe the exception stacks; if an oops happens on an exception
stack, it might result in random KASAN false-positives from other tasks
afterwards. This is probably relatively uninteresting, since if the kernel
oopses on an exception stack, there are most likely bigger things to worry
about. It'd be more interesting if vmapped stacks and KASAN were
compatible, since then handle_stack_overflow() would oops from exception
stack context.
Fixes: 2deb4be280 ("x86/dumpstack: When OOPSing, rewind the stack before do_exit()")
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: kasan-dev@googlegroups.com
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20180828184033.93712-1-jannh@google.com
On Nehalem and newer core CPUs the CPU cache internally uses 44 bits
physical address space. The L1TF workaround is limited by this internal
cache address width, and needs to have one bit free there for the
mitigation to work.
Older client systems report only 36bit physical address space so the range
check decides that L1TF is not mitigated for a 36bit phys/32GB system with
some memory holes.
But since these actually have the larger internal cache width this warning
is bogus because it would only really be needed if the system had more than
43bits of memory.
Add a new internal x86_cache_bits field. Normally it is the same as the
physical bits field reported by CPUID, but for Nehalem and newerforce it to
be at least 44bits.
Change the L1TF memory size warning to use the new cache_bits field to
avoid bogus warnings and remove the bogus comment about memory size.
Fixes: 17dbca1193 ("x86/speculation/l1tf: Add sysfs reporting for l1tf")
Reported-by: George Anchev <studio@anchev.net>
Reported-by: Christopher Snowhill <kode54@gmail.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: Michael Hocko <mhocko@suse.com>
Cc: vbabka@suse.cz
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20180824170351.34874-1-andi@firstfloor.org
Pull x86 fixes from Thomas Gleixner:
- Correct the L1TF fallout on 32bit and the off by one in the 'too much
RAM for protection' calculation.
- Add a helpful kernel message for the 'too much RAM' case
- Unbreak the VDSO in case that the compiler desides to use indirect
jumps/calls and emits retpolines which cannot be resolved because the
kernel uses its own thunks, which does not work for the VDSO. Make it
use the builtin thunks.
- Re-export start_thread() which was unexported when the 32/64bit
implementation was unified. start_thread() is required by modular
binfmt handlers.
- Trivial cleanups
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/speculation/l1tf: Suggest what to do on systems with too much RAM
x86/speculation/l1tf: Fix off-by-one error when warning that system has too much RAM
x86/kvm/vmx: Remove duplicate l1d flush definitions
x86/speculation/l1tf: Fix overflow in l1tf_pfn_limit() on 32bit
x86/process: Re-export start_thread()
x86/mce: Add notifier_block forward declaration
x86/vdso: Fix vDSO build if a retpoline is emitted
* memory_failure() gets confused by dev_pagemap backed mappings. The
recovery code has specific enabling for several possible page states
that needs new enabling to handle poison in dax mappings. Teach
memory_failure() about ZONE_DEVICE pages.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE5DAy15EJMCV1R6v9YGjFFmlTOEoFAlt9ui8ACgkQYGjFFmlT
OEpNRw//XGj9s7sezfJFeol4psJlRUd935yii/gmJRgi/yPf2VxxQG9qyM6SMBUc
75jASfOL6FSsfxHz0kplyWzMDNdrTkNNAD+9rv80FmY7GqWgcas9DaJX7jZ994vI
5SRO7pfvNZcXlo7IhqZippDw3yxkIU9Ufi0YQKaEUm7GFieptvCZ0p9x3VYfdvwM
BExrxQe0X1XUF4xErp5P78+WUbKxP47DLcucRDig8Q7dmHELUdyNzo3E1SVoc7m+
3CmvyTj6XuFQgOZw7ZKun1BJYfx/eD5ZlRJLZbx6wJHRtTXv/Uea8mZ8mJ31ykN9
F7QVd0Pmlyxys8lcXfK+nvpL09QBE0/PhwWKjmZBoU8AdgP/ZvBXLDL/D6YuMTg6
T4wwtPNJorfV4lVD06OliFkVI4qbKbmNsfRq43Ns7PCaLueu4U/eMaSwSH99UMaZ
MGbO140XW2RZsHiU9yTRUmZq73AplePEjxtzR8oHmnjo45nPDPy8mucWPlkT9kXA
oUFMhgiviK7dOo19H4eaPJGqLmHM93+x5tpYxGqTr0dUOXUadKWxMsTnkID+8Yi7
/kzQWCFvySz3VhiEHGuWkW08GZT6aCcpkREDomnRh4MEnETlZI8bblcuXYOCLs6c
nNf1SIMtLdlsl7U1fEX89PNeQQ2y237vEDhFQZftaalPeu/JJV0=
=Ftop
-----END PGP SIGNATURE-----
Merge tag 'libnvdimm-for-4.19_dax-memory-failure' of gitolite.kernel.org:pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull libnvdimm memory-failure update from Dave Jiang:
"As it stands, memory_failure() gets thoroughly confused by dev_pagemap
backed mappings. The recovery code has specific enabling for several
possible page states and needs new enabling to handle poison in dax
mappings.
In order to support reliable reverse mapping of user space addresses:
1/ Add new locking in the memory_failure() rmap path to prevent races
that would typically be handled by the page lock.
2/ Since dev_pagemap pages are hidden from the page allocator and the
"compound page" accounting machinery, add a mechanism to determine
the size of the mapping that encompasses a given poisoned pfn.
3/ Given pmem errors can be repaired, change the speculatively
accessed poison protection, mce_unmap_kpfn(), to be reversible and
otherwise allow ongoing access from the kernel.
A side effect of this enabling is that MADV_HWPOISON becomes usable
for dax mappings, however the primary motivation is to allow the
system to survive userspace consumption of hardware-poison via dax.
Specifically the current behavior is:
mce: Uncorrected hardware memory error in user-access at af34214200
{1}[Hardware Error]: It has been corrected by h/w and requires no further action
mce: [Hardware Error]: Machine check events logged
{1}[Hardware Error]: event severity: corrected
Memory failure: 0xaf34214: reserved kernel page still referenced by 1 users
[..]
Memory failure: 0xaf34214: recovery action for reserved kernel page: Failed
mce: Memory error not recovered
<reboot>
...and with these changes:
Injecting memory failure for pfn 0x20cb00 at process virtual address 0x7f763dd00000
Memory failure: 0x20cb00: Killing dax-pmd:5421 due to hardware memory corruption
Memory failure: 0x20cb00: recovery action for dax page: Recovered
Given all the cross dependencies I propose taking this through
nvdimm.git with acks from Naoya, x86/core, x86/RAS, and of course dax
folks"
* tag 'libnvdimm-for-4.19_dax-memory-failure' of gitolite.kernel.org:pub/scm/linux/kernel/git/nvdimm/nvdimm:
libnvdimm, pmem: Restore page attributes when clearing errors
x86/memory_failure: Introduce {set, clear}_mce_nospec()
x86/mm/pat: Prepare {reserve, free}_memtype() for "decoy" addresses
mm, memory_failure: Teach memory_failure() about dev_pagemap pages
filesystem-dax: Introduce dax_lock_mapping_entry()
mm, memory_failure: Collect mapping size in collect_procs()
mm, madvise_inject_error: Let memory_failure() optionally take a page reference
mm, dev_pagemap: Do not clear ->mapping on final put
mm, madvise_inject_error: Disable MADV_SOFT_OFFLINE for ZONE_DEVICE pages
filesystem-dax: Set page->index
device-dax: Set page->index
device-dax: Enable page_mapping()
device-dax: Convert to vmf_insert_mixed and vm_fault_t
Including:
- PASID table handling updates for the Intel VT-d driver. It
implements a global PASID space now so that applications
usings multiple devices will just have one PASID.
- A new config option to make iommu passthroug mode the default.
- New sysfs attribute for iommu groups to export the type of the
default domain.
- A debugfs interface (for debug only) usable by IOMMU drivers
to export internals to user-space.
- R-Car Gen3 SoCs support for the ipmmu-vmsa driver
- The ARM-SMMU now aborts transactions from unknown devices and
devices not attached to any domain.
- Various cleanups and smaller fixes all over the place.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABAgAGBQJbf/9wAAoJECvwRC2XARrjcuYP/3dIsOFN7Xb4sTOB5wxk4wmD
2Rm5o/18cFekEy4M8fwIBCYkzH/McohgKbOFcH6XiCxIwJ5RdXzITLAwmp4PbvIO
KtwppXSp+MQtboip/bp6NDNBhABErgUtgdXawwENCCrFivXDsB8W4wnXESAOkLv9
4fLXrUgDFCAquLZpLqQobXHhajtGAkSekaasphlhejXFulFyF1YcEUcliU7eXZ0R
rZjL4Zqcyyi5kv6d3WhL+tvmmhr7wfMsMPaW18eRf9tXvMpWRM2GOAj65coI2AWs
1T1kW/jvvrxnewOsmo1nYlw7R07uiRkUfHmJ9tY65xW4120HJFhdFLPUQZXfrX/b
wcGbheYIh6cwAaZBtPJ35bPeW6pREkDOShohbzt45T62Q837cBkr3zyHhNsoOXHS
13YVtTd2vtPa4iLdu2qmEOC1OuhQnMvqHqX0iN8U74QbDxEYYvMfAdx0JL3hmPp/
uynY3QmXIKCeZg+vH2qcWHm07nfaAr5y8WSPA0crnqeznD5zJ4kvJf5dFGmDyTKr
pyTkhidkifm6ZejrJsDZveoZdLpHrOatrqKaoLFh2crMUG3d807NYqQ3JmA3NDjg
zPbYyU4joFGNVjd3XkSnRTGxR6YvLIwNbkQ3b/K/B5AqWJ6VrTbbTCOa4GSms6rF
Qm8wRrmYaycKxkcMqtls
=TeYQ
-----END PGP SIGNATURE-----
Merge tag 'iommu-updates-v4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull IOMMU updates from Joerg Roedel:
- PASID table handling updates for the Intel VT-d driver. It implements
a global PASID space now so that applications usings multiple devices
will just have one PASID.
- A new config option to make iommu passthroug mode the default.
- New sysfs attribute for iommu groups to export the type of the
default domain.
- A debugfs interface (for debug only) usable by IOMMU drivers to
export internals to user-space.
- R-Car Gen3 SoCs support for the ipmmu-vmsa driver
- The ARM-SMMU now aborts transactions from unknown devices and devices
not attached to any domain.
- Various cleanups and smaller fixes all over the place.
* tag 'iommu-updates-v4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (42 commits)
iommu/omap: Fix cache flushes on L2 table entries
iommu: Remove the ->map_sg indirection
iommu/arm-smmu-v3: Abort all transactions if SMMU is enabled in kdump kernel
iommu/arm-smmu-v3: Prevent any devices access to memory without registration
iommu/ipmmu-vmsa: Don't register as BUS IOMMU if machine doesn't have IPMMU-VMSA
iommu/ipmmu-vmsa: Clarify supported platforms
iommu/ipmmu-vmsa: Fix allocation in atomic context
iommu: Add config option to set passthrough as default
iommu: Add sysfs attribyte for domain type
iommu/arm-smmu-v3: sync the OVACKFLG to PRIQ consumer register
iommu/arm-smmu: Error out only if not enough context interrupts
iommu/io-pgtable-arm-v7s: Abort allocation when table address overflows the PTE
iommu/io-pgtable-arm: Fix pgtable allocation in selftest
iommu/vt-d: Remove the obsolete per iommu pasid tables
iommu/vt-d: Apply per pci device pasid table in SVA
iommu/vt-d: Allocate and free pasid table
iommu/vt-d: Per PCI device pasid table interfaces
iommu/vt-d: Add for_each_device_domain() helper
iommu/vt-d: Move device_domain_info to header
iommu/vt-d: Apply global PASID in SVA
...
Two users have reported [1] that they have an "extremely unlikely" system
with more than MAX_PA/2 memory and L1TF mitigation is not effective.
Make the warning more helpful by suggesting the proper mem=X kernel boot
parameter to make it effective and a link to the L1TF document to help
decide if the mitigation is worth the unusable RAM.
[1] https://bugzilla.suse.com/show_bug.cgi?id=1105536
Suggested-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/966571f0-9d7f-43dc-92c6-a10eec7a1254@suse.cz
If we don't use paravirt; don't play unnecessary and complicated games
to free page-tables.
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Rik van Riel <riel@surriel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- Make the idle loop handle stopped scheduler tick correctly (Rafael
Wysocki).
- Prevent the menu cpuidle governor from letting CPUs spend too much
time in shallow idle states when it is invoked with scheduler tick
stopped and clean it up somewhat (Rafael Wysocki).
- Avoid invoking the platform firmware to make the platform enter
the ACPI S3 sleep state with suspended PCIe root ports which may
confuse the firmware and cause it to crash (Rafael Wysocki).
- Fix sysfs-related race in the ondemand and conservative cpufreq
governors which may cause the system to crash if the governor
module is removed during an update of CPU frequency limits (Henry
Willard).
- Select SRCU when building the system wakeup framework to avoid a
build issue in it (zhangyi).
- Make the descriptions of ACPI C-states vendor-neutral to avoid
confusion (Prarit Bhargava).
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABCAAGBQJbfSnVAAoJEILEb/54YlRxBn4QAKQ8PqkSYkBby+1hb90ET4dk
VaLkbCYXuzLK5rIDvnbYOALhVKo4B29Ex5GdCLN7cWkZMkrVKe7oX8QQTnp3/7lF
URjTKgTNec5uJG652PrE3ESAa3X/kYggj6aeQOxDR4iYKzcpJEQ92ekFW+SoJTNp
Jc2kZh3qkC2On64GB3ibsZaKnmHfPvLg0t4agwzuYq/Gff8NRJFk7kMwAPzqGzZo
b2UVRcYFWIRkJjgmU9iInoeHIY8mBdT3IiKwTemZP1dOhb5T1AHOXwGTk6/cS+RH
A9qx4eg7I3R00KmnYvO8WytYJeOu2qb83GIUx4fIJGOqfvevm5xkxB9F+nfE+ouj
ROBqO4+X4XfQGPw8slayg0rJjI9JSkXLnLdl0Qw2WRlbc4/fVWntra1C57EeKFBR
EG9UAF9+7nUUx0bOCLsfFF3+r9R3SDUjk7b4thyhYncyQRsYC+FL7ztlxnMzVtzW
M5SF2sPrpcQzqmcszdUdbESI10n5X8m/crJW4rsbTxBpAM+coO+uLcvHWOY4MpkW
BgBsR6bMDAlG/VlTFgeeP/tkCRd5zNlJi7yBFItXuOoVKXpnHCJuxq2WkZ1Rb74M
Gk1d3TduekHJm8VsLEdCJR/tEk1cMc0zVUD/a1yzI4Z21QxvXUCqMDdws4/Ey184
qmKgNR9R94vSC5xIPRhM
=9GrU
-----END PGP SIGNATURE-----
Merge tag 'pm-4.19-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull more power management updates from Rafael Wysocki:
"These fix the main idle loop and the menu cpuidle governor, clean up
the latter, fix a mistake in the PCI bus type's support for system
suspend and resume, fix the ondemand and conservative cpufreq
governors, address a build issue in the system wakeup framework and
make the ACPI C-states desciptions less confusing.
Specifics:
- Make the idle loop handle stopped scheduler tick correctly (Rafael
Wysocki).
- Prevent the menu cpuidle governor from letting CPUs spend too much
time in shallow idle states when it is invoked with scheduler tick
stopped and clean it up somewhat (Rafael Wysocki).
- Avoid invoking the platform firmware to make the platform enter the
ACPI S3 sleep state with suspended PCIe root ports which may
confuse the firmware and cause it to crash (Rafael Wysocki).
- Fix sysfs-related race in the ondemand and conservative cpufreq
governors which may cause the system to crash if the governor
module is removed during an update of CPU frequency limits (Henry
Willard).
- Select SRCU when building the system wakeup framework to avoid a
build issue in it (zhangyi).
- Make the descriptions of ACPI C-states vendor-neutral to avoid
confusion (Prarit Bhargava)"
* tag 'pm-4.19-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpuidle: menu: Handle stopped tick more aggressively
sched: idle: Avoid retaining the tick when it has been stopped
PCI / ACPI / PM: Resume all bridges on suspend-to-RAM
cpuidle: menu: Update stale polling override comment
cpufreq: governor: Avoid accessing invalid governor_data
x86/ACPI/cstate: Make APCI C1 FFH MWAIT C-state description vendor-neutral
cpuidle: menu: Fix white space
PM / sleep: wakeup: Fix build error caused by missing SRCU support
Currently memory_failure() returns zero if the error was handled. On
that result mce_unmap_kpfn() is called to zap the page out of the kernel
linear mapping to prevent speculative fetches of potentially poisoned
memory. However, in the case of dax mapped devmap pages the page may be
in active permanent use by the device driver, so it cannot be unmapped
from the kernel.
Instead of marking the page not present, marking the page UC should
be sufficient for preventing poison from being pre-fetched into the
cache. Convert mce_unmap_pfn() to set_mce_nospec() remapping the page as
UC, to hide it from speculative accesses.
Given that that persistent memory errors can be cleared by the driver,
include a facility to restore the page to cacheable operation,
clear_mce_nospec().
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: <linux-edac@vger.kernel.org>
Cc: <x86@kernel.org>
Acked-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
The consolidation of the start_thread() functions removed the export
unintentionally. This breaks binfmt handlers built as a module.
Add it back.
Fixes: e634d8fc79 ("x86-64: merge the standard and compat start_thread() functions")
Signed-off-by: Rian Hunter <rian@alum.mit.edu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Dmitry Safonov <dima@arista.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20180819230854.7275-1-rian@alum.mit.edu
For x86 this brings in PCID emulation and CR3 caching for shadow page
tables, nested VMX live migration, nested VMCS shadowing, an optimized
IPI hypercall, and some optimizations.
ARM will come next week.
There is a semantic conflict because tip also added an .init_platform
callback to kvm.c. Please keep the initializer from this branch,
and add a call to kvmclock_init (added by tip) inside kvm_init_platform
(added here).
Also, there is a backmerge from 4.18-rc6. This is because of a
refactoring that conflicted with a relatively late bugfix and
resulted in a particularly hellish conflict. Because the conflict
was only due to unfortunate timing of the bugfix, I backmerged and
rebased the refactoring rather than force the resolution on you.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQEcBAABAgAGBQJbdwNFAAoJEL/70l94x66DiPEH/1cAGZWGd85Y3yRu1dmTmqiz
kZy0V+WTQ5kyJF4ZsZKKOp+xK7Qxh5e9kLdTo70uPZCHwLu9IaGKN9+dL9Jar3DR
yLPX5bMsL8UUed9g9mlhdaNOquWi7d7BseCOnIyRTolb+cqnM5h3sle0gqXloVrS
UQb4QogDz8+86czqR8tNfazjQRKW/D2HEGD5NDNVY1qtpY+leCDAn9/u6hUT5c6z
EtufgyDh35UN+UQH0e2605gt3nN3nw3FiQJFwFF1bKeQ7k5ByWkuGQI68XtFVhs+
2WfqL3ftERkKzUOy/WoSJX/C9owvhMcpAuHDGOIlFwguNGroZivOMVnACG1AI3I=
=9Mgw
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull first set of KVM updates from Paolo Bonzini:
"PPC:
- minor code cleanups
x86:
- PCID emulation and CR3 caching for shadow page tables
- nested VMX live migration
- nested VMCS shadowing
- optimized IPI hypercall
- some optimizations
ARM will come next week"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (85 commits)
kvm: x86: Set highest physical address bits in non-present/reserved SPTEs
KVM/x86: Use CC_SET()/CC_OUT in arch/x86/kvm/vmx.c
KVM: X86: Implement PV IPIs in linux guest
KVM: X86: Add kvm hypervisor init time platform setup callback
KVM: X86: Implement "send IPI" hypercall
KVM/x86: Move X86_CR4_OSXSAVE check into kvm_valid_sregs()
KVM: x86: Skip pae_root shadow allocation if tdp enabled
KVM/MMU: Combine flushing remote tlb in mmu_set_spte()
KVM: vmx: skip VMWRITE of HOST_{FS,GS}_BASE when possible
KVM: vmx: skip VMWRITE of HOST_{FS,GS}_SEL when possible
KVM: vmx: always initialize HOST_{FS,GS}_BASE to zero during setup
KVM: vmx: move struct host_state usage to struct loaded_vmcs
KVM: vmx: compute need to reload FS/GS/LDT on demand
KVM: nVMX: remove a misleading comment regarding vmcs02 fields
KVM: vmx: rename __vmx_load_host_state() and vmx_save_host_state()
KVM: vmx: add dedicated utility to access guest's kernel_gs_base
KVM: vmx: track host_state.loaded using a loaded_vmcs pointer
KVM: vmx: refactor segmentation code in vmx_save_host_state()
kvm: nVMX: Fix fault priority for VMX operations
kvm: nVMX: Fix fault vector for VMX operation at CPL > 0
...
Here is the bit set of char/misc drivers for 4.19-rc1
There is a lot here, much more than normal, seems like everyone is
writing new driver subsystems these days... Anyway, major things here
are:
- new FSI driver subsystem, yet-another-powerpc low-level
hardware bus
- gnss, finally an in-kernel GPS subsystem to try to tame all of
the crazy out-of-tree drivers that have been floating around
for years, combined with some really hacky userspace
implementations. This is only for GNSS receivers, but you
have to start somewhere, and this is great to see.
Other than that, there are new slimbus drivers, new coresight drivers,
new fpga drivers, and loads of DT bindings for all of these and existing
drivers.
Full details of everything is in the shortlog.
All of these have been in linux-next for a while with no reported
issues.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCW3g7ew8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ykfBgCeOG0RkSI92XVZe0hs/QYFW9kk8JYAnRBf3Qpm
cvW7a+McOoKz/MGmEKsi
=TNfn
-----END PGP SIGNATURE-----
Merge tag 'char-misc-4.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char/misc driver updates from Greg KH:
"Here is the bit set of char/misc drivers for 4.19-rc1
There is a lot here, much more than normal, seems like everyone is
writing new driver subsystems these days... Anyway, major things here
are:
- new FSI driver subsystem, yet-another-powerpc low-level hardware
bus
- gnss, finally an in-kernel GPS subsystem to try to tame all of the
crazy out-of-tree drivers that have been floating around for years,
combined with some really hacky userspace implementations. This is
only for GNSS receivers, but you have to start somewhere, and this
is great to see.
Other than that, there are new slimbus drivers, new coresight drivers,
new fpga drivers, and loads of DT bindings for all of these and
existing drivers.
All of these have been in linux-next for a while with no reported
issues"
* tag 'char-misc-4.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (255 commits)
android: binder: Rate-limit debug and userspace triggered err msgs
fsi: sbefifo: Bump max command length
fsi: scom: Fix NULL dereference
misc: mic: SCIF Fix scif_get_new_port() error handling
misc: cxl: changed asterisk position
genwqe: card_base: Use true and false for boolean values
misc: eeprom: assignment outside the if statement
uio: potential double frees if __uio_register_device() fails
eeprom: idt_89hpesx: clean up an error pointer vs NULL inconsistency
misc: ti-st: Fix memory leak in the error path of probe()
android: binder: Show extra_buffers_size in trace
firmware: vpd: Fix section enabled flag on vpd_section_destroy
platform: goldfish: Retire pdev_bus
goldfish: Use dedicated macros instead of manual bit shifting
goldfish: Add missing includes to goldfish.h
mux: adgs1408: new driver for Analog Devices ADGS1408/1409 mux
dt-bindings: mux: add adi,adgs1408
Drivers: hv: vmbus: Cleanup synic memory free path
Drivers: hv: vmbus: Remove use of slow_virt_to_phys()
Drivers: hv: vmbus: Reset the channel callback in vmbus_onoffer_rescind()
...
The split of .system_keyring into .builtin_trusted_keys and
.secondary_trusted_keys broke kexec, thereby preventing kernels signed by
keys which are now in the secondary keyring from being kexec'd.
Fix this by passing VERIFY_USE_SECONDARY_KEYRING to
verify_pefile_signature().
Fixes: d3bfe84129 ("certs: Add a secondary system keyring that can be added to dynamically")
Signed-off-by: Yannik Sembritzki <yannik@sembritzki.me>
Signed-off-by: David Howells <dhowells@redhat.com>
Cc: kexec@lists.infradead.org
Cc: keyrings@vger.kernel.org
Cc: linux-security-module@vger.kernel.org
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-----BEGIN PGP SIGNATURE-----
iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAlt1f9AUHGJoZWxnYWFz
QGdvb2dsZS5jb20ACgkQWYigwDrT+vxbdhAArnhRvkwOk4m4/LCuKF6HpmlxbBNC
TjnBCenNf+lFXzWskfDFGFl/Wif4UzGbRTSCNQrwMzj3Ww3f/6R2QIq9rEJvyNC4
VdxQnaBEZSUgN87q5UGqgdjMTo3zFvlFH6fpb5XDiQ5IX/QZeXeYqoB64w+HvKPU
M+IsoOvnA5gb7pMcpchrGUnSfS1e6AqQbbTt6tZflore6YCEA4cH5OnpGx8qiZIp
ut+CMBvQjQB01fHeBc/wGrVte4NwXdONrXqpUb4sHF7HqRNfEh0QVyPhvebBi+k1
kquqoBQfPFTqgcab31VOcQhg70dEx+1qGm5/YBAwmhCpHR/g2gioFXoROsr+iUOe
BtF6LZr+Y8cySuhJnkCrJBqWvvBaKbJLg0KMbI+7p4o9MZpod2u7LS5LFrlRDyKW
3nz3o+b1+v3tCCKVKIhKo0ljolgkweQtR1f6KIHvq93wBODHVQnAOt9NlPfHVyks
ryGBnOhMjoU5hvfexgIWFk9Ph9MEVQSffkI+TeFPO/tyGBfGfQyGtESiXuEaMQaH
FGdZHX2RLkY3pWHOtWeMzRHzOnr2XjpDFcAqL3HBGPdJ30K3Umv3WOgoFe2SaocG
0gaddPjKSwwM4Sa/VP+O5cjGuzi7QnczSDdpYjxIGZzBav32hqx4/rsnLw7bHH8y
XkEme7cYJc8MGsA=
=2Dmn
-----END PGP SIGNATURE-----
Merge tag 'pci-v4.19-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull pci updates from Bjorn Helgaas:
- Decode AER errors with names similar to "lspci" (Tyler Baicar)
- Expose AER statistics in sysfs (Rajat Jain)
- Clear AER status bits selectively based on the type of recovery (Oza
Pawandeep)
- Honor "pcie_ports=native" even if HEST sets FIRMWARE_FIRST (Alexandru
Gagniuc)
- Don't clear AER status bits if we're using the "Firmware-First"
strategy where firmware owns the registers (Alexandru Gagniuc)
- Use sysfs_match_string() to simplify ASPM sysfs parsing (Andy
Shevchenko)
- Remove unnecessary includes of <linux/pci-aspm.h> (Bjorn Helgaas)
- Defer DPC event handling to work queue (Keith Busch)
- Use threaded IRQ for DPC bottom half (Keith Busch)
- Print AER status while handling DPC events (Keith Busch)
- Work around IDT switch ACS Source Validation erratum (James
Puthukattukaran)
- Emit diagnostics for all cases of PCIe Link downtraining (Links
operating slower than they're capable of) (Alexandru Gagniuc)
- Skip VFs when configuring Max Payload Size (Myron Stowe)
- Reduce Root Port Max Payload Size if necessary when hot-adding a
device below it (Myron Stowe)
- Simplify SHPC existence/permission checks (Bjorn Helgaas)
- Remove hotplug sample skeleton driver (Lukas Wunner)
- Convert pciehp to threaded IRQ handling (Lukas Wunner)
- Improve pciehp tolerance of missed events and initially unstable
links (Lukas Wunner)
- Clear spurious pciehp events on resume (Lukas Wunner)
- Add pciehp runtime PM support, including for Thunderbolt controllers
(Lukas Wunner)
- Support interrupts from pciehp bridges in D3hot (Lukas Wunner)
- Mark fall-through switch cases before enabling -Wimplicit-fallthrough
(Gustavo A. R. Silva)
- Move DMA-debug PCI init from arch code to PCI core (Christoph
Hellwig)
- Fix pci_request_irq() usage of IRQF_ONESHOT when no handler is
supplied (Heiner Kallweit)
- Unify PCI and DMA direction #defines (Shunyong Yang)
- Add PCI_DEVICE_DATA() macro (Andy Shevchenko)
- Check for VPD completion before checking for timeout (Bert Kenward)
- Limit Netronome NFP5000 config space size to work around erratum
(Jakub Kicinski)
- Set IRQCHIP_ONESHOT_SAFE for PCI MSI irqchips (Heiner Kallweit)
- Document ACPI description of PCI host bridges (Bjorn Helgaas)
- Add "pci=disable_acs_redir=" parameter to disable ACS redirection for
peer-to-peer DMA support (we don't have the peer-to-peer support yet;
this is just one piece) (Logan Gunthorpe)
- Clean up devm_of_pci_get_host_bridge_resources() resource allocation
(Jan Kiszka)
- Fixup resizable BARs after suspend/resume (Christian König)
- Make "pci=earlydump" generic (Sinan Kaya)
- Fix ROM BAR access routines to stay in bounds and check for signature
correctly (Rex Zhu)
- Add DMA alias quirk for Microsemi Switchtec NTB (Doug Meyer)
- Expand documentation for pci_add_dma_alias() (Logan Gunthorpe)
- To avoid bus errors, enable PASID only if entire path supports
End-End TLP prefixes (Sinan Kaya)
- Unify slot and bus reset functions and remove hotplug knowledge from
callers (Sinan Kaya)
- Add Function-Level Reset quirks for Intel and Samsung NVMe devices to
fix guest reboot issues (Alex Williamson)
- Add function 1 DMA alias quirk for Marvell 88SS9183 PCIe SSD
Controller (Bjorn Helgaas)
- Remove Xilinx AXI-PCIe host bridge arch dependency (Palmer Dabbelt)
- Remove Aardvark outbound window configuration (Evan Wang)
- Fix Aardvark bridge window sizing issue (Zachary Zhang)
- Convert Aardvark to use pci_host_probe() to reduce code duplication
(Thomas Petazzoni)
- Correct the Cadence cdns_pcie_writel() signature (Alan Douglas)
- Add Cadence support for optional generic PHYs (Alan Douglas)
- Add Cadence power management ops (Alan Douglas)
- Remove redundant variable from Cadence driver (Colin Ian King)
- Add Kirin MSI support (Xiaowei Song)
- Drop unnecessary root_bus_nr setting from exynos, imx6, keystone,
armada8k, artpec6, designware-plat, histb, qcom, spear13xx (Shawn
Guo)
- Move link notification settings from DesignWare core to individual
drivers (Gustavo Pimentel)
- Add endpoint library MSI-X interfaces (Gustavo Pimentel)
- Correct signature of endpoint library IRQ interfaces (Gustavo
Pimentel)
- Add DesignWare endpoint library MSI-X callbacks (Gustavo Pimentel)
- Add endpoint library MSI-X test support (Gustavo Pimentel)
- Remove unnecessary GFP_ATOMIC from Hyper-V "new child" allocation
(Jia-Ju Bai)
- Add more devices to Broadcom PAXC quirk (Ray Jui)
- Work around corrupted Broadcom PAXC config space to enable SMMU and
GICv3 ITS (Ray Jui)
- Disable MSI parsing to work around broken Broadcom PAXC logic in some
devices (Ray Jui)
- Hide unconfigured functions to work around a Broadcom PAXC defect
(Ray Jui)
- Lower iproc log level to reduce console output during boot (Ray Jui)
- Fix mobiveil iomem/phys_addr_t type usage (Lorenzo Pieralisi)
- Fix mobiveil missing include file (Lorenzo Pieralisi)
- Add mobiveil Kconfig/Makefile support (Lorenzo Pieralisi)
- Fix mvebu I/O space remapping issues (Thomas Petazzoni)
- Use generic pci_host_bridge in mvebu instead of ARM-specific API
(Thomas Petazzoni)
- Whitelist VMD devices with fast interrupt handlers to avoid sharing
vectors with slow handlers (Keith Busch)
* tag 'pci-v4.19-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (153 commits)
PCI/AER: Don't clear AER bits if error handling is Firmware-First
PCI: Limit config space size for Netronome NFP5000
PCI/MSI: Set IRQCHIP_ONESHOT_SAFE for PCI-MSI irqchips
PCI/VPD: Check for VPD access completion before checking for timeout
PCI: Add PCI_DEVICE_DATA() macro to fully describe device ID entry
PCI: Match Root Port's MPS to endpoint's MPSS as necessary
PCI: Skip MPS logic for Virtual Functions (VFs)
PCI: Add function 1 DMA alias quirk for Marvell 88SS9183
PCI: Check for PCIe Link downtraining
PCI: Add ACS Redirect disable quirk for Intel Sunrise Point
PCI: Add device-specific ACS Redirect disable infrastructure
PCI: Convert device-specific ACS quirks from NULL termination to ARRAY_SIZE
PCI: Add "pci=disable_acs_redir=" parameter for peer-to-peer support
PCI: Allow specifying devices using a base bus and path of devfns
PCI: Make specifying PCI devices in kernel parameters reusable
PCI: Hide ACS quirk declarations inside PCI core
PCI: Delay after FLR of Intel DC P3700 NVMe
PCI: Disable Samsung SM961/PM961 NVMe before FLR
PCI: Export pcie_has_flr()
PCI: mvebu: Drop bogus comment above mvebu_pcie_map_registers()
...
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJbc41pAAoJEAx081l5xIa+ZrAP/AzKj4i4pBLVJcvNZ2BwD+UD
ZNSNj2iqCJ5+Jo/WtIwQ8tLct9UqfVssUwBke6tZksiLdTigGPTUyVIAdK+9kyWD
D00m3x/pToJrSF2D0FwxQlPUtPkohp9N+E6+TU7gd1oCasZfBzmcEpoVAmZf+NWE
kN1xXpmGxZWpu0wc7JA2lv9MuUTijCwIqJqa5E0bB3z06G5mw+PJ89kYzMx19OyA
ZYQK8y3A40ZGl8UbajZ4xg9pqFCRYFFHGqfYlpUWWTh0XMAXu8+Yqzh3dJxmak7r
4u2pdQBsxPMZO8qKBHpVvI7Zhoe0Ntnolc0XVD+2IbqqnTprVbQs0bWf3YyfUlQi
1/9bWFK67W0LEuzac6M7a7EQqFNiHF13Btao7aqENTIe/GaCZJoopaiRMAmh6EHD
4PezeYqrW8cSaPj6OKouL1BhW9Bjixsg0bvjS/uB6m4KekFCt1++BDFGzkqvm6Mo
SVW7nkJoCFpCASaR7DhUEOPexaHeJ65HCDDUvYdqz9jd2w1TgvvanEZWual1NwEm
ImA8A4wGZ/3KijpyyKm0gE96RX7+zMMZ3brW6p1vhUUKVYJCrvSr5jrXH5+2k6Aw
Y455doGL87IRkwyje/YbQF0I8pbUZD9QS5wII13tLGwOH9/uC/Xl6dHNM40gtqyh
W4gEdY+NAMJmYLvRNawa
=g9rD
-----END PGP SIGNATURE-----
Merge tag 'drm-next-2018-08-15' of git://anongit.freedesktop.org/drm/drm
Pull drm updates from Dave Airlie:
"This is the main drm pull request for 4.19.
Rob has some new hardware support for new qualcomm hw that I'll send
along separately. This has the display part of it, the remaining pull
is for the acceleration engine.
This also contains a wound-wait/wait-die mutex rework, Peter has acked
it for merging via my tree.
Otherwise mostly the usual level of activity. Summary:
core:
- Wound-wait/wait-die mutex rework
- Add writeback connector type
- Add "content type" property for HDMI
- Move GEM bo to drm_framebuffer
- Initial gpu scheduler documentation
- GPU scheduler fixes for dying processes
- Console deferred fbcon takeover support
- Displayport support for CEC tunneling over AUX
panel:
- otm8009a panel driver fixes
- Innolux TV123WAM and G070Y2-L01 panel driver
- Ilitek ILI9881c panel driver
- Rocktech RK070ER9427 LCD
- EDT ETM0700G0EDH6 and EDT ETM0700G0BDH6
- DLC DLC0700YZG-1
- BOE HV070WSA-100
- newhaven, nhd-4.3-480272ef-atxl LCD
- DataImage SCF0700C48GGU18
- Sharp LQ035Q7DB03
- p079zca: Refactor to support multiple panels
tinydrm:
- ILI9341 display panel
New driver:
- vkms - virtual kms driver to testing.
i915:
- Icelake:
Display enablement
DSI support
IRQ support
Powerwell support
- GPU reset fixes and improvements
- Full ppgtt support refactoring
- PSR fixes and improvements
- Execlist improvments
- GuC related fixes
amdgpu:
- Initial amdgpu documentation
- JPEG engine support on VCN
- CIK uses powerplay by default
- Move to using core PCIE functionality for gens/lanes
- DC/Powerplay interface rework
- Stutter mode support for RV
- Vega12 Powerplay updates
- GFXOFF fixes
- GPUVM fault debugging
- Vega12 GFXOFF
- DC improvements
- DC i2c/aux changes
- UVD 7.2 fixes
- Powerplay fixes for Polaris12, CZ/ST
- command submission bo_list fixes
amdkfd:
- Raven support
- Power management fixes
udl:
- Cleanups and fixes
nouveau:
- misc fixes and cleanups.
msm:
- DPU1 support display controller in sdm845
- GPU coredump support.
vmwgfx:
- Atomic modesetting validation fixes
- Support for multisample surfaces
armada:
- Atomic modesetting support completed.
exynos:
- IPPv2 fixes
- Move g2d to component framework
- Suspend/resume support cleanups
- Driver cleanups
imx:
- CSI configuration improvements
- Driver cleanups
- Use atomic suspend/resume helpers
- ipu-v3 V4L2 XRGB32/XBGR32 support
pl111:
- Add Nomadik LCDC variant
v3d:
- GPU scheduler jobs management
sun4i:
- R40 display engine support
- TCON TOP driver
mediatek:
- MT2712 SoC support
rockchip:
- vop fixes
omapdrm:
- Workaround for DRA7 errata i932
- Fix mm_list locking
mali-dp:
- Writeback implementation
PM improvements
- Internal error reporting debugfs
tilcdc:
- Single fix for deferred probing
hdlcd:
- Teardown fixes
tda998x:
- Converted to a bridge driver.
etnaviv:
- Misc fixes"
* tag 'drm-next-2018-08-15' of git://anongit.freedesktop.org/drm/drm: (1506 commits)
drm/amdgpu/sriov: give 8s for recover vram under RUNTIME
drm/scheduler: fix param documentation
drm/i2c: tda998x: correct PLL divider calculation
drm/i2c: tda998x: get rid of private fill_modes function
drm/i2c: tda998x: move mode_valid() to bridge
drm/i2c: tda998x: register bridge outside of component helper
drm/i2c: tda998x: cleanup from previous changes
drm/i2c: tda998x: allocate tda998x_priv inside tda998x_create()
drm/i2c: tda998x: convert to bridge driver
drm/scheduler: fix timeout worker setup for out of order job completions
drm/amd/display: display connected to dp-1 does not light up
drm/amd/display: update clk for various HDMI color depths
drm/amd/display: program display clock on cache match
drm/amd/display: Add NULL check for enabling dp ss
drm/amd/display: add vbios table check for enabling dp ss
drm/amd/display: Don't share clk source between DP and HDMI
drm/amd/display: Fix DP HBR2 Eye Diagram Pattern on Carrizo
drm/amd/display: Use calculated disp_clk_khz value for dce110
drm/amd/display: Implement custom degamma lut on dcn
drm/amd/display: Destroy aux_engines only once
...
Pull networking updates from David Miller:
"Highlights:
- Gustavo A. R. Silva keeps working on the implicit switch fallthru
changes.
- Support 802.11ax High-Efficiency wireless in cfg80211 et al, From
Luca Coelho.
- Re-enable ASPM in r8169, from Kai-Heng Feng.
- Add virtual XFRM interfaces, which avoids all of the limitations of
existing IPSEC tunnels. From Steffen Klassert.
- Convert GRO over to use a hash table, so that when we have many
flows active we don't traverse a long list during accumluation.
- Many new self tests for routing, TC, tunnels, etc. Too many
contributors to mention them all, but I'm really happy to keep
seeing this stuff.
- Hardware timestamping support for dpaa_eth/fsl-fman from Yangbo Lu.
- Lots of cleanups and fixes in L2TP code from Guillaume Nault.
- Add IPSEC offload support to netdevsim, from Shannon Nelson.
- Add support for slotting with non-uniform distribution to netem
packet scheduler, from Yousuk Seung.
- Add UDP GSO support to mlx5e, from Boris Pismenny.
- Support offloading of Team LAG in NFP, from John Hurley.
- Allow to configure TX queue selection based upon RX queue, from
Amritha Nambiar.
- Support ethtool ring size configuration in aquantia, from Anton
Mikaev.
- Support DSCP and flowlabel per-transport in SCTP, from Xin Long.
- Support list based batching and stack traversal of SKBs, this is
very exciting work. From Edward Cree.
- Busyloop optimizations in vhost_net, from Toshiaki Makita.
- Introduce the ETF qdisc, which allows time based transmissions. IGB
can offload this in hardware. From Vinicius Costa Gomes.
- Add parameter support to devlink, from Moshe Shemesh.
- Several multiplication and division optimizations for BPF JIT in
nfp driver, from Jiong Wang.
- Lots of prepatory work to make more of the packet scheduler layer
lockless, when possible, from Vlad Buslov.
- Add ACK filter and NAT awareness to sch_cake packet scheduler, from
Toke Høiland-Jørgensen.
- Support regions and region snapshots in devlink, from Alex Vesker.
- Allow to attach XDP programs to both HW and SW at the same time on
a given device, with initial support in nfp. From Jakub Kicinski.
- Add TLS RX offload and support in mlx5, from Ilya Lesokhin.
- Use PHYLIB in r8169 driver, from Heiner Kallweit.
- All sorts of changes to support Spectrum 2 in mlxsw driver, from
Ido Schimmel.
- PTP support in mv88e6xxx DSA driver, from Andrew Lunn.
- Make TCP_USER_TIMEOUT socket option more accurate, from Jon
Maxwell.
- Support for templates in packet scheduler classifier, from Jiri
Pirko.
- IPV6 support in RDS, from Ka-Cheong Poon.
- Native tproxy support in nf_tables, from Máté Eckl.
- Maintain IP fragment queue in an rbtree, but optimize properly for
in-order frags. From Peter Oskolkov.
- Improvde handling of ACKs on hole repairs, from Yuchung Cheng"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1996 commits)
bpf: test: fix spelling mistake "REUSEEPORT" -> "REUSEPORT"
hv/netvsc: Fix NULL dereference at single queue mode fallback
net: filter: mark expected switch fall-through
xen-netfront: fix warn message as irq device name has '/'
cxgb4: Add new T5 PCI device ids 0x50af and 0x50b0
net: dsa: mv88e6xxx: missing unlock on error path
rds: fix building with IPV6=m
inet/connection_sock: prefer _THIS_IP_ to current_text_addr
net: dsa: mv88e6xxx: bitwise vs logical bug
net: sock_diag: Fix spectre v1 gadget in __sock_diag_cmd()
ieee802154: hwsim: using right kind of iteration
net: hns3: Add vlan filter setting by ethtool command -K
net: hns3: Set tx ring' tc info when netdev is up
net: hns3: Remove tx ring BD len register in hns3_enet
net: hns3: Fix desc num set to default when setting channel
net: hns3: Fix for phy link issue when using marvell phy driver
net: hns3: Fix for information of phydev lost problem when down/up
net: hns3: Fix for command format parsing error in hclge_is_all_function_id_zero
net: hns3: Add support for serdes loopback selftest
bnxt_en: take coredump_record structure off stack
...
- Clean up devm_of_pci_get_host_bridge_resources() resource allocation
(Jan Kiszka)
- Fixup resizable BARs after suspend/resume (Christian König)
- Make "pci=earlydump" generic (Sinan Kaya)
- Fix ROM BAR access routines to stay in bounds and check for signature
correctly (Rex Zhu)
* pci/resource:
PCI: Make pci_get_rom_size() static
PCI: Add check code for last image indicator not set
PCI: Avoid accessing memory outside the ROM BAR
PCI: Make early dump functionality generic
PCI: Cleanup PCI_REBAR_CTRL_BAR_SHIFT handling
PCI: Restore resized BAR state on resume
PCI: Clean up resource allocation in devm_of_pci_get_host_bridge_resources()
# Conflicts:
# Documentation/admin-guide/kernel-parameters.txt
allmodconfig+CONFIG_INTEL_KVM=n results in the following build error.
ERROR: "l1tf_vmx_mitigation" [arch/x86/kvm/kvm.ko] undefined!
Fixes: 5b76a3cff0 ("KVM: VMX: Tell the nested hypervisor to skip L1D flush on vmentry")
Reported-by: Meelis Roos <mroos@linux.ee>
Cc: Meelis Roos <mroos@linux.ee>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQRTLbB6QfY48x44uB6AXGG7T9hjvgUCW3LkCgAKCRCAXGG7T9hj
vtyfAQDTMUqfBlpz9XqFyTBTFRkP3aVtnEeE7BijYec+RXPOxwEAsiXwZPsmW/AN
up+NEHqPvMOcZC8zJZ9THCiBgOxligY=
=F51X
-----END PGP SIGNATURE-----
Merge tag 'for-linus-4.19-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen updates from Juergen Gross:
- add dma-buf functionality to Xen grant table handling
- fix for booting the kernel as Xen PVH dom0
- fix for booting the kernel as a Xen PV guest with
CONFIG_DEBUG_VIRTUAL enabled
- other minor performance and style fixes
* tag 'for-linus-4.19-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen/balloon: fix balloon initialization for PVH Dom0
xen: don't use privcmd_call() from xen_mc_flush()
xen/pv: Call get_cpu_address_sizes to set x86_virt/phys_bits
xen/biomerge: Use true and false for boolean values
xen/gntdev: don't dereference a null gntdev_dmabuf on allocation failure
xen/spinlock: Don't use pvqspinlock if only 1 vCPU
xen/gntdev: Implement dma-buf import functionality
xen/gntdev: Implement dma-buf export functionality
xen/gntdev: Add initial support for dma-buf UAPI
xen/gntdev: Make private routines/structures accessible
xen/gntdev: Allow mappings for DMA buffers
xen/grant-table: Allow allocating buffers suitable for DMA
xen/balloon: Share common memory reservation routines
xen/grant-table: Make set/clear page private code shared
Commit 5209654a46 (x86/ACPI/cstate: Allow ACPI C1 FFH MWAIT use on
AMD systems) forgot to update the ACPI C1 idle state description and
tools like turbostat display "ACPI FFH INTEL MWAIT 0x0" which is
quite confusing on an AMD system.
Drop the "INTEL" part from the ACPI C1 FFH MWAIT C-state description
to avoid confusion.
Fixes: 5209654a46 (x86/ACPI/cstate: Allow ACPI C1 FFH MWAIT use on AMD systems)
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
[ rjw: Subject & changelog ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The function has an inline "return false;" definition with CONFIG_SMP=n
but the "real" definition is also visible leading to "redefinition of
‘apic_id_is_primary_thread’" compiler error.
Guard it with #ifdef CONFIG_SMP
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Fixes: 6a4d2657e0 ("x86/smp: Provide topology_is_primary_thread()")
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- a series from Robin to fix bus imposed dma limits by adding a separate
mask for them to struct device instead of trying to squeeze a second
meaning out of the existing dma mask as we did before. This has ACKs
from the various other subsystems touched
- a small swiotlb cleanup from Kees (acked by Konrad)
- conversion of nios2 and sh to the new generic dma-noncoherent code.
Various other architecture conversions will come through the
architectures maintainers trees.
-----BEGIN PGP SIGNATURE-----
iQI/BAABCgApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAltxLuILHGhjaEBsc3Qu
ZGUACgkQD55TZVIEUYPKMA//WY7UGhTrQ9kSjrfCRYJnSgLPjkqwY1wuYdXg1FSw
rUYtQXCem59W48aX9QqHzGiCXf3uVJDBCO7/ikP8R/hHJTC/ih6tTJAtKMBTaDm5
NEVU4lZCzHfzliF3vhj/3+ifWThI0gSo5QQolf4hD5255a95HMLZgJ9zt0IRVgdj
aKfFfKG824LX+G4iuN27jwmvfOJM4KwzaAFID6Rc4Pt1BXy3CmLkED63p6YvTAk/
pCR9LZpYWPccEuA1YfHoDJrB4onTw7zkVHFBtMZ5xW+ypJwRACSpLp/8IjeG841x
/vA3BsDNu8zrRj2cHpD+fViPriMm/qdMew45keII/0jCeCs640wg9Cy7f6yl2QQh
nsCJA8DEKvY8ebaX12XyWC3GNbLNru1XORrMl9+Lqu0kUHbdNxRDg47fWILLdxGc
QvgassFOxb7wg9GwIsoGpOItagl2hHP3j0vMrHlfWaBNefi5ZtWiZs8aJBgm2rJm
kzjmXGzx9IBy5SWSpou5UDCvf4xF7CkTA11YKPyBt2H3hquFKBH6R+6SzAm0n4rn
WcAlqx4BQX1no/p0hPM4b01yPwgd3OmNmITAaYKWTvDXKpImRkPYWIfvLQBx4/PW
9TyVd1QWO1wMbCziVCaHVwpsNAHFTK/kw41OLjK4H1P+HMorGH0crhimfl5BiaJY
XJk=
=WNwP
-----END PGP SIGNATURE-----
Merge tag 'dma-mapping-4.19' of git://git.infradead.org/users/hch/dma-mapping
Pull dma-mapping updates from Christoph Hellwig:
- a series from Robin to fix bus imposed dma limits by adding a
separate mask for them to struct device instead of trying to squeeze
a second meaning out of the existing dma mask as we did before.
This has ACKs from the various other subsystems touched
- a small swiotlb cleanup from Kees (acked by Konrad)
- conversion of nios2 and sh to the new generic dma-noncoherent code.
Various other architecture conversions will come through the
architectures maintainers trees.
* tag 'dma-mapping-4.19' of git://git.infradead.org/users/hch/dma-mapping:
sh: use generic dma_noncoherent_ops
sh: split arch/sh/mm/consistent.c
sh: use dma_direct_ops for the CONFIG_DMA_COHERENT case
sh: introduce a sh_cacheop_vaddr helper
sh: simplify get_arch_dma_ops
OF: Don't set default coherent DMA mask
ACPI/IORT: Don't set default coherent DMA mask
iommu/dma: Respect bus DMA limit for IOVAs
of/device: Set bus DMA mask as appropriate
ACPI/IORT: Set bus DMA mask as appropriate
dma-mapping: Generalise dma_32bit_limit flag
ACPI/IORT: Support address size limit for root complexes
of/platform: Initialise default DMA masks
nios2: use generic dma_noncoherent_ops
swiotlb: clean up reporting
dma-mapping: relax warning for per-device areas
Merge L1 Terminal Fault fixes from Thomas Gleixner:
"L1TF, aka L1 Terminal Fault, is yet another speculative hardware
engineering trainwreck. It's a hardware vulnerability which allows
unprivileged speculative access to data which is available in the
Level 1 Data Cache when the page table entry controlling the virtual
address, which is used for the access, has the Present bit cleared or
other reserved bits set.
If an instruction accesses a virtual address for which the relevant
page table entry (PTE) has the Present bit cleared or other reserved
bits set, then speculative execution ignores the invalid PTE and loads
the referenced data if it is present in the Level 1 Data Cache, as if
the page referenced by the address bits in the PTE was still present
and accessible.
While this is a purely speculative mechanism and the instruction will
raise a page fault when it is retired eventually, the pure act of
loading the data and making it available to other speculative
instructions opens up the opportunity for side channel attacks to
unprivileged malicious code, similar to the Meltdown attack.
While Meltdown breaks the user space to kernel space protection, L1TF
allows to attack any physical memory address in the system and the
attack works across all protection domains. It allows an attack of SGX
and also works from inside virtual machines because the speculation
bypasses the extended page table (EPT) protection mechanism.
The assoicated CVEs are: CVE-2018-3615, CVE-2018-3620, CVE-2018-3646
The mitigations provided by this pull request include:
- Host side protection by inverting the upper address bits of a non
present page table entry so the entry points to uncacheable memory.
- Hypervisor protection by flushing L1 Data Cache on VMENTER.
- SMT (HyperThreading) control knobs, which allow to 'turn off' SMT
by offlining the sibling CPU threads. The knobs are available on
the kernel command line and at runtime via sysfs
- Control knobs for the hypervisor mitigation, related to L1D flush
and SMT control. The knobs are available on the kernel command line
and at runtime via sysfs
- Extensive documentation about L1TF including various degrees of
mitigations.
Thanks to all people who have contributed to this in various ways -
patches, review, testing, backporting - and the fruitful, sometimes
heated, but at the end constructive discussions.
There is work in progress to provide other forms of mitigations, which
might be less horrible performance wise for a particular kind of
workloads, but this is not yet ready for consumption due to their
complexity and limitations"
* 'l1tf-final' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (75 commits)
x86/microcode: Allow late microcode loading with SMT disabled
tools headers: Synchronise x86 cpufeatures.h for L1TF additions
x86/mm/kmmio: Make the tracer robust against L1TF
x86/mm/pat: Make set_memory_np() L1TF safe
x86/speculation/l1tf: Make pmd/pud_mknotpresent() invert
x86/speculation/l1tf: Invert all not present mappings
cpu/hotplug: Fix SMT supported evaluation
KVM: VMX: Tell the nested hypervisor to skip L1D flush on vmentry
x86/speculation: Use ARCH_CAPABILITIES to skip L1D flush on vmentry
x86/speculation: Simplify sysfs report of VMX L1TF vulnerability
Documentation/l1tf: Remove Yonah processors from not vulnerable list
x86/KVM/VMX: Don't set l1tf_flush_l1d from vmx_handle_external_intr()
x86/irq: Let interrupt handlers set kvm_cpu_l1tf_flush_l1d
x86: Don't include linux/irq.h from asm/hardirq.h
x86/KVM/VMX: Introduce per-host-cpu analogue of l1tf_flush_l1d
x86/irq: Demote irq_cpustat_t::__softirq_pending to u16
x86/KVM/VMX: Move the l1tf_flush_l1d test to vmx_l1d_flush()
x86/KVM/VMX: Replace 'vmx_l1d_flush_always' with 'vmx_l1d_flush_cond'
x86/KVM/VMX: Don't set l1tf_flush_l1d to true from vmx_l1d_flush()
cpu/hotplug: detect SMT disabled by BIOS
...
Pull x86 timer updates from Thomas Gleixner:
"Early TSC based time stamping to allow better boot time analysis.
This comes with a general cleanup of the TSC calibration code which
grew warts and duct taping over the years and removes 250 lines of
code. Initiated and mostly implemented by Pavel with help from various
folks"
* 'x86-timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (37 commits)
x86/kvmclock: Mark kvm_get_preset_lpj() as __init
x86/tsc: Consolidate init code
sched/clock: Disable interrupts when calling generic_sched_clock_init()
timekeeping: Prevent false warning when persistent clock is not available
sched/clock: Close a hole in sched_clock_init()
x86/tsc: Make use of tsc_calibrate_cpu_early()
x86/tsc: Split native_calibrate_cpu() into early and late parts
sched/clock: Use static key for sched_clock_running
sched/clock: Enable sched clock early
sched/clock: Move sched clock initialization and merge with generic clock
x86/tsc: Use TSC as sched clock early
x86/tsc: Initialize cyc2ns when tsc frequency is determined
x86/tsc: Calibrate tsc only once
ARM/time: Remove read_boot_clock64()
s390/time: Remove read_boot_clock64()
timekeeping: Default boot time offset to local_clock()
timekeeping: Replace read_boot_clock64() with read_persistent_wall_and_boot_offset()
s390/time: Add read_persistent_wall_and_boot_offset()
x86/xen/time: Output xen sched_clock time from 0
x86/xen/time: Initialize pv xen time in init_hypervisor_platform()
...
Pull x86 PTI updates from Thomas Gleixner:
"The Speck brigade sadly provides yet another large set of patches
destroying the perfomance which we carefully built and preserved
- PTI support for 32bit PAE. The missing counter part to the 64bit
PTI code implemented by Joerg.
- A set of fixes for the Global Bit mechanics for non PCID CPUs which
were setting the Global Bit too widely and therefore possibly
exposing interesting memory needlessly.
- Protection against userspace-userspace SpectreRSB
- Support for the upcoming Enhanced IBRS mode, which is preferred
over IBRS. Unfortunately we dont know the performance impact of
this, but it's expected to be less horrible than the IBRS
hammering.
- Cleanups and simplifications"
* 'x86/pti' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (60 commits)
x86/mm/pti: Move user W+X check into pti_finalize()
x86/relocs: Add __end_rodata_aligned to S_REL
x86/mm/pti: Clone kernel-image on PTE level for 32 bit
x86/mm/pti: Don't clear permissions in pti_clone_pmd()
x86/mm/pti: Fix 32 bit PCID check
x86/mm/init: Remove freed kernel image areas from alias mapping
x86/mm/init: Add helper for freeing kernel image pages
x86/mm/init: Pass unconverted symbol addresses to free_init_pages()
mm: Allow non-direct-map arguments to free_reserved_area()
x86/mm/pti: Clear Global bit more aggressively
x86/speculation: Support Enhanced IBRS on future CPUs
x86/speculation: Protect against userspace-userspace spectreRSB
x86/kexec: Allocate 8k PGDs for PTI
Revert "perf/core: Make sure the ring-buffer is mapped in all page-tables"
x86/mm: Remove in_nmi() warning from vmalloc_fault()
x86/entry/32: Check for VM86 mode in slow-path check
perf/core: Make sure the ring-buffer is mapped in all page-tables
x86/pti: Check the return value of pti_user_pagetable_walk_pmd()
x86/pti: Check the return value of pti_user_pagetable_walk_p4d()
x86/entry/32: Add debug code to check entry/exit CR3
...
Pull misc x86 fixes from Thomas Gleixner:
"Two fixes for x86:
- Provide a declaration for native_save_fl() which unbreaks the
wreckage caused by making it 'extern inline'.
- Fix the failing paravirt patching which is supposed to replace
indirect with direct calls. The wreckage is caused by an incorrect
clobber test"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/paravirt: Fix spectre-v2 mitigations for paravirt guests
x86/irqflags: Provide a declaration for native_save_fl
Pull x86 platform updates from Thomas Gleixner:
"Trivial cleanups and improvements"
* 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/platform/UV: Remove redundant check of p == q
x86/platform/olpc: Use PTR_ERR_OR_ZERO()
x86/platform/UV: Mark memblock related init code and data correctly
Pull x86 cache QoS (RDT/CAR) updates from Thomas Gleixner:
"Add support for pseudo-locked cache regions.
Cache Allocation Technology (CAT) allows on certain CPUs to isolate a
region of cache and 'lock' it. Cache pseudo-locking builds on the fact
that a CPU can still read and write data pre-allocated outside its
current allocated area on cache hit. With cache pseudo-locking data
can be preloaded into a reserved portion of cache that no application
can fill, and from that point on will only serve cache hits. The cache
pseudo-locked memory is made accessible to user space where an
application can map it into its virtual address space and thus have a
region of memory with reduced average read latency.
The locking is not perfect and gets totally screwed by WBINDV and
similar mechanisms, but it provides a reasonable enhancement for
certain types of latency sensitive applications.
The implementation extends the current CAT mechanism and provides a
generally useful exclusive CAT mode on which it builds the extra
pseude-locked regions"
* 'x86-cache-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (45 commits)
x86/intel_rdt: Disable PMU access
x86/intel_rdt: Fix possible circular lock dependency
x86/intel_rdt: Make CPU information accessible for pseudo-locked regions
x86/intel_rdt: Support restoration of subset of permissions
x86/intel_rdt: Fix cleanup of plr structure on error
x86/intel_rdt: Move pseudo_lock_region_clear()
x86/intel_rdt: Limit C-states dynamically when pseudo-locking active
x86/intel_rdt: Support L3 cache performance event of Broadwell
x86/intel_rdt: More precise L2 hit/miss measurements
x86/intel_rdt: Create character device exposing pseudo-locked region
x86/intel_rdt: Create debugfs files for pseudo-locking testing
x86/intel_rdt: Create resctrl debug area
x86/intel_rdt: Ensure RDT cleanup on exit
x86/intel_rdt: Resctrl files reflect pseudo-locked information
x86/intel_rdt: Support creation/removal of pseudo-locked region
x86/intel_rdt: Pseudo-lock region creation/removal core
x86/intel_rdt: Discover supported platforms via prefetch disable bits
x86/intel_rdt: Add utilities to test pseudo-locked region possibility
x86/intel_rdt: Split resource group removal in two
x86/intel_rdt: Enable entering of pseudo-locksetup mode
...
Pull x86 dump printing cleanup from Thomas Gleixner:
"Clean up the show_opcodes() printout so nested dumps can be properly
differentiated"
* 'x86-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86: Avoid pr_cont() in show_opcodes()
Pull x86 cpu updates from Thomas Gleixner:
"Two small updates for the CPU code:
- Improve NUMA emulation
- Add the EPT_AD CPU feature bit"
* 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/cpufeatures: Add EPT_AD feature bit
x86/numa_emulation: Introduce uniform split capability
x86/numa_emulation: Fix emulated-to-physical node mapping
Pull x86 cleanups from Thomas Gleixner:
"Trival cleanups"
* 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/iommu: Use NULL instead of 0
x86/platform/pcspeaker: Use PTR_ERR_OR_ZERO() to fix ptr_ret.cocci warning
Pull x86 asm updates from Thomas Gleixner:
"The lowlevel and ASM code updates for x86:
- Make stack trace unwinding more reliable
- ASM instruction updates for better code generation
- Various cleanups"
* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/entry/64: Add two more instruction suffixes
x86/asm/64: Use 32-bit XOR to zero registers
x86/build/vdso: Simplify 'cmd_vdso2c'
x86/build/vdso: Remove unused vdso-syms.lds
x86/stacktrace: Enable HAVE_RELIABLE_STACKTRACE for the ORC unwinder
x86/unwind/orc: Detect the end of the stack
x86/stacktrace: Do not fail for ORC with regs on stack
x86/stacktrace: Clarify the reliable success paths
x86/stacktrace: Remove STACKTRACE_DUMP_ONCE
x86/stacktrace: Do not unwind after user regs
x86/asm: Use CC_SET/CC_OUT in percpu_cmpxchg8b_double() to micro-optimize code generation
Pull x86 apic update from Thomas Gleixner:
"Trivial cleanups of the APIC related code"
* 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/apic: Trivial coding style fixes
x86/vector: Merge allocate_vector() into assign_vector_locked()
Pull perf update from Thomas Gleixner:
"The perf crowd presents:
Kernel updates:
- Removal of jprobes
- Cleanup and consolidatation the handling of kprobes
- Cleanup and consolidation of hardware breakpoints
- The usual pile of fixes and updates to PMUs and event descriptors
Tooling updates:
- Updates and improvements all over the place. Nothing outstanding,
just the (good) boring incremental grump work"
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (103 commits)
perf trace: Do not require --no-syscalls to suppress strace like output
perf bpf: Include uapi/linux/bpf.h from the 'perf trace' script's bpf.h
perf tools: Allow overriding MAX_NR_CPUS at compile time
perf bpf: Show better message when failing to load an object
perf list: Unify metric group description format with PMU event description
perf vendor events arm64: Update ThunderX2 implementation defined pmu core events
perf cs-etm: Generate branch sample for CS_ETM_TRACE_ON packet
perf cs-etm: Generate branch sample when receiving a CS_ETM_TRACE_ON packet
perf cs-etm: Support dummy address value for CS_ETM_TRACE_ON packet
perf cs-etm: Fix start tracing packet handling
perf build: Fix installation directory for eBPF
perf c2c report: Fix crash for empty browser
perf tests: Fix indexing when invoking subtests
perf trace: Beautify the AF_INET & AF_INET6 'socket' syscall 'protocol' args
perf trace beauty: Add beautifiers for 'socket''s 'protocol' arg
perf trace beauty: Do not print NULL strarray entries
perf beauty: Add a generator for IPPROTO_ socket's protocol constants
tools include uapi: Grab a copy of linux/in.h
perf tests: Fix complex event name parsing
perf evlist: Fix error out while applying initial delay and LBR
...
Pull scheduler updates from Thomas Gleixner:
- Cleanup and improvement of NUMA balancing
- Refactoring and improvements to the PELT (Per Entity Load Tracking)
code
- Watchdog simplification and related cleanups
- The usual pile of small incremental fixes and improvements
* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (41 commits)
watchdog: Reduce message verbosity
stop_machine: Reflow cpu_stop_queue_two_works()
sched/numa: Move task_numa_placement() closer to numa_migrate_preferred()
sched/numa: Use group_weights to identify if migration degrades locality
sched/numa: Update the scan period without holding the numa_group lock
sched/numa: Remove numa_has_capacity()
sched/numa: Modify migrate_swap() to accept additional parameters
sched/numa: Remove unused task_capacity from 'struct numa_stats'
sched/numa: Skip nodes that are at 'hoplimit'
sched/debug: Reverse the order of printing faults
sched/numa: Use task faults only if numa_group is not yet set up
sched/numa: Set preferred_node based on best_cpu
sched/numa: Simplify load_too_imbalanced()
sched/numa: Evaluate move once per node
sched/numa: Remove redundant field
sched/debug: Show the sum wait time of a task group
sched/fair: Remove #ifdefs from scale_rt_capacity()
sched/core: Remove get_cpu() from sched_fork()
sched/cpufreq: Clarify sugov_get_util()
sched/sysctl: Remove unused sched_time_avg_ms sysctl
...
Pull x86 RAS updates from Thomas Gleixner:
"A small set of changes to the RAS core:
- Rework of the MCE bank scanning code
- Y2038 converion"
* 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mce: Cleanup __mc_scan_banks()
x86/mce: Carve out bank scanning code
x86/mce: Remove !banks check
x86/mce: Carve out the crashing_cpu check
x86/mce: Always use 64-bit timestamps
The kernel unnecessarily prevents late microcode loading when SMT is
disabled. It should be safe to allow it if all the primary threads are
online.
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Nadav reported that on guests we're failing to rewrite the indirect
calls to CALLEE_SAVE paravirt functions. In particular the
pv_queued_spin_unlock() call is left unpatched and that is all over the
place. This obviously wrecks Spectre-v2 mitigation (for paravirt
guests) which relies on not actually having indirect calls around.
The reason is an incorrect clobber test in paravirt_patch_call(); this
function rewrites an indirect call with a direct call to the _SAME_
function, there is no possible way the clobbers can be different
because of this.
Therefore remove this clobber check. Also put WARNs on the other patch
failure case (not enough room for the instruction) which I've not seen
trigger in my (limited) testing.
Three live kernel image disassemblies for lock_sock_nested (as a small
function that illustrates the problem nicely). PRE is the current
situation for guests, POST is with this patch applied and NATIVE is with
or without the patch for !guests.
PRE:
(gdb) disassemble lock_sock_nested
Dump of assembler code for function lock_sock_nested:
0xffffffff817be970 <+0>: push %rbp
0xffffffff817be971 <+1>: mov %rdi,%rbp
0xffffffff817be974 <+4>: push %rbx
0xffffffff817be975 <+5>: lea 0x88(%rbp),%rbx
0xffffffff817be97c <+12>: callq 0xffffffff819f7160 <_cond_resched>
0xffffffff817be981 <+17>: mov %rbx,%rdi
0xffffffff817be984 <+20>: callq 0xffffffff819fbb00 <_raw_spin_lock_bh>
0xffffffff817be989 <+25>: mov 0x8c(%rbp),%eax
0xffffffff817be98f <+31>: test %eax,%eax
0xffffffff817be991 <+33>: jne 0xffffffff817be9ba <lock_sock_nested+74>
0xffffffff817be993 <+35>: movl $0x1,0x8c(%rbp)
0xffffffff817be99d <+45>: mov %rbx,%rdi
0xffffffff817be9a0 <+48>: callq *0xffffffff822299e8
0xffffffff817be9a7 <+55>: pop %rbx
0xffffffff817be9a8 <+56>: pop %rbp
0xffffffff817be9a9 <+57>: mov $0x200,%esi
0xffffffff817be9ae <+62>: mov $0xffffffff817be993,%rdi
0xffffffff817be9b5 <+69>: jmpq 0xffffffff81063ae0 <__local_bh_enable_ip>
0xffffffff817be9ba <+74>: mov %rbp,%rdi
0xffffffff817be9bd <+77>: callq 0xffffffff817be8c0 <__lock_sock>
0xffffffff817be9c2 <+82>: jmp 0xffffffff817be993 <lock_sock_nested+35>
End of assembler dump.
POST:
(gdb) disassemble lock_sock_nested
Dump of assembler code for function lock_sock_nested:
0xffffffff817be970 <+0>: push %rbp
0xffffffff817be971 <+1>: mov %rdi,%rbp
0xffffffff817be974 <+4>: push %rbx
0xffffffff817be975 <+5>: lea 0x88(%rbp),%rbx
0xffffffff817be97c <+12>: callq 0xffffffff819f7160 <_cond_resched>
0xffffffff817be981 <+17>: mov %rbx,%rdi
0xffffffff817be984 <+20>: callq 0xffffffff819fbb00 <_raw_spin_lock_bh>
0xffffffff817be989 <+25>: mov 0x8c(%rbp),%eax
0xffffffff817be98f <+31>: test %eax,%eax
0xffffffff817be991 <+33>: jne 0xffffffff817be9ba <lock_sock_nested+74>
0xffffffff817be993 <+35>: movl $0x1,0x8c(%rbp)
0xffffffff817be99d <+45>: mov %rbx,%rdi
0xffffffff817be9a0 <+48>: callq 0xffffffff810a0c20 <__raw_callee_save___pv_queued_spin_unlock>
0xffffffff817be9a5 <+53>: xchg %ax,%ax
0xffffffff817be9a7 <+55>: pop %rbx
0xffffffff817be9a8 <+56>: pop %rbp
0xffffffff817be9a9 <+57>: mov $0x200,%esi
0xffffffff817be9ae <+62>: mov $0xffffffff817be993,%rdi
0xffffffff817be9b5 <+69>: jmpq 0xffffffff81063aa0 <__local_bh_enable_ip>
0xffffffff817be9ba <+74>: mov %rbp,%rdi
0xffffffff817be9bd <+77>: callq 0xffffffff817be8c0 <__lock_sock>
0xffffffff817be9c2 <+82>: jmp 0xffffffff817be993 <lock_sock_nested+35>
End of assembler dump.
NATIVE:
(gdb) disassemble lock_sock_nested
Dump of assembler code for function lock_sock_nested:
0xffffffff817be970 <+0>: push %rbp
0xffffffff817be971 <+1>: mov %rdi,%rbp
0xffffffff817be974 <+4>: push %rbx
0xffffffff817be975 <+5>: lea 0x88(%rbp),%rbx
0xffffffff817be97c <+12>: callq 0xffffffff819f7160 <_cond_resched>
0xffffffff817be981 <+17>: mov %rbx,%rdi
0xffffffff817be984 <+20>: callq 0xffffffff819fbb00 <_raw_spin_lock_bh>
0xffffffff817be989 <+25>: mov 0x8c(%rbp),%eax
0xffffffff817be98f <+31>: test %eax,%eax
0xffffffff817be991 <+33>: jne 0xffffffff817be9ba <lock_sock_nested+74>
0xffffffff817be993 <+35>: movl $0x1,0x8c(%rbp)
0xffffffff817be99d <+45>: mov %rbx,%rdi
0xffffffff817be9a0 <+48>: movb $0x0,(%rdi)
0xffffffff817be9a3 <+51>: nopl 0x0(%rax)
0xffffffff817be9a7 <+55>: pop %rbx
0xffffffff817be9a8 <+56>: pop %rbp
0xffffffff817be9a9 <+57>: mov $0x200,%esi
0xffffffff817be9ae <+62>: mov $0xffffffff817be993,%rdi
0xffffffff817be9b5 <+69>: jmpq 0xffffffff81063ae0 <__local_bh_enable_ip>
0xffffffff817be9ba <+74>: mov %rbp,%rdi
0xffffffff817be9bd <+77>: callq 0xffffffff817be8c0 <__lock_sock>
0xffffffff817be9c2 <+82>: jmp 0xffffffff817be993 <lock_sock_nested+35>
End of assembler dump.
Fixes: 63f70270cc ("[PATCH] i386: PARAVIRT: add common patching machinery")
Fixes: 3010a0663f ("x86/paravirt, objtool: Annotate indirect calls")
Reported-by: Nadav Amit <namit@vmware.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Juergen Gross <jgross@suse.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: stable@vger.kernel.org
Josh reported that the late SMT evaluation in cpu_smt_state_init() sets
cpu_smt_control to CPU_SMT_NOT_SUPPORTED in case that 'nosmt' was supplied
on the kernel command line as it cannot differentiate between SMT disabled
by BIOS and SMT soft disable via 'nosmt'. That wreckages the state and
makes the sysfs interface unusable.
Rework this so that during bringup of the non boot CPUs the availability of
SMT is determined in cpu_smt_allowed(). If a newly booted CPU is not a
'primary' thread then set the local cpu_smt_available marker and evaluate
this explicitely right after the initial SMP bringup has finished.
SMT evaulation on x86 is a trainwreck as the firmware has all the
information _before_ booting the kernel, but there is no interface to query
it.
Fixes: 73d5e2b472 ("cpu/hotplug: detect SMT disabled by BIOS")
Reported-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Commit d94a155c59 ("x86/cpu: Prevent cpuinfo_x86::x86_phys_bits
adjustment corruption") has moved the query and calculation of the
x86_virt_bits and x86_phys_bits fields of the cpuinfo_x86 struct
from the get_cpu_cap function to a new function named
get_cpu_address_sizes.
One of the call sites related to Xen PV VMs was unfortunately missed
in the aforementioned commit. This prevents successful boot-up of
kernel versions 4.17 and up in Xen PV VMs if CONFIG_DEBUG_VIRTUAL
is enabled, due to the following code path:
enlighten_pv.c::xen_start_kernel
mmu_pv.c::xen_reserve_special_pages
page.h::__pa
physaddr.c::__phys_addr
physaddr.h::phys_addr_valid
phys_addr_valid uses boot_cpu_data.x86_phys_bits to validate physical
addresses. boot_cpu_data.x86_phys_bits is no longer populated before
the call to xen_reserve_special_pages due to the aforementioned commit
though, so the validation performed by phys_addr_valid fails, which
causes __phys_addr to trigger a BUG, preventing boot-up.
Signed-off-by: M. Vefa Bicakci <m.v.b@runbox.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: xen-devel@lists.xenproject.org
Cc: x86@kernel.org
Cc: stable@vger.kernel.org # for v4.17 and up
Fixes: d94a155c59 ("x86/cpu: Prevent cpuinfo_x86::x86_phys_bits adjustment corruption")
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Implement paravirtual apic hooks to enable PV IPIs for KVM if the "send IPI"
hypercall is available. The hypercall lets a guest send IPIs, with
at most 128 destinations per hypercall in 64-bit mode and 64 vCPUs per
hypercall in 32-bit mode.
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Add kvm hypervisor init time platform setup callback which
will be used to replace native apic hooks by pararvirtual
hooks.
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>