Pull x86 PTI and Spectre related fixes and updates from Ingo Molnar:
"Here's the latest set of Spectre and PTI related fixes and updates:
Spectre:
- Add entry code register clearing to reduce the Spectre attack
surface
- Update the Spectre microcode blacklist
- Inline the KVM Spectre helpers to get close to v4.14 performance
again.
- Fix indirect_branch_prediction_barrier()
- Fix/improve Spectre related kernel messages
- Fix array_index_nospec_mask() asm constraint
- KVM: fix two MSR handling bugs
PTI:
- Fix a paranoid entry PTI CR3 handling bug
- Fix comments
objtool:
- Fix paranoid_entry() frame pointer warning
- Annotate WARN()-related UD2 as reachable
- Various fixes
- Add Add Peter Zijlstra as objtool co-maintainer
Misc:
- Various x86 entry code self-test fixes
- Improve/simplify entry code stack frame generation and handling
after recent heavy-handed PTI and Spectre changes. (There's two
more WIP improvements expected here.)
- Type fix for cache entries
There's also some low risk non-fix changes I've included in this
branch to reduce backporting conflicts:
- rename a confusing x86_cpu field name
- de-obfuscate the naming of single-TLB flushing primitives"
* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (41 commits)
x86/entry/64: Fix CR3 restore in paranoid_exit()
x86/cpu: Change type of x86_cache_size variable to unsigned int
x86/spectre: Fix an error message
x86/cpu: Rename cpu_data.x86_mask to cpu_data.x86_stepping
selftests/x86/mpx: Fix incorrect bounds with old _sigfault
x86/mm: Rename flush_tlb_single() and flush_tlb_one() to __flush_tlb_one_[user|kernel]()
x86/speculation: Add <asm/msr-index.h> dependency
nospec: Move array_index_nospec() parameter checking into separate macro
x86/speculation: Fix up array_index_nospec_mask() asm constraint
x86/debug: Use UD2 for WARN()
x86/debug, objtool: Annotate WARN()-related UD2 as reachable
objtool: Fix segfault in ignore_unreachable_insn()
selftests/x86: Disable tests requiring 32-bit support on pure 64-bit systems
selftests/x86: Do not rely on "int $0x80" in single_step_syscall.c
selftests/x86: Do not rely on "int $0x80" in test_mremap_vdso.c
selftests/x86: Fix build bug caused by the 5lvl test which has been moved to the VM directory
selftests/x86/pkeys: Remove unused functions
selftests/x86: Clean up and document sscanf() usage
selftests/x86: Fix vDSO selftest segfault for vsyscall=none
x86/entry/64: Remove the unused 'icebp' macro
...
flush_tlb_single() and flush_tlb_one() sound almost identical, but
they really mean "flush one user translation" and "flush one kernel
translation". Rename them to flush_tlb_one_user() and
flush_tlb_one_kernel() to make the semantics more obvious.
[ I was looking at some PTI-related code, and the flush-one-address code
is unnecessarily hard to understand because the names of the helpers are
uninformative. This came up during PTI review, but no one got around to
doing it. ]
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Eduardo Valentin <eduval@amazon.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kees Cook <keescook@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Linux-MM <linux-mm@kvack.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Link: http://lkml.kernel.org/r/3303b02e3c3d049dc5235d5651e0ae6d29a34354.1517414378.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Replaces hard-coded node ID shift for the descriptor base MMR to fix
initialization on UV4A while maintaining support for previous architectures.
Signed-off-by: Andrew Banman <abanman@hpe.com>
Acked-by: Mike Travis <mike.travis@hpe.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Dimitri Sivanich <sivanich@hpe.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russ Anderson <rja@hpe.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1515440592-44060-1-git-send-email-abanman@hpe.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Since uv_flush_tlb_others() implements flush_tlb_others() which is
about flushing user mappings, we should use __flush_tlb_single(),
which too is about flushing user mappings.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Andrew Banman <abanman@hpe.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Laight <David.Laight@aculab.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Eduardo Valentin <eduval@amazon.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Travis <mike.travis@hpe.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: aliguori@amazon.com
Cc: daniel.gruss@iaik.tugraz.at
Cc: hughd@google.com
Cc: keescook@google.com
Cc: linux-mm@kvack.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Mark a couple of structures and functions as 'static', pointed out by Sparse:
warning: symbol 'bts_pmu' was not declared. Should it be static?
warning: symbol 'p4_event_aliases' was not declared. Should it be static?
warning: symbol 'rapl_attr_groups' was not declared. Should it be static?
symbol 'process_uv2_message' was not declared. Should it be static?
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Andrew Banman <abanman@hpe.com> # for the UV change
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Cc: kernel-janitors@vger.kernel.org
Link: http://lkml.kernel.org/r/20170810155709.7094-1-colin.king@canonical.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The BAU confers no benefit to a UV system running with only one hub/socket.
Permanently disable the BAU driver if there are less than two hubs online
to avoid BAU overhead. We have observed failed boots on single-socket UV4
systems caused by BAU that are avoided with this patch.
Also, while at it, consolidate initialization error blocks and fix a
memory leak.
Signed-off-by: Andrew Banman <abanman@hpe.com>
Acked-by: Russ Anderson <rja@hpe.com>
Acked-by: Mike Travis <mike.travis@hpe.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: tony.ernst@hpe.com
Link: http://lkml.kernel.org/r/1500588351-78016-1-git-send-email-abanman@hpe.com
[ Minor cleanups. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Bug fix for the BAU tunable congested_cycles not being set to the user
defined value. Instead of referencing a global variable when deciding
on BAU shutdown, a node will reference its own tunable set
value ( cong_response_us). This results in the user set
tunable value congested_response_us taking effect correctly.
Signed-off-by: Justin Ernst <justin.ernst@hpe.com>
Acked-by: Andrew Banman <abanman@hpe.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: mike.travis@hpe.com
Cc: sivanich@hpe.com
Link: http://lkml.kernel.org/r/1499970803-282432-1-git-send-email-justin.ernst@hpe.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The functions handle_uv2_busy, uv_flush_send_and_wait and
find_another_by_swack are local to the source, so make them static.
Also remove normal_busy as it is no longer used.
Fixes various smatch warnings, such as:
"symbol 'find_another_by_swack' was not declared. Should it be static?"
"symbol 'handle_uv2_busy' was not declared. Should it be static?"
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Mike Travis <mike.travis@hpe.com>
Cc: Andrew Banman <abanman@hpe.com>
Cc: kernel-janitors@vger.kernel.org
Cc: Dimitri Sivanich <sivanich@sgi.com>
Cc: Dou Liyang <douly.fnst@cn.fujitsu.com>
Link: http://lkml.kernel.org/r/20170704083129.10559-1-colin.king@canonical.com
Pull x86 mm updates from Ingo Molnar:
"The main changes in this cycle were:
- Continued work to add support for 5-level paging provided by future
Intel CPUs. In particular we switch the x86 GUP code to the generic
implementation. (Kirill A. Shutemov)
- Continued work to add PCID CPU support to native kernels as well.
In this round most of the focus is on reworking/refreshing the TLB
flush infrastructure for the upcoming PCID changes. (Andy
Lutomirski)"
* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (34 commits)
x86/mm: Delete a big outdated comment about TLB flushing
x86/mm: Don't reenter flush_tlb_func_common()
x86/KASLR: Fix detection 32/64 bit bootloaders for 5-level paging
x86/ftrace: Exclude functions in head64.c from function-tracing
x86/mmap, ASLR: Do not treat unlimited-stack tasks as legacy mmap
x86/mm: Remove reset_lazy_tlbstate()
x86/ldt: Simplify the LDT switching logic
x86/boot/64: Put __startup_64() into .head.text
x86/mm: Add support for 5-level paging for KASLR
x86/mm: Make kernel_physical_mapping_init() support 5-level paging
x86/mm: Add sync_global_pgds() for configuration with 5-level paging
x86/boot/64: Add support of additional page table level during early boot
x86/boot/64: Rename init_level4_pgt and early_level4_pgt
x86/boot/64: Rewrite startup_64() in C
x86/boot/compressed: Enable 5-level paging during decompression stage
x86/boot/efi: Define __KERNEL32_CS GDT on 64-bit configurations
x86/boot/efi: Fix __KERNEL_CS definition of GDT entry on 64-bit configurations
x86/boot/efi: Cleanup initialization of GDT entries
x86/asm: Fix comment in return_from_SYSCALL_64()
x86/mm/gup: Switch GUP to the generic get_user_page_fast() implementation
...
Rather than passing all the contents of flush_tlb_info to
flush_tlb_others(), pass a pointer to the structure directly. For
consistency, this also removes the unnecessary cpu parameter from
uv_flush_tlb_others() to make its signature match the other
*flush_tlb_others() functions.
This serves two purposes:
- It will dramatically simplify future patches that change struct
flush_tlb_info, which I'm planning to do.
- struct flush_tlb_info is an adequate description of what to do
for a local flush, too, so by reusing it we can remove duplicated
code between local and remove flushes in a future patch.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Nadav Amit <namit@vmware.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
[ Fix build warning. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Replace the custom multi-value scheme with the more regular
seqcount_latch() scheme. Along with scrapping a lot of lines, the latch
scheme is better documented and used in more places.
The immediate benefit however is not being limited on the update side.
The current code has a limit where the writers block which is hit by
future changes.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
UV4 does not employ a software-timeout as in previous generations so a new
wait_completion routine without this logic is required. Certain completion
statuses require the AUX status bit in addition to ERROR and BUSY.
Add the read_status routine to construct the full completion status. Use
read_status in the uv4_wait_completion routine to handle all possible
completion statuses.
Signed-off-by: Andrew Banman <abanman@hpe.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Mike Travis <mike.travis@hpe.com>
Cc: sivanich@hpe.com
Cc: rja@hpe.com
Cc: akpm@linux-foundation.org
Link: http://lkml.kernel.org/r/1489077734-111753-7-git-send-email-abanman@hpe.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Remove the present wait_completion routine and add a function pointer by
the same name to the bau_operations struct. Rather than switching on the
UV hub version during message processing, set the architecture-specific
uv*_wait_completion during initialization.
The uv123_bau_ops struct must be split into uv1 and uv2_3 versions to
accommodate the corresponding wait_completion routines.
Signed-off-by: Andrew Banman <abanman@hpe.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Mike Travis <mike.travis@hpe.com>
Cc: sivanich@hpe.com
Cc: rja@hpe.com
Cc: akpm@linux-foundation.org
Link: http://lkml.kernel.org/r/1489077734-111753-6-git-send-email-abanman@hpe.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The location of the ERROR and BUSY status bits depends on the descriptor
index, i.e. the CPU, of the message. Since this index does not change,
there is no need to calculate the mmr and index location during message
processing. The less work we do in the hot path the better.
Add status_mmr and status_index fields to bau_control and compute their
values during initialization. Add kerneldoc descriptions for the new
fields. Update uv*_wait_completion to use these fields rather than
receiving the information as parameters.
Signed-off-by: Andrew Banman <abanman@hpe.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Mike Travis <mike.travis@hpe.com>
Cc: sivanich@hpe.com
Cc: rja@hpe.com
Cc: akpm@linux-foundation.org
Link: http://lkml.kernel.org/r/1489077734-111753-5-git-send-email-abanman@hpe.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Move the bau_operations declaration after bau struct declarations so the
bau structs can be referenced when adding new functions to
bau_operations. That way we avoid forward declarations of the bau
structs.
Likewise, move uv*_bau_ops structs down to avoid forward declarations of
new functions defined in the same file. Declare these structs __initconst
since they are only used during initialization. Similarly, declare the
bau_operations ops instance __ro_after_init as it is read-only after
initialization.
This is a preparatory patch for adding wait_completion to bau_operations.
Signed-off-by: Andrew Banman <abanman@hpe.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Mike Travis <mike.travis@hpe.com>
Cc: sivanich@hpe.com
Cc: rja@hpe.com
Cc: akpm@linux-foundation.org
Link: http://lkml.kernel.org/r/1489077734-111753-4-git-send-email-abanman@hpe.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
On UV4, the destination agent verifies each message by checking the
descriptor qualifier field of the message payload. Messages without this
field set to 0x534749 will cause a hub error to assert. Split
bau_message_payload into uv1_2_3 and uv4 versions to account for the
different payload formats.
Enforce the size of each field by using the appropriate u** integer type.
Replace extraneous comments with KernelDoc comment.
Signed-off-by: Andrew Banman <abanman@hpe.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Mike Travis <mike.travis@hpe.com>
Cc: sivanich@hpe.com
Cc: rja@hpe.com
Cc: akpm@linux-foundation.org
Link: http://lkml.kernel.org/r/1489077734-111753-3-git-send-email-abanman@hpe.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Writing to the software acknowledge clear register when there are no
pending messages causes a HUB error to assert. The original intent of this
write was to clear the pending bits before start of operation, but this is
an incorrect method and has been determined to be unnecessary.
Signed-off-by: Andrew Banman <abanman@hpe.com>
Acked-by: Mike Travis <mike.travis@hpe.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: akpm@linux-foundation.org
Cc: rja@hpe.com
Cc: sivanich@hpe.com
Link: http://lkml.kernel.org/r/1487351269-181133-1-git-send-email-abanman@hpe.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
One include less is always a good thing(tm). Good riddance.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/20161209182912.2726-6-bp@alien8.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Pull x86 platform changes from Ingo Molnar:
"The main changes in this cycle were:
- SGI UV updates (Andrew Banman)
- Intel MID updates (Andy Shevchenko)
- Initial Mellanox systems platform (Vadim Pasternak)"
* 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/platform/mellanox: Fix return value check in mlxplat_init()
x86/platform/mellanox: Introduce support for Mellanox systems platform
x86/platform/uv/BAU: Add UV4-specific functions
x86/platform/uv/BAU: Fix payload queue setup on UV4 hardware
x86/platform/uv/BAU: Disable software timeout on UV4 hardware
x86/platform/uv/BAU: Populate ->uvhub_version with UV4 version information
x86/platform/uv/BAU: Use generic function pointers
x86/platform/uv/BAU: Add generic function pointers
x86/platform/uv/BAU: Convert uv_physnodeaddr() use to uv_gpa_to_offset()
x86/platform/uv/BAU: Clean up pq_init()
x86/platform/uv/BAU: Clean up and update printks
x86/platform/uv/BAU: Clean up vertical alignment
x86/platform/intel-mid: Keep SRAM powered on at boot
x86/platform/intel-mid: Add Intel Penwell to ID table
x86/cpu: Rename Merrifield2 to Moorefield
x86/platform/intel-mid: Implement power off sequence
x86/platform/intel-mid: Enable SD card detection on Merrifield
x86/platform/intel-mid: Enable WiFi on Intel Edison
x86/platform/intel-mid: Run PWRMU command immediately
Add the UV4-specific function definitions and define an operations struct
to implement them in the BAU driver.
Many BAU MMRs, although functionally the same, have new addresses on UV4
due to hardware changes. Each MMR requires new read/write functions, but
their implementation in the driver does not change. Thus, it is enough to
enumerate them in the operations struct for the changes to take effect.
Signed-off-by: Andrew Banman <abanman@sgi.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Mike Travis <travis@sgi.com>
Acked-by: Dimitri Sivanich <sivanich@sgi.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: akpm@linux-foundation.org
Cc: rja@sgi.com
Link: http://lkml.kernel.org/r/1474474161-265604-11-git-send-email-abanman@sgi.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The BAU on UV4 does not need to maintain the payload queue tail pointer. Do
not initialize the tail pointer MMR on UV4.
Note that write_payload_tail is not an abstracted BAU function since it is
an operation specific to pre-UV4 versions. Then we must switch on the UV
version to control its usage, for which we use uvhub_version rather than
is_uv*_hub because it is quicker/more concise.
Signed-off-by: Andrew Banman <abanman@sgi.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Mike Travis <travis@sgi.com>
Acked-by: Dimitri Sivanich <sivanich@sgi.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: akpm@linux-foundation.org
Cc: rja@sgi.com
Link: http://lkml.kernel.org/r/1474474161-265604-10-git-send-email-abanman@sgi.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Software timeouts are not currently supported on BAU for UV4. Instead, the
BAU will rely on hardware-level fairness protocols to determine broadcast
timeouts.
Do not call enable_timeouts or calculate_destination_timeout on UV4. These
functions write to pre-UV4 MMRs so they generate error messages on UV4.
Signed-off-by: Andrew Banman <abanman@sgi.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Mike Travis <travis@sgi.com>
Acked-by: Dimitri Sivanich <sivanich@sgi.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: akpm@linux-foundation.org
Cc: rja@sgi.com
Link: http://lkml.kernel.org/r/1474474161-265604-9-git-send-email-abanman@sgi.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Convert the use of UV version-specific functions to their abstracted
counterparts.
Signed-off-by: Andrew Banman <abanman@sgi.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Mike Travis <travis@sgi.com>
Acked-by: Dimitri Sivanich <sivanich@sgi.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: akpm@linux-foundation.org
Cc: rja@sgi.com
Link: http://lkml.kernel.org/r/1474474161-265604-7-git-send-email-abanman@sgi.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Many BAU functions have different implementations depending on the UV
version. Rather than switching on the uvhub_version throughout the driver,
we can define a set of operations for each version. This is especially
beneficial for UV4, which will require many new MMR read/write functions.
Currently, the set of abstracted functions are the same for UV1, UV2, and
UV3. The functions were chosen because each one will have a different
implementation for UV4. Other functions will be added as needed to handle
new implementations or to cleanup the existing differences between UV1,
UV2, and UV3, i.e. read_status and wait_completion.
Signed-off-by: Andrew Banman <abanman@sgi.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Mike Travis <travis@sgi.com>
Acked-by: Dimitri Sivanich <sivanich@sgi.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: akpm@linux-foundation.org
Cc: rja@sgi.com
Link: http://lkml.kernel.org/r/1474474161-265604-6-git-send-email-abanman@sgi.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The BAU driver should use the functions provided by uv_hub.h rather than
its own implementations. uv_physnodeaddr converts vaddrs to paddrs for
BAU MMR fields, but this is done better by uv_gpa_to_offset.
Signed-off-by: Andrew Banman <abanman@sgi.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Mike Travis <travis@sgi.com>
Acked-by: Dimitri Sivanich <sivanich@sgi.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: akpm@linux-foundation.org
Cc: rja@sgi.com
Link: http://lkml.kernel.org/r/1474474161-265604-5-git-send-email-abanman@sgi.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The payload queue first MMR requires the physical memory address and hub
GNODE of where the payload queue resides in memory, but the associated
variables are named as if the PNODE were used. Rename gnode-related
variables and clarify the definitions of the payload queue head, last, and
tail pointers.
Signed-off-by: Andrew Banman <abanman@sgi.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Mike Travis <travis@sgi.com>
Acked-by: Dimitri Sivanich <sivanich@sgi.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: akpm@linux-foundation.org
Cc: rja@sgi.com
Link: http://lkml.kernel.org/r/1474474161-265604-4-git-send-email-abanman@sgi.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Move references to blade local processor ID to the new per cpu info
structs. Create an access function that makes this move, and other
potential moves opaque to callers of this function. Define a flag
that indicates to callers in external GPL modules that this function
replaces any local definition. This allows calling source code to be
built for both pre-UV4 kernels as well as post-UV4 kernels.
Tested-by: John Estabrook <estabrook@sgi.com>
Tested-by: Gary Kroening <gfk@sgi.com>
Tested-by: Nathan Zimmer <nzimmer@sgi.com>
Signed-off-by: Mike Travis <travis@sgi.com>
Reviewed-by: Dimitri Sivanich <sivanich@sgi.com>
Cc: Andrew Banman <abanman@sgi.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russ Anderson <rja@sgi.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20160429215404.644173122@asylum.americas.sgi.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
For several years, the common practice has been to boot UVs with the
"nobau" parameter on the command line, to disable the BAU. We've
decided that it makes more sense to just disable the BAU by default in
the kernel, and provide the option to turn it on, if desired.
For now, having the on/off switch doesn't buy us any more than just
reversing the logic would, but we're working towards having the BAU
enabled by default on UV4. When those changes are in place, having the
on/off switch will make more sense than an enable flag, since the
default behavior will be different depending on the system version.
I've also added a bit of documentation for the new parameter to
Documentation/kernel-parameters.txt.
Signed-off-by: Alex Thorlton <athorlton@sgi.com>
Reviewed-by: Hedi Berriche <hedi@sgi.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1459451909-121845-1-git-send-email-athorlton@sgi.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull x86 platform changes from Ingo Molnar:
"A handful of numachip APIC driver updates/fixes, and two small SGI/UV
fixes"
* 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86: numachip: APIC driver cleanups
x86: numachip: Elide self-IPI ICR polling
x86: numachip: Fix 16-bit APIC ID truncation
* 'x86-uv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86: UV BAU: Increase maximum CPUs per socket/hub
x86: UV BAU: Avoid NULL pointer reference in ptc_seq_show
In init_per_cpu(), when get_cpu_topology() fails, init_per_cpu_tunables()
is not called afterwards. This means that bau_control->statp is NULL.
If a user then reads /proc/sgi_uv/ptc_statistics ptc_seq_show() references
a NULL pointer. Therefore, since uv_bau_init calls set_bau_off when
init_per_cpu() fails, we add code that detects when the bau is off in
ptc_seq_show() to avoid referencing a NULL pointer.
Signed-off-by: James Custer <jcuster@sgi.com>
Cc: Russ Anderson <rja@sgi.com>
Link: http://lkml.kernel.org/r/1414952199-185319-2-git-send-email-jcuster@sgi.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Update of TLB shootdown code for UV3.
Kernel function native_flush_tlb_others() calls
uv_flush_tlb_others() on UV to invalidate tlb page definitions
on remote cpus. The UV systems have a hardware 'broadcast assist
unit' which can be used to broadcast shootdown messages to all
cpu's of selected nodes.
The behavior of the BAU has changed only slightly with UV3:
- UV3 is recognized with is_uv3_hub().
- UV2 functions and structures (uv2_xxx) are in most cases
simply renamed to uv2_3_xxx.
- Some UV2 error workarounds are not needed for UV3.
(see uv_bau_message_interrupt and enable_timeouts)
Signed-off-by: Cliff Wickman <cpw@sgi.com>
Link: http://lkml.kernel.org/r/E1WkgWh-0001yJ-3K@eag09.americas.sgi.com
[ Removed a few linebreak uglies. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Use a ring-buffer like multi-version object structure which allows
always having a coherent object; we use this to avoid having to
disable IRQs while reading sched_clock() and avoids a problem when
getting an NMI while changing the cyc2ns data.
MAINLINE PRE POST
sched_clock_stable: 1 1 1
(cold) sched_clock: 329841 331312 257223
(cold) local_clock: 301773 310296 309889
(warm) sched_clock: 38375 38247 25280
(warm) local_clock: 100371 102713 85268
(warm) rdtsc: 27340 27289 24247
sched_clock_stable: 0 0 0
(cold) sched_clock: 382634 372706 301224
(cold) local_clock: 396890 399275 399870
(warm) sched_clock: 38194 38124 25630
(warm) local_clock: 143452 148698 129629
(warm) rdtsc: 27345 27365 24307
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/n/tip-s567in1e5ekq2nlyhn8f987r@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The SGI UV tlb shootdown code panics the system with a NULL
pointer deference if 'nobau' is specified on the boot
commandline.
uv_flush_tlb_other() gets called for every flush, whether the
BAU is disabled or not. It should not be keeping the s_enters
statistic while the BAU is disabled.
The panic occurs because during initialization
init_per_cpu_tunables() does not set the bcp->statp pointer if
'nobau' was specified.
Signed-off-by: Cliff Wickman <cpw@sgi.com>
Cc: <stable@vger.kernel.org> # 3.12.x
Link: http://lkml.kernel.org/r/E1VnzBi-0005yF-MU@eag09.americas.sgi.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull x86 cleanup patches from Ingo Molnar:
"Misc smaller cleanups"
* 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86: ptrace.c only needs export.h and not the full module.h
x86, apb_timer: remove unused variable percpu_timer
um: don't compare a pointer to 0
arch/x86/platform/uv: use ARRAY_SIZE where possible
The flush tlb optimization code has logical issue on UV
platform. It doesn't flush the full range at all, since it
simply ignores its 'end' parameter (and hence also the "all"
indicator) in uv_flush_tlb_others() function.
Cliff's notes:
| I tested the patch on a UV. It has the effect of either
| clearing 1 or all TLBs in a cpu. I added some debugging to
| test for the cases when clearing all TLBs is overkill, and in
| practice it happens very seldom.
Reported-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Alex Shi <alex.shi@intel.com>
Signed-off-by: Cliff Wickman <cpw@sgi.com>
Tested-by: Cliff Wickman <cpw@sgi.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull x86/mm changes from Peter Anvin:
"The big change here is the patchset by Alex Shi to use INVLPG to flush
only the affected pages when we only need to flush a small page range.
It also removes the special INVALIDATE_TLB_VECTOR interrupts (32
vectors!) and replace it with an ordinary IPI function call."
Fix up trivial conflicts in arch/x86/include/asm/apic.h (added code next
to changed line)
* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/tlb: Fix build warning and crash when building for !SMP
x86/tlb: do flush_tlb_kernel_range by 'invlpg'
x86/tlb: replace INVALIDATE_TLB_VECTOR by CALL_FUNCTION_VECTOR
x86/tlb: enable tlb flush range support for x86
mm/mmu_gather: enable tlb flush range in generic mmu_gather
x86/tlb: add tlb_flushall_shift knob into debugfs
x86/tlb: add tlb_flushall_shift for specific CPU
x86/tlb: fall back to flush all when meet a THP large page
x86/flush_tlb: try flush_tlb_single one by one in flush_tlb_range
x86/tlb_info: get last level TLB entry number of CPU
x86: Add read_mostly declaration/definition to variables from smp.h
x86: Define early read-mostly per-cpu macros
x86 has no flush_tlb_range support in instruction level. Currently the
flush_tlb_range just implemented by flushing all page table. That is not
the best solution for all scenarios. In fact, if we just use 'invlpg' to
flush few lines from TLB, we can get the performance gain from later
remain TLB lines accessing.
But the 'invlpg' instruction costs much of time. Its execution time can
compete with cr3 rewriting, and even a bit more on SNB CPU.
So, on a 512 4KB TLB entries CPU, the balance points is at:
(512 - X) * 100ns(assumed TLB refill cost) =
X(TLB flush entries) * 100ns(assumed invlpg cost)
Here, X is 256, that is 1/2 of 512 entries.
But with the mysterious CPU pre-fetcher and page miss handler Unit, the
assumed TLB refill cost is far lower then 100ns in sequential access. And
2 HT siblings in one core makes the memory access more faster if they are
accessing the same memory. So, in the patch, I just do the change when
the target entries is less than 1/16 of whole active tlb entries.
Actually, I have no data support for the percentage '1/16', so any
suggestions are welcomed.
As to hugetlb, guess due to smaller page table, and smaller active TLB
entries, I didn't see benefit via my benchmark, so no optimizing now.
My micro benchmark show in ideal scenarios, the performance improves 70
percent in reading. And in worst scenario, the reading/writing
performance is similar with unpatched 3.4-rc4 kernel.
Here is the reading data on my 2P * 4cores *HT NHM EP machine, with THP
'always':
multi thread testing, '-t' paramter is thread number:
with patch unpatched 3.4-rc4
./mprotect -t 1 14ns 24ns
./mprotect -t 2 13ns 22ns
./mprotect -t 4 12ns 19ns
./mprotect -t 8 14ns 16ns
./mprotect -t 16 28ns 26ns
./mprotect -t 32 54ns 51ns
./mprotect -t 128 200ns 199ns
Single process with sequencial flushing and memory accessing:
with patch unpatched 3.4-rc4
./mprotect 7ns 11ns
./mprotect -p 4096 -l 8 -n 10240
21ns 21ns
[ hpa: http://lkml.kernel.org/r/1B4B44D9196EFF41AE41FDA404FC0A100BFF94@SHSMSX101.ccr.corp.intel.com
has additional performance numbers. ]
Signed-off-by: Alex Shi <alex.shi@intel.com>
Link: http://lkml.kernel.org/r/1340845344-27557-3-git-send-email-alex.shi@intel.com
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
On SGI's UV2 the BAU (Broadcast Assist Unit) driver can hang
under a heavy load. To cure this:
- Disable the UV2 extended status mode (see UV2_EXT_SHFT), as
this mode changes BAU behavior in more ways then just delivering
an extra bit of status. Revert status to just two meaningful bits,
like UV1.
- Use no IPI-style resets on UV2. Just give up the request for
whatever the reason it failed and let it be accomplished with
the legacy IPI method.
- Use no alternate sending descriptor (the former UV2 workaround
bcp->using_desc and handle_uv2_busy() stuff). Just disable the
use of the BAU for a period of time in favor of the legacy IPI
method when the h/w bug leaves a descriptor busy.
-- new tunable: giveup_limit determines the threshold at which a hub is
so plugged that it should do all requests with the legacy IPI method for a
period of time
-- generalize disable_for_congestion() (renamed disable_for_period()) for
use whenever a hub should avoid using the BAU for a period of time
Also:
- Fix find_another_by_swack(), which is part of the UV2 bug workaround
- Correct and clarify the statistics (new stats s_overipilimit, s_giveuplimit,
s_enters, s_ipifordisabled, s_plugged, s_congested)
Signed-off-by: Cliff Wickman <cpw@sgi.com>
Link: http://lkml.kernel.org/r/20120622131459.GC31884@sgi.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This patch enables the BAU to be turned on or off dynamically.
echo "on" > /proc/sgi_uv/ptc_statistics
echo "off" > /proc/sgi_uv/ptc_statistics
The system may be booted with or without the nobau option.
Whether the system currently has the BAU off can be seen in
the /proc file -- normally with the baustats script.
Each cpu will have a 1 in the bauoff field if the BAU was turned
off, so baustats will give a count of cpus that have it off.
Signed-off-by: Cliff Wickman <cpw@sgi.com>
Link: http://lkml.kernel.org/r/20120622131330.GB31884@sgi.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Correct the calculation of a destination timeout period, which
is used to distinguish between a destination timeout and the
situation where all the target software ack resources are full
and a request is returned immediately.
The problem is that integer arithmetic was overflowing, yielding
a very large result.
Without this fix destination timeouts are identified as resource
'plugged' events and an ipi method of resource releasing is
unnecessarily employed.
Signed-off-by: Cliff Wickman <cpw@sgi.com>
Link: http://lkml.kernel.org/r/20120622131212.GA31884@sgi.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The SGI Altix UV2 BAU (Broadcast Assist Unit) as used for
tlb-shootdown (selective broadcast mode) always uses UV2
broadcast descriptor format. There is no need to clear the
'legacy' (UV1) mode, because the hardware always uses UV2 mode
for selective broadcast.
But the BIOS uses general broadcast and legacy mode, and the
hardware pays attention to the legacy mode bit for general
broadcast. So the kernel must not clear that mode bit.
Signed-off-by: Cliff Wickman <cpw@sgi.com>
Cc: <stable@kernel.org>
Link: http://lkml.kernel.org/r/E1SccoO-0002Lh-Cb@eag09.americas.sgi.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>