Some platforms, e.g. Knights Landing, use a common PCI device ID for
multiple instances of an uncore PMU device type. So it is impossible to
locate the specific instances only by PCI device ID.
The current code specially handles Knights Landing by arbitrarily pointing
an instance to an unused uncore box. However, we still have no idea
which uncore device is mapped to which box.
Furthermore, there could be more platforms which use a common PCI device ID
for uncore devices. We have to specially handle them one by one.
This patch records full device information (slot, func, and device ID)
in id_table[]. So the probe function can point the instance to a specific
uncore box by checking the full device information.
Tested-by: Lukasz Odzioba <lukasz.odzioba@intel.com>
Signed-off-by: Kan Liang <kan.liang@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: tglx@linutronix.de
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: bp@suse.de
Cc: harish.chegondi@intel.com
Cc: hubert.chrzaniuk@intel.com
Cc: lawrence.f.meadows@intel.com
Link: http://lkml.kernel.org/r/1463379504-39003-1-git-send-email-kan.liang@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
On rapl cleanup path, kfree() is given by mistake the address of the
pointer of the structure to free (rapl_pmus->pmus + i). Pass the pointer
instead (rapl_pmus->pmus[i]).
Fixes: 9de8d68695 "perf/x86/intel/rapl: Convert it to a per package facility"
Signed-off-by: Vincent Stehlé <vincent.stehle@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/1464101629-14905-1-git-send-email-vincent.stehle@intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Remove an extraneous space to fix up indentation. Trivial and no
functional change
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1463503215-18339-1-git-send-email-colin.king@canonical.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
User visible:
- Honour the kernel.perf_event_max_stack knob more precisely by not counting
PERF_CONTEXT_{KERNEL,USER} when deciding when to stop adding entries to
the perf_sample->ip_callchain[] array (Arnaldo Carvalho de Melo)
- Fix identation of 'stalled-backend-cycles' in 'perf stat' (Namhyung Kim)
- Update runtime using 'cpu-clock' event in 'perf stat' (Namhyung Kim)
- Use 'cpu-clock' for cpu targets in 'perf stat' (Namhyung Kim)
- Avoid fractional digits for integer scales in 'perf stat' (Andi Kleen)
- Store vdso buildid unconditionally, as it appears in callchains and
we're not checking those when creating the build-id table, so we
end up not being able to resolve VDSO symbols when doing analysis
on a different machine than the one where recording was done, possibly
of a different arch even (arm -> x86_64) (He Kuang)
Infrastructure:
- Generalize max_stack sysctl handler, will be used for configuring
multiple kernel knobs related to callchains (Arnaldo Carvalho de Melo)
Cleanups:
- Introduce DSO__NAME_KALLSYMS and DSO__NAME_KCORE, to stop using
open coded strings (Masami Hiramatsu)
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABCAAGBQJXOn7eAAoJENZQFvNTUqpAsOAP/3f/XJekPQAnMcKRBp2noCuj
nRu1kBltVJyP8iOU5PKSJwel4F9ykNNMl+/rzzxHDo13IM8uc+HnZOJZ6e9mJIJ1
xqjdqM4EDlYYoFApJzCjTK6CMlevCazosdQT1bbmMDYVPc2uQR/GnutFrzqf/Plg
hEougIGtfrdy85g95CRdxpy2yMwDK4EwsiDRm9ib1hnuamQZl97buWemBVqSJmLY
p82E2aMU5Fv5+B8AO4I7V88ZmgpmryjxpM+LjffgNUDSKsSHrlG4NiQ3znV1bgst
Rc++w78+qxoIozOu6/IX8eSI2L/1eyM/yQ6Qre0KuvYXCl+NopTAYSSJlaA4tyHF
c55z7HucuyATN3PrFRHlbWUT/RMIVC0j0lnZOc7SJLl90hJQ+nv0iZcbYwMbeHu1
3LGlcd9jDwQYiClbaT9ATxZJ8B9An0/k/HJdatbAHN0wRomP2Ozz/qD2nmEbUwpV
sCyLOo/LJkvVkuUjSg6ZiOArNIk4iTSPSAUV+SAL6YOEOZMAX5ISUJQ174+zFC9a
gqtVsCXvwLIsndXb8ys1r9/fit/MUci0OzKX3SG1K765+E4Bk23KcAgMNbM/a7lp
ZmHDXMC+yBYcnYNnaxkp7c55CWUlKGOeR4e+KmB99KoeIleYgPhD2UM5beo61TmN
yUEPtiiFiZmTRkiAu83R
=7OdF
-----END PGP SIGNATURE-----
Merge tag 'perf-core-for-mingo-20160516' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core
Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:
User visible changes:
- Honour the kernel.perf_event_max_stack knob more precisely by not counting
PERF_CONTEXT_{KERNEL,USER} when deciding when to stop adding entries to
the perf_sample->ip_callchain[] array (Arnaldo Carvalho de Melo)
- Fix identation of 'stalled-backend-cycles' in 'perf stat' (Namhyung Kim)
- Update runtime using 'cpu-clock' event in 'perf stat' (Namhyung Kim)
- Use 'cpu-clock' for cpu targets in 'perf stat' (Namhyung Kim)
- Avoid fractional digits for integer scales in 'perf stat' (Andi Kleen)
- Store vdso buildid unconditionally, as it appears in callchains and
we're not checking those when creating the build-id table, so we
end up not being able to resolve VDSO symbols when doing analysis
on a different machine than the one where recording was done, possibly
of a different arch even (arm -> x86_64) (He Kuang)
Infrastructure changes:
- Generalize max_stack sysctl handler, will be used for configuring
multiple kernel knobs related to callchains (Arnaldo Carvalho de Melo)
Cleanups:
- Introduce DSO__NAME_KALLSYMS and DSO__NAME_KCORE, to stop using
open coded strings (Masami Hiramatsu)
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
When booting with nr_cpus=1, uncore_pci_probe tries to init the PCI/uncore
also for the other packages and fails with warning when they are not found.
The warning is bogus because it's correct to fail here for packages which are
not initialized. Remove it and return silently.
Fixes: cf6d445f68 "perf/x86/uncore: Track packages, not per CPU data"
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: stable@vger.kernel.org
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
We will use it to count how many addresses are in the entry->ip[] array,
excluding PERF_CONTEXT_{KERNEL,USER,etc} entries, so that we can really
return the number of entries specified by the user via the relevant
sysctl, kernel.perf_event_max_contexts, or via the per event
perf_event_attr.sample_max_stack knob.
This way we keep the perf_sample->ip_callchain->nr meaning, that is the
number of entries, be it real addresses or PERF_CONTEXT_ entries, while
honouring the max_stack knobs, i.e. the end result will be max_stack
entries if we have at least that many entries in a given stack trace.
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/n/tip-s8teto51tdqvlfhefndtat9r@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This makes perf_callchain_{user,kernel}() receive the max stack
as context for the perf_callchain_entry, instead of accessing
the global sysctl_perf_event_max_stack.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Milian Wolff <milian.wolff@kdab.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: Wang Nan <wangnan0@huawei.com>
Cc: Zefan Li <lizefan@huawei.com>
Link: http://lkml.kernel.org/n/tip-kolmn1yo40p7jhswxwrc7rrd@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Pull x86 platform updates from Ingo Molnar:
"The main change is the addition of SGI/UV4 support"
* 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (22 commits)
x86/platform/UV: Fix incorrect nodes and pnodes for cpuless and memoryless nodes
x86/platform/UV: Remove Obsolete GRU MMR address translation
x86/platform/UV: Update physical address conversions for UV4
x86/platform/UV: Build GAM reference tables
x86/platform/UV: Support UV4 socket address changes
x86/platform/UV: Add obtaining GAM Range Table from UV BIOS
x86/platform/UV: Add UV4 addressing discovery function
x86/platform/UV: Fold blade info into per node hub info structs
x86/platform/UV: Allocate common per node hub info structs on local node
x86/platform/UV: Move blade local processor ID to the per cpu info struct
x86/platform/UV: Move scir info to the per cpu info struct
x86/platform/UV: Create per cpu info structs to replace per hub info structs
x86/platform/UV: Update MMIOH setup function to work for both UV3 and UV4
x86/platform/UV: Clean up redunduncies after merge of UV4 MMR definitions
x86/platform/UV: Add UV4 Specific MMR definitions
x86/platform/UV: Prep for UV4 MMR updates
x86/platform/UV: Add UV MMR Illegal Access Function
x86/platform/UV: Add UV4 Specific Defines
x86/platform/UV: Add UV Architecture Defines
x86/platform/UV: Add Initial UV4 definitions
...
Pull x86 debug cleanup from Ingo Molnar:
"A printk() output simplification"
* 'x86-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/dumpstack: Combine some printk()s
Pull x86 boot updates from Ingo Molnar:
"The biggest changes in this cycle were:
- prepare for more KASLR related changes, by restructuring, cleaning
up and fixing the existing boot code. (Kees Cook, Baoquan He,
Yinghai Lu)
- simplifly/concentrate subarch handling code, eliminate
paravirt_enabled() usage. (Luis R Rodriguez)"
* 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (50 commits)
x86/KASLR: Clarify purpose of each get_random_long()
x86/KASLR: Add virtual address choosing function
x86/KASLR: Return earliest overlap when avoiding regions
x86/KASLR: Add 'struct slot_area' to manage random_addr slots
x86/boot: Add missing file header comments
x86/KASLR: Initialize mapping_info every time
x86/boot: Comment what finalize_identity_maps() does
x86/KASLR: Build identity mappings on demand
x86/boot: Split out kernel_ident_mapping_init()
x86/boot: Clean up indenting for asm/boot.h
x86/KASLR: Improve comments around the mem_avoid[] logic
x86/boot: Simplify pointer casting in choose_random_location()
x86/KASLR: Consolidate mem_avoid[] entries
x86/boot: Clean up pointer casting
x86/boot: Warn on future overlapping memcpy() use
x86/boot: Extract error reporting functions
x86/boot: Correctly bounds-check relocations
x86/KASLR: Clean up unused code from old 'run_size' and rename it to 'kernel_total_size'
x86/boot: Fix "run_size" calculation
x86/boot: Calculate decompression size during boot not build
...
Pull x86 asm updates from Ingo Molnar:
"The main changes in this cycle were:
- MSR access API fixes and enhancements (Andy Lutomirski)
- early exception handling improvements (Andy Lutomirski)
- user-space FS/GS prctl usage fixes and improvements (Andy
Lutomirski)
- Remove the cpu_has_*() APIs and replace them with equivalents
(Borislav Petkov)
- task switch micro-optimization (Brian Gerst)
- 32-bit entry code simplification (Denys Vlasenko)
- enhance PAT handling in enumated CPUs (Toshi Kani)
... and lots of other cleanups/fixlets"
* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (70 commits)
x86/arch_prctl/64: Restore accidentally removed put_cpu() in ARCH_SET_GS
x86/entry/32: Remove asmlinkage_protect()
x86/entry/32: Remove GET_THREAD_INFO() from entry code
x86/entry, sched/x86: Don't save/restore EFLAGS on task switch
x86/asm/entry/32: Simplify pushes of zeroed pt_regs->REGs
selftests/x86/ldt_gdt: Test set_thread_area() deletion of an active segment
x86/tls: Synchronize segment registers in set_thread_area()
x86/asm/64: Rename thread_struct's fs and gs to fsbase and gsbase
x86/arch_prctl/64: Remove FSBASE/GSBASE < 4G optimization
x86/segments/64: When load_gs_index fails, clear the base
x86/segments/64: When loadsegment(fs, ...) fails, clear the base
x86/asm: Make asm/alternative.h safe from assembly
x86/asm: Stop depending on ptrace.h in alternative.h
x86/entry: Rename is_{ia32,x32}_task() to in_{ia32,x32}_syscall()
x86/asm: Make sure verify_cpu() has a good stack
x86/extable: Add a comment about early exception handlers
x86/msr: Set the return value to zero when native_rdmsr_safe() fails
x86/paravirt: Make "unsafe" MSR accesses unsafe even if PARAVIRT=y
x86/paravirt: Add paravirt_{read,write}_msr()
x86/msr: Carry on after a non-"safe" MSR access fails
...
Pull scheduler updates from Ingo Molnar:
- massive CPU hotplug rework (Thomas Gleixner)
- improve migration fairness (Peter Zijlstra)
- CPU load calculation updates/cleanups (Yuyang Du)
- cpufreq updates (Steve Muckle)
- nohz optimizations (Frederic Weisbecker)
- switch_mm() micro-optimization on x86 (Andy Lutomirski)
- ... lots of other enhancements, fixes and cleanups.
* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (66 commits)
ARM: Hide finish_arch_post_lock_switch() from modules
sched/core: Provide a tsk_nr_cpus_allowed() helper
sched/core: Use tsk_cpus_allowed() instead of accessing ->cpus_allowed
sched/loadavg: Fix loadavg artifacts on fully idle and on fully loaded systems
sched/fair: Correct unit of load_above_capacity
sched/fair: Clean up scale confusion
sched/nohz: Fix affine unpinned timers mess
sched/fair: Fix fairness issue on migration
sched/core: Kill sched_class::task_waking to clean up the migration logic
sched/fair: Prepare to fix fairness problems on migration
sched/fair: Move record_wakee()
sched/core: Fix comment typo in wake_q_add()
sched/core: Remove unused variable
sched: Make hrtick_notifier an explicit call
sched/fair: Make ilb_notifier an explicit call
sched/hotplug: Make activate() the last hotplug step
sched/hotplug: Move migration CPU_DYING to sched_cpu_dying()
sched/migration: Move CPU_ONLINE into scheduler state
sched/migration: Move calc_load_migrate() into CPU_DYING
sched/migration: Move prepare transition to SCHED_STARTING state
...
Pull RAS updates from Ingo Molnar:
"Main changes in this cycle were:
- AMD MCE/RAS handling updates (Yazen Ghannam, Aravind
Gopalakrishnan)
- Cleanups (Borislav Petkov)
- logging fix (Tony Luck)"
* 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/RAS: Add SMCA support to AMD Error Injector
EDAC, mce_amd: Detect SMCA using X86_FEATURE_SMCA
x86/mce: Update AMD mcheck init to use cpu_has() facilities
x86/cpu: Add detection of AMD RAS Capabilities
x86/mce/AMD: Save an indentation level in prepare_threshold_block()
x86/mce/AMD: Disable LogDeferredInMcaStat for SMCA systems
x86/mce/AMD: Log Deferred Errors using SMCA MCA_DE{STAT,ADDR} registers
x86/mce: Detect local MCEs properly
x86/mce: Look in genpool instead of mcelog for pending error records
x86/mce: Detect and use SMCA-specific msr_ops
x86/mce: Define vendor-specific MSR accessors
x86/mce: Carve out writes to MCx_STATUS and MCx_CTL
x86/mce: Grade uncorrected errors for SMCA-enabled systems
x86/mce: Log MCEs after a warm rest on AMD, Fam17h and later
x86/mce: Remove explicit smp_rmb() when starting CPUs sync
x86/RAS: Rename AMD MCE injector config item
Pull perf updates from Ingo Molnar:
"Bigger kernel side changes:
- Add backwards writing capability to the perf ring-buffer code,
which is preparation for future advanced features like robust
'overwrite support' and snapshot mode. (Wang Nan)
- Add pause and resume ioctls for the perf ringbuffer (Wang Nan)
- x86 Intel cstate code cleanups and reorgnization (Thomas Gleixner)
- x86 Intel uncore and CPU PMU driver updates (Kan Liang, Peter
Zijlstra)
- x86 AUX (Intel PT) related enhancements and updates (Alexander
Shishkin)
- x86 MSR PMU driver enhancements and updates (Huang Rui)
- ... and lots of other changes spread out over 40+ commits.
Biggest tooling side changes:
- 'perf trace' features and enhancements. (Arnaldo Carvalho de Melo)
- BPF tooling updates (Wang Nan)
- 'perf sched' updates (Jiri Olsa)
- 'perf probe' updates (Masami Hiramatsu)
- ... plus 200+ other enhancements, fixes and cleanups to tools/
The merge commits, the shortlog and the changelogs contain a lot more
details"
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (249 commits)
perf/core: Disable the event on a truncated AUX record
perf/x86/intel/pt: Generate PMI in the STOP region as well
perf buildid-cache: Use lsdir() for looking up buildid caches
perf symbols: Use lsdir() for the search in kcore cache directory
perf tools: Use SBUILD_ID_SIZE where applicable
perf tools: Fix lsdir to set errno correctly
perf trace: Move seccomp args beautifiers to tools/perf/trace/beauty/
perf trace: Move flock op beautifier to tools/perf/trace/beauty/
perf build: Add build-test for debug-frame on arm/arm64
perf build: Add build-test for libunwind cross-platforms support
perf script: Fix export of callchains with recursion in db-export
perf script: Fix callchain addresses in db-export
perf script: Fix symbol insertion behavior in db-export
perf symbols: Add dso__insert_symbol function
perf scripting python: Use Py_FatalError instead of die()
perf tools: Remove xrealloc and ALLOC_GROW
perf help: Do not use ALLOC_GROW in add_cmd_list
perf pmu: Make pmu_formats_string to check return value of strbuf
perf header: Make topology checkers to check return value of strbuf
perf tools: Make alias handler to check return value of strbuf
...
Pull support for killable rwsems from Ingo Molnar:
"This, by Michal Hocko, implements down_write_killable().
The main usecase will be to update mm_sem usage sites to use this new
API, to allow the mm-reaper introduced in commit aac4536355 ("mm,
oom: introduce oom reaper") to tear down oom victim address spaces
asynchronously with minimum latencies and without deadlock worries"
[ The vfs will want it too as the inode lock is changed from a mutex to
a rwsem due to the parallel lookup and readdir updates ]
* 'locking-rwsem-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
locking/rwsem: Fix comment on register clobbering
locking/rwsem: Fix down_write_killable()
locking/rwsem, x86: Add frame annotation for call_rwsem_down_write_failed_killable()
locking/rwsem: Provide down_write_killable()
locking/rwsem, x86: Provide __down_write_killable()
locking/rwsem, s390: Provide __down_write_killable()
locking/rwsem, ia64: Provide __down_write_killable()
locking/rwsem, alpha: Provide __down_write_killable()
locking/rwsem: Introduce basis for down_write_killable()
locking/rwsem, sparc: Drop superfluous arch specific implementation
locking/rwsem, sh: Drop superfluous arch specific implementation
locking/rwsem, xtensa: Drop superfluous arch specific implementation
locking/rwsem: Drop explicit memory barriers
locking/rwsem: Get rid of __down_write_nested()
Pull EFI updates from Ingo Molnar:
"The main changes in this cycle were:
- Drop the unused EFI_SYSTEM_TABLES efi.flags bit and ensure the
ARM/arm64 EFI System Table mapping is read-only (Ard Biesheuvel)
- Add a comment to explain that one of the code paths in the x86/pat
code is only executed for EFI boot (Matt Fleming)
- Improve Secure Boot status checks on arm64 and handle unexpected
errors (Linn Crosetto)
- Remove the global EFI memory map variable 'memmap' as the same
information is already available in efi::memmap (Matt Fleming)
- Add EFI Memory Attribute table support for ARM/arm64 (Ard
Biesheuvel)
- Add EFI GOP framebuffer support for ARM/arm64 (Ard Biesheuvel)
- Add EFI Bootloader Control driver for storing reboot(2) data in EFI
variables for consumption by bootloaders (Jeremy Compostella)
- Add Core EFI capsule support (Matt Fleming)
- Add EFI capsule char driver (Kweh, Hock Leong)
- Unify EFI memory map code for ARM and arm64 (Ard Biesheuvel)
- Add generic EFI support for detecting when firmware corrupts CPU
status register bits (like IRQ flags) when performing EFI runtime
service calls (Mark Rutland)
... and other misc cleanups"
* 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (46 commits)
efivarfs: Make efivarfs_file_ioctl() static
efi: Merge boolean flag arguments
efi/capsule: Move 'capsule' to the stack in efi_capsule_supported()
efibc: Fix excessive stack footprint warning
efi/capsule: Make efi_capsule_pending() lockless
efi: Remove unnecessary (and buggy) .memmap initialization from the Xen EFI driver
efi/runtime-wrappers: Remove ARCH_EFI_IRQ_FLAGS_MASK #ifdef
x86/efi: Enable runtime call flag checking
arm/efi: Enable runtime call flag checking
arm64/efi: Enable runtime call flag checking
efi/runtime-wrappers: Detect firmware IRQ flag corruption
efi/runtime-wrappers: Remove redundant #ifdefs
x86/efi: Move to generic {__,}efi_call_virt()
arm/efi: Move to generic {__,}efi_call_virt()
arm64/efi: Move to generic {__,}efi_call_virt()
efi/runtime-wrappers: Add {__,}efi_call_virt() templates
efi/arm-init: Reserve rather than unmap the memory map for ARM as well
efi: Add misc char driver interface to update EFI firmware
x86/efi: Force EFI reboot to process pending capsules
efi: Add 'capsule' update support
...
Pull core signal updates from Ingo Molnar:
"These updates from Stas Sergeev and Andy Lutomirski, improve the
sigaltstack interface by extending its ABI with the SS_AUTODISARM
feature, which makes it possible to use swapcontext() in a sighandler
that works on sigaltstack. Without this flag, the subsequent signal
will corrupt the state of the switched-away sighandler.
The inspiration is more robust dosemu signal handling"
* 'core-signals-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
signals/sigaltstack: Change SS_AUTODISARM to (1U << 31)
signals/sigaltstack: Report current flag bits in sigaltstack()
selftests/sigaltstack: Fix the sigaltstack test on old kernels
signals/sigaltstack: If SS_AUTODISARM, bypass on_sig_stack()
selftests/sigaltstack: Add new testcase for sigaltstack(SS_ONSTACK|SS_AUTODISARM)
signals/sigaltstack: Implement SS_AUTODISARM flag
signals/sigaltstack: Prepare to add new SS_xxx flags
signals/sigaltstack, x86/signals: Unify the x86 sigaltstack check with other architectures
Pull x86 fix from Thomas Gleixner:
"Just the missing compat entry for the new pread/writev2"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86: Use compat version for preadv2 and pwritev2
Pull perf fixes from Ingo Molnar:
"An uncharacteristically large number of bugs popped up in the last
week:
- various tooling fixes, two crashes and build problems
- two Intel PT fixes
- an KNL uncore driver fix
- an Intel PMU driver fix"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf stat: Fallback to user only counters when perf_event_paranoid > 1
perf evsel: Handle EACCESS + perf_event_paranoid=2 in fallback()
perf evsel: Improve EPERM error handling in open_strerror()
tools lib traceevent: Do not reassign parg after collapse_tree()
perf probe: Check if dwarf_getlocations() is available
perf dwarf: Guard !x86_64 definitions under #ifdef else clause
perf tools: Use readdir() instead of deprecated readdir_r()
perf thread_map: Use readdir() instead of deprecated readdir_r()
perf script: Use readdir() instead of deprecated readdir_r()
perf tools: Use readdir() instead of deprecated readdir_r()
perf/core: Disable the event on a truncated AUX record
perf/x86/intel/pt: Generate PMI in the STOP region as well
perf/x86: Fix undefined shift on 32-bit kernels
perf/x86/msr: Fix SMI overflow
perf/x86/intel/uncore: Fix CHA registers configuration procedure for Knights Landing platform
perf diff: Fix duplicated output column
Currently, the PT driver always sets the PMI bit one region (page) before
the STOP region so that we can wake up the consumer before we run out of
room in the buffer and have to disable the event. However, we also need
an interrupt in the last output region, so that we actually get to disable
the event (if no more room from new data is available at that point),
otherwise hardware just quietly refuses to start, but the event is
scheduled in and we end up losing trace data till the event gets removed.
For a cpu-wide event it is even worse since there may not be any
re-scheduling at all and no chance for the ring buffer code to notice
that its buffer is filled up and the event needs to be disabled (so that
the consumer can re-enable it when it finishes reading the data out). In
other words, all the trace data will be lost after the buffer gets filled
up.
This patch makes PT also generate a PMI when the last output region is
full.
Reported-by: Markus Metzger <markus.t.metzger@intel.com>
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: <stable@vger.kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: vince@deater.net
Link: http://lkml.kernel.org/r/1462886313-13660-2-git-send-email-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Similar to preadv and pwritev, preadv2 and pwritev2 need compat entries
in the 32-bit syscall table.
This bug was found by strace test suite.
Fixes: 4babf2c5ef ("x86: wire up preadv2 and pwritev2")
Signed-off-by: Dmitry V. Levin <ldv@altlinux.org>
Acked-by: Christoph Hellwig <hch@lst.de>
Link: http://lkml.kernel.org/r/20160511084817.GA29823@altlinux.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Currently, the PT driver always sets the PMI bit one region (page) before
the STOP region so that we can wake up the consumer before we run out of
room in the buffer and have to disable the event. However, we also need
an interrupt in the last output region, so that we actually get to disable
the event (if no more room from new data is available at that point),
otherwise hardware just quietly refuses to start, but the event is
scheduled in and we end up losing trace data till the event gets removed.
For a cpu-wide event it is even worse since there may not be any
re-scheduling at all and no chance for the ring buffer code to notice
that its buffer is filled up and the event needs to be disabled (so that
the consumer can re-enable it when it finishes reading the data out). In
other words, all the trace data will be lost after the buffer gets filled
up.
This patch makes PT also generate a PMI when the last output region is
full.
Reported-by: Markus Metzger <markus.t.metzger@intel.com>
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: vince@deater.net
Link: http://lkml.kernel.org/r/1462886313-13660-2-git-send-email-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Jim reported:
UBSAN: Undefined behaviour in arch/x86/events/intel/core.c:3708:12
shift exponent 35 is too large for 32-bit type 'long unsigned int'
The use of 'unsigned long' type obviously is not correct here, make it
'unsigned long long' instead.
Reported-by: Jim Cromie <jim.cromie@gmail.com>
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: <stable@vger.kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Imre Palik <imrep@amazon.de>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Fixes: 2c33645d36 ("perf/x86: Honor the architectural performance monitoring version")
Link: http://lkml.kernel.org/r/1462974711-10037-1-git-send-email-aryabinin@virtuozzo.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
We compute 'delta' and properly sign extend it and then ignore it and
recompute the raw value, loosing the sign extention.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: kan.liang@intel.com
Cc: linux-kernel@vger.kernel.org
Cc: luto@kernel.org
Cc: ray.huang@amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Add a new CPUID leaf to hold the contents of CPUID 0x80000007_EBX (RasCap).
Define bits that are currently in use:
Bit 0: McaOverflowRecov
Bit 1: SUCCOR
Bit 3: ScalableMca
Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
[ Shorten comment. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1462971509-3856-5-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Do the !SMCA work first and then save us an indentation level for the
SMCA code.
No functionality change.
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Yazen Ghannam <Yazen.Ghannam@amd.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1462971509-3856-4-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Disable Deferred Error logging in MCA_{STATUS,ADDR} additionally for
SMCA systems as this information will retrieved from MCA_DE{STAT,ADDR}
on those systems.
Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
[ Simplify, drop SMCA_MCAX_EN_OFF define too. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1462971509-3856-3-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Scalable MCA provides new registers for all banks for logging deferred
errors: MCA_DESTAT and MCA_DEADDR. Deferred errors are always logged to
these registers.
Update the AMD deferred error handler to use these registers, if
available.
Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
[ Sanity-check __log_error() args, massage a bit. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1462971509-3856-2-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The x86 exception table sorting was changed in commit 29934b0fb8
("x86/extable: use generic search and sort routines") to use the arch
independent code in lib/extable.c. However, the patch was mangled
somehow on its way into the kernel from the last version posted at [1].
The committed version kind of attempted to incorporate the changes of
commit 548acf1923 ("x86/mm: Expand the exception table logic to allow
new handling options") as in _completely_ _ignoring_ the x86 specific
'handler' member of struct exception_table_entry. This effectively
broke the sorting as entries will only partly be swapped now.
Fortunately, the x86 Kconfig selects BUILDTIME_EXTABLE_SORT, so the
exception table doesn't need to be sorted at runtime. However, in case
that ever changes, we better not break the exception table sorting just
because of that.
[ Ard Biesheuvel points out that BUILDTIME_EXTABLE_SORT applies to the
core image only, but we still rely on the sorting routines for modules
in that case - Linus ]
Fix this by providing a swap_ex_entry_fixup() macro that takes care of
the 'handler' member.
[1] https://lkml.org/lkml/2016/1/27/232
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Fixes: 29934b0fb8 ("x86/extable: use generic search and sort routines")
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: H. Peter Anvin <hpa@linux.intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
an objtool warning for CONFIG_PROFILE_ALL_BRANCHES.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQEcBAABAgAGBQJXM0h+AAoJEL/70l94x66DhjwH/2l24ouNC32DJGoY511yWogF
QrBrmQnnT1r+v9IDCC5ITCCkJp+EPF0uFwLI+T6URzRdSdNUN/EqYymUyhsICWwb
9xIsR9VraeB4OhH5IMps1Pkwyx6FMjeVqvVx8dib+NUQpV2o5KxkkFsE9oIX7mtA
PgwRaLsV8MFZH06g/qvwYPs5NNUecQU8AFCAps0PzWSXG6KHmzsjJZjsXbDhb3n4
oOGJTLXlRS+SSOWFjngYJNIQ7Z8V1WHM6qftZXtcDWIFbq8CuWn9dZR2WhpUu+nK
J/I5WC99stxgljFOjI/zdPKQbRy0eBQyDAPHAblDo83vSKsDRgYGX3608gGz3xY=
=s75+
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM fixes from Paolo Bonzini:
"Two small x86 patches, improving "make kvmconfig" and fixing an
objtool warning for CONFIG_PROFILE_ALL_BRANCHES"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
kvmconfig: add more virtio drivers
x86/kvm: Add stack frame dependency to fastop() inline asm
"make defconfig kvmconfig" is supposed to end up with usable kernel for
KVM guest. In practice, it won't work for e.g. Hetzner VPS (KVM-based)
unless you add these options.
Signed-off-by: Andrey Utkin <andrey_utkin@fastmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The kbuild test robot reported this objtool warning [1]:
arch/x86/kvm/emulate.o: warning: objtool: fastop()+0x69: call without frame pointer save/setup
The issue seems to be caused by CONFIG_PROFILE_ALL_BRANCHES. With that
option, for some reason gcc decides not to create a stack frame in
fastop() before doing the inline asm call, which can result in a bad
stack trace.
Force a stack frame to be created if CONFIG_FRAME_POINTER is enabled by
listing the stack pointer as an output operand for the inline asm
statement.
This change has no effect for !CONFIG_PROFILE_ALL_BRANCHES.
[1] https://lists.01.org/pipermail/kbuild-all/2016-March/018249.html
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
KASLR will be calling get_random_long() twice, but the debug output
won't distinguishing between them. This patch adds a report on when it
is fetching the physical vs virtual address. With this, once the virtual
offset is separate, the report changes from:
KASLR using RDTSC...
KASLR using RDTSC...
into:
Physical KASLR using RDTSC...
Virtual KASLR using RDTSC...
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: kernel-hardening@lists.openwall.com
Cc: lasse.collin@tukaani.org
Link: http://lkml.kernel.org/r/1462825332-10505-7-git-send-email-keescook@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
To support randomizing the kernel virtual address separately from the
physical address, this patch adds find_random_virt_addr() to choose
a slot anywhere between LOAD_PHYSICAL_ADDR and KERNEL_IMAGE_SIZE.
Since this address is virtual, not physical, we can place the kernel
anywhere in this region, as long as it is aligned and (in the case of
kernel being larger than the slot size) placed with enough room to load
the entire kernel image.
For clarity and readability, find_random_addr() is renamed to
find_random_phys_addr() and has "size" renamed to "image_size" to match
find_random_virt_addr().
Signed-off-by: Baoquan He <bhe@redhat.com>
[ Rewrote changelog, refactored slot calculation for readability. ]
[ Renamed find_random_phys_addr() and size argument. ]
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: kernel-hardening@lists.openwall.com
Cc: lasse.collin@tukaani.org
Link: http://lkml.kernel.org/r/1462825332-10505-6-git-send-email-keescook@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
In preparation for being able to detect where to split up contiguous
memory regions that overlap with memory regions to avoid, we need to
pass back what the earliest overlapping region was. This modifies the
overlap checker to return that information.
Based on a separate mem_min_overlap() implementation by Baoquan He.
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: kernel-hardening@lists.openwall.com
Cc: lasse.collin@tukaani.org
Link: http://lkml.kernel.org/r/1462825332-10505-5-git-send-email-keescook@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
In order to support KASLR moving the kernel anywhere in physical memory
(which could be up to 64TB), we need to handle counting the potential
randomization locations in a more efficient manner.
In the worst case with 64TB, there could be roughly 32 * 1024 * 1024
randomization slots if CONFIG_PHYSICAL_ALIGN is 0x1000000. Currently
the starting address of candidate positions is stored into the slots[]
array, one at a time. This method would cost too much memory and it's
also very inefficient to get and save the slot information into the slot
array one by one.
This patch introduces 'struct slot_area' to manage each contiguous region
of randomization slots. Each slot_area will contain the starting address
and how many available slots are in this area. As with the original code,
the slot_areas[] will avoid the mem_avoid[] regions.
Since setup_data is a linked list, it could contain an unknown number
of memory regions to be avoided, which could cause us to fragment
the contiguous memory that the slot_area array is tracking. In normal
operation this level of fragmentation will be extremely rare, but we
choose a suitably large value (100) for the array. If setup_data forces
the slot_area array to become highly fragmented and there are more
slots available beyond the first 100 found, the rest will be ignored
for KASLR selection.
The function store_slot_info() is used to calculate the number of slots
available in the passed-in memory region and stores it into slot_areas[]
after adjusting for alignment and size requirements.
Signed-off-by: Baoquan He <bhe@redhat.com>
[ Rewrote changelog, squashed with new functions. ]
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: kernel-hardening@lists.openwall.com
Cc: lasse.collin@tukaani.org
Link: http://lkml.kernel.org/r/1462825332-10505-4-git-send-email-keescook@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
There were some files with missing header comments. Since they are
included from both compressed and regular kernels, make note of that.
Also corrects a typo in the mem_avoid comments.
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: kernel-hardening@lists.openwall.com
Cc: lasse.collin@tukaani.org
Link: http://lkml.kernel.org/r/1462825332-10505-3-git-send-email-keescook@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
As it turns out, mapping_info DOES need to be initialized every
time, because pgt_data address could be changed during kernel
relocation. So it can not be build time assigned.
Without this, page tables were not being corrected updated, which
could cause reboots when a physical address beyond 2G was chosen.
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: kernel-hardening@lists.openwall.com
Cc: lasse.collin@tukaani.org
Link: http://lkml.kernel.org/r/1462825332-10505-2-git-send-email-keescook@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>