This commit introduces code for the live patching core. It implements
an ftrace-based mechanism and kernel interface for doing live patching
of kernel and kernel module functions.
It represents the greatest common functionality set between kpatch and
kgraft and can accept patches built using either method.
This first version does not implement any consistency mechanism that
ensures that old and new code do not run together. In practice, ~90% of
CVEs are safe to apply in this way, since they simply add a conditional
check. However, any function change that can not execute safely with
the old version of the function can _not_ be safely applied in this
version.
[ jkosina@suse.cz: due to the number of contributions that got folded into
this original patch from Seth Jennings, add SUSE's copyright as well, as
discussed via e-mail ]
Signed-off-by: Seth Jennings <sjenning@redhat.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Reviewed-by: Petr Mladek <pmladek@suse.cz>
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Petr Mladek <pmladek@suse.cz>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Pull x86 apic updates from Thomas Gleixner:
"After stopping the full x86/apic branch, I took some time to go
through the first block of patches again, which are mostly cleanups
and preparatory work for the irqdomain conversion and ioapic hotplug
support.
Unfortunaly one of the real problematic commits was right at the
beginning, so I rebased this portion of the pending patches without
the offenders.
It would be great to get this into 3.19. That makes reworking the
problematic parts simpler. The usual tip testing did not unearth any
issues and it is fully bisectible now.
I'm pretty confident that this wont affect the calmness of the xmas
season.
Changes:
- Split the convoluted io_apic.c code into domain specific parts
(vector, ioapic, msi, htirq)
- Introduce proper helper functions to retrieve irq specific data
instead of open coded dereferencing of pointers
- Preparatory work for ioapic hotplug and irqdomain conversion
- Removal of the non functional pci-ioapic driver
- Removal of unused irq entry stubs
- Make native_smp_prepare_cpus() preemtible to avoid GFP_ATOMIC
allocations for everything which is called from there.
- Small cleanups and fixes"
* 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (36 commits)
iommu/amd: Use helpers to access irq_cfg data structure associated with IRQ
iommu/vt-d: Use helpers to access irq_cfg data structure associated with IRQ
x86: irq_remapping: Use helpers to access irq_cfg data structure associated with IRQ
x86, irq: Use helpers to access irq_cfg data structure associated with IRQ
x86, irq: Make MSI and HT_IRQ indepenent of X86_IO_APIC
x86, irq: Move IRQ initialization routines from io_apic.c into vector.c
x86, irq: Move IOAPIC related declarations from hw_irq.h into io_apic.h
x86, irq: Move HT IRQ related code from io_apic.c into htirq.c
x86, irq: Move PCI MSI related code from io_apic.c into msi.c
x86, irq: Replace printk(KERN_LVL) with pr_lvl() utilities
x86, irq: Make UP version of irq_complete_move() an inline stub
x86, irq: Move local APIC related code from io_apic.c into vector.c
x86, irq: Introduce helpers to access struct irq_cfg
x86, irq: Protect __clear_irq_vector() with vector_lock
x86, irq: Rename local APIC related functions in io_apic.c as apic_xxx()
x86, irq: Refine hw_irq.h to prepare for irqdomain support
x86, irq: Convert irq_2_pin list to generic list
x86, irq: Kill useless parameter 'irq_attr' of IO_APIC_get_PCI_irq_vector()
x86, irq, acpi: Get rid of special handling of GSI for ACPI SCI
x86, irq: Introduce helper to check whether an IOAPIC has been registered
...
Pull x86 MPX fixes from Thomas Gleixner:
"Three updates for the new MPX infrastructure:
- Use the proper error check in the trap handler
- Add a proper config option for it
- Bring documentation up to date"
* 'x86-mpx-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86, mpx: Give MPX a real config option prompt
x86, mpx: Update documentation
x86_64/traps: Fix always true condition
Pull x86 fix from Ingo Molnar:
"This contains a single TLS ABI validation fix from Andy Lutomirski"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/tls: Don't validate lm in set_thread_area() after all
Pull perf fixes and cleanups from Ingo Molnar:
"A kernel fix plus mostly tooling fixes, but also some tooling
restructuring and cleanups"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (39 commits)
perf: Fix building warning on ARM 32
perf symbols: Fix use after free in filename__read_build_id
perf evlist: Use roundup_pow_of_two
tools: Adopt roundup_pow_of_two
perf tools: Make the mmap length autotuning more robust
tools: Adopt rounddown_pow_of_two and deps
tools: Adopt fls_long and deps
tools: Move bitops.h from tools/perf/util to tools/
tools: Introduce asm-generic/bitops.h
tools lib: Move asm-generic/bitops/find.h code to tools/include and tools/lib
tools: Whitespace prep patches for moving bitops.h
tools: Move code originally from asm-generic/atomic.h into tools/include/asm-generic/
tools: Move code originally from linux/log2.h to tools/include/linux/
tools: Move __ffs implementation to tools/include/asm-generic/bitops/__ffs.h
perf evlist: Do not use hard coded value for a mmap_pages default
perf trace: Let the perf_evlist__mmap autosize the number of pages to use
perf evlist: Improve the strerror_mmap method
perf evlist: Clarify sterror_mmap variable names
perf evlist: Fixup brown paper bag on "hint" for --mmap-pages cmdline arg
perf trace: Provide a better explanation when mmap fails
...
- Fix a regression in leds-gpio introduced by a recent commit that
inadvertently changed the name of one of the properties used by
the driver (Fabio Estevam).
- Fix a regression in the ACPI backlight driver introduced by a
recent fix that missed one special case that had to be taken
into account (Aaron Lu).
- Drop the level of some new kernel messages from the ACPI core
introduced by a recent commit to KERN_DEBUG which they should
have used from the start and drop some other unuseful KERN_ERR
messages printed by ACPI (Rafael J Wysocki).
- Revert an incorrect commit modifying the cpupower tool
(Prarit Bhargava).
- Fix two regressions introduced by recent commits in the OPP
library and clean up some existing minor issues in that code
(Viresh Kumar).
- Continue to replace CONFIG_PM_RUNTIME with CONFIG_PM throughout
the tree (or drop it where that can be done) in order to make
it possible to eliminate CONFIG_PM_RUNTIME (Rafael J Wysocki,
Ulf Hansson, Ludovic Desroches). There will be one more
"CONFIG_PM_RUNTIME removal" batch after this one, because some
new uses of it have been introduced during the current merge
window, but that should be sufficient to finally get rid of it.
- Make the ACPI EC driver more robust against race conditions
related to GPE handler installation failures (Lv Zheng).
- Prevent the ACPI device PM core code from attempting to
disable GPEs that it has not enabled which confuses ACPICA
and makes it report errors unnecessarily (Rafael J Wysocki).
- Add a "force" command line switch to the intel_pstate driver
to make it possible to override the blacklisting of some
systems in that driver if needed (Ethan Zhao).
- Improve intel_pstate code documentation and add a MAINTAINERS
entry for it (Kristen Carlson Accardi).
- Make the ACPI fan driver create cooling device interfaces
witn names that reflect the IDs of the ACPI device objects
they are associated with, except for "generic" ACPI fans
(PNP ID "PNP0C0B"). That's necessary for user space thermal
management tools to be able to connect the fans with the
parts of the system they are supposed to be cooling properly.
From Srinivas Pandruvada.
/
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQIcBAABCAAGBQJUk0IDAAoJEILEb/54YlRx7fgP/3+yF/0TnEW93j2ALDAQFiLF
tSv2A2vQC8vtMJjjWx0z/HqPh86gfaReEFZmUJD/Q/e2LXEnxNZJ+QMjcekPVkDM
mTvcIMc2MR8vOA/oMkgxeaKregrrx7RkCfojd+NWZhVukkjl+mvBHgAnYjXRL+NZ
unDWGlbHG97vq/3kGjPYhDS00nxHblw8NHFBu5HL5RxwABdWoeZJITwqxXWyuPLw
nlqNWlOxmwvtSbw2VMKz0uof1nFHyQLykYsMG0ZsyayCRdWUZYkEqmE7GGpCLkLu
D6yfmlpen6ccIOsEAae0eXBt50IFY9Tihk5lovx1mZmci2SNRg29BqMI105wIn0u
8b8Ej7MNHp7yMxRpB5WfU90p/y7ioJns9guFZxY0CKaRnrI2+BLt3RscMi3MPI06
Cu2/WkSSa09fhDPA+pk+VDYsmWgyVawigesNmMP5/cvYO/yYywVRjOuO1k77qQGp
4dSpFYEHfpxinejZnVZOk2V9MkvSLoSMux6wPV0xM0IE1iD0ulVpHjTJrwp80ph4
+bfUFVr/vrD1y7EKbf1PD363ZKvJhWhvQWDgETsM1vgLf21PfWO7C2kflIAsWsdQ
1ukD5nCBRlP4K73hG7bdM6kRztXhUdR0SHg85/t0KB/ExiVqtcXIzB60D0G1lENd
QlKbq3O4lim1WGuhazQY
=5fo2
-----END PGP SIGNATURE-----
Merge tag 'pm+acpi-3.19-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull more ACPI and power management updates from Rafael Wysocki:
"These are regression fixes (leds-gpio, ACPI backlight driver,
operating performance points library, ACPI device enumeration
messages, cpupower tool), other bug fixes (ACPI EC driver, ACPI device
PM), some cleanups in the operating performance points (OPP)
framework, continuation of CONFIG_PM_RUNTIME elimination, a couple of
minor intel_pstate driver changes, a new MAINTAINERS entry for it and
an ACPI fan driver change needed for better support of thermal
management in user space.
Specifics:
- Fix a regression in leds-gpio introduced by a recent commit that
inadvertently changed the name of one of the properties used by the
driver (Fabio Estevam).
- Fix a regression in the ACPI backlight driver introduced by a
recent fix that missed one special case that had to be taken into
account (Aaron Lu).
- Drop the level of some new kernel messages from the ACPI core
introduced by a recent commit to KERN_DEBUG which they should have
used from the start and drop some other unuseful KERN_ERR messages
printed by ACPI (Rafael J Wysocki).
- Revert an incorrect commit modifying the cpupower tool (Prarit
Bhargava).
- Fix two regressions introduced by recent commits in the OPP library
and clean up some existing minor issues in that code (Viresh
Kumar).
- Continue to replace CONFIG_PM_RUNTIME with CONFIG_PM throughout the
tree (or drop it where that can be done) in order to make it
possible to eliminate CONFIG_PM_RUNTIME (Rafael J Wysocki, Ulf
Hansson, Ludovic Desroches).
There will be one more "CONFIG_PM_RUNTIME removal" batch after this
one, because some new uses of it have been introduced during the
current merge window, but that should be sufficient to finally get
rid of it.
- Make the ACPI EC driver more robust against race conditions related
to GPE handler installation failures (Lv Zheng).
- Prevent the ACPI device PM core code from attempting to disable
GPEs that it has not enabled which confuses ACPICA and makes it
report errors unnecessarily (Rafael J Wysocki).
- Add a "force" command line switch to the intel_pstate driver to
make it possible to override the blacklisting of some systems in
that driver if needed (Ethan Zhao).
- Improve intel_pstate code documentation and add a MAINTAINERS entry
for it (Kristen Carlson Accardi).
- Make the ACPI fan driver create cooling device interfaces witn
names that reflect the IDs of the ACPI device objects they are
associated with, except for "generic" ACPI fans (PNP ID "PNP0C0B").
That's necessary for user space thermal management tools to be able
to connect the fans with the parts of the system they are supposed
to be cooling properly. From Srinivas Pandruvada"
* tag 'pm+acpi-3.19-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (32 commits)
MAINTAINERS: add entry for intel_pstate
ACPI / video: update the skip case for acpi_video_device_in_dod()
power / PM: Eliminate CONFIG_PM_RUNTIME
NFC / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM
SCSI / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM
ACPI / EC: Fix unexpected ec_remove_handlers() invocations
Revert "tools: cpupower: fix return checks for sysfs_get_idlestate_count()"
tracing / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM
x86 / PM: Replace CONFIG_PM_RUNTIME in io_apic.c
PM: Remove the SET_PM_RUNTIME_PM_OPS() macro
mmc: atmel-mci: use SET_RUNTIME_PM_OPS() macro
PM / Kconfig: Replace PM_RUNTIME with PM in dependencies
ARM / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM
sound / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM
phy / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM
video / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM
tty / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM
spi: Replace CONFIG_PM_RUNTIME with CONFIG_PM
ACPI / PM: Do not disable wakeup GPEs that have not been enabled
ACPI / utils: Drop error messages from acpi_evaluate_reference()
...
- spring cleaning: removed support for IA64, and for hardware-assisted
virtualization on the PPC970
- ARM, PPC, s390 all had only small fixes
For x86:
- small performance improvements (though only on weird guests)
- usual round of hardware-compliancy fixes from Nadav
- APICv fixes
- XSAVES support for hosts and guests. XSAVES hosts were broken because
the (non-KVM) XSAVES patches inadvertently changed the KVM userspace
ABI whenever XSAVES was enabled; hence, this part is going to stable.
Guest support is just a matter of exposing the feature and CPUID leaves
support.
Right now KVM is broken for PPC BookE in your tree (doesn't compile).
I'll reply to the pull request with a patch, please apply it either
before the pull request or in the merge commit, in order to preserve
bisectability somewhat.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQEcBAABAgAGBQJUkpg+AAoJEL/70l94x66DUmoH/jzXYkptSW9NGgm79KqxGJlD
lzLnLBkitVvx++Mz5YBhdJEhKKLUlCtifFT1zPJQ/pthQhIRSaaAwZyNGgUs5w5x
yMGKHiPQFyZRbmQtZhCInW0BftJoYHHciO3nUfHCZnp34My9MP2D55W7/z+fYFfQ
DuqBSE9ThyZJtZ4zh8NRA9fCOeuqwVYRyoBs820Wbsh4cpIBoIK63Dg7k+CLE+ZV
MZa/mRL6bAfsn9W5bnOUAgHJ3SPznnWbO3/g0aV+roL/5pffblprJx9lKNR08xUM
6hDFLop2gDehDJesDkY/o8Ckp1hEouvfsVpSShry4vcgtn0hgh2O5/6Orbmj6vE=
=Zwq1
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM update from Paolo Bonzini:
"3.19 changes for KVM:
- spring cleaning: removed support for IA64, and for hardware-
assisted virtualization on the PPC970
- ARM, PPC, s390 all had only small fixes
For x86:
- small performance improvements (though only on weird guests)
- usual round of hardware-compliancy fixes from Nadav
- APICv fixes
- XSAVES support for hosts and guests. XSAVES hosts were broken
because the (non-KVM) XSAVES patches inadvertently changed the KVM
userspace ABI whenever XSAVES was enabled; hence, this part is
going to stable. Guest support is just a matter of exposing the
feature and CPUID leaves support"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (179 commits)
KVM: move APIC types to arch/x86/
KVM: PPC: Book3S: Enable in-kernel XICS emulation by default
KVM: PPC: Book3S HV: Improve H_CONFER implementation
KVM: PPC: Book3S HV: Fix endianness of instruction obtained from HEIR register
KVM: PPC: Book3S HV: Remove code for PPC970 processors
KVM: PPC: Book3S HV: Tracepoints for KVM HV guest interactions
KVM: PPC: Book3S HV: Simplify locking around stolen time calculations
arch: powerpc: kvm: book3s_paired_singles.c: Remove unused function
arch: powerpc: kvm: book3s_pr.c: Remove unused function
arch: powerpc: kvm: book3s.c: Remove some unused functions
arch: powerpc: kvm: book3s_32_mmu.c: Remove unused function
KVM: PPC: Book3S HV: Check wait conditions before sleeping in kvmppc_vcore_blocked
KVM: PPC: Book3S HV: ptes are big endian
KVM: PPC: Book3S HV: Fix inaccuracies in ICP emulation for H_IPI
KVM: PPC: Book3S HV: Fix KSM memory corruption
KVM: PPC: Book3S HV: Fix an issue where guest is paused on receiving HMI
KVM: PPC: Book3S HV: Fix computation of tlbie operand
KVM: PPC: Book3S HV: Add missing HPTE unlock
KVM: PPC: BookE: Improve irq inject tracepoint
arm/arm64: KVM: Require in-kernel vgic for the arch timers
...
It turns out that there's a lurking ABI issue. GCC, when
compiling this in a 32-bit program:
struct user_desc desc = {
.entry_number = idx,
.base_addr = base,
.limit = 0xfffff,
.seg_32bit = 1,
.contents = 0, /* Data, grow-up */
.read_exec_only = 0,
.limit_in_pages = 1,
.seg_not_present = 0,
.useable = 0,
};
will leave .lm uninitialized. This means that anything in the
kernel that reads user_desc.lm for 32-bit tasks is unreliable.
Revert the .lm check in set_thread_area(). The value never did
anything in the first place.
Fixes: 0e58af4e1d ("x86/tls: Disallow unusual TLS segments")
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org # Only if 0e58af4e1d is backported
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/d7875b60e28c512f6a6fc0baf5714d58e7eaadbb.1418856405.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Use helpers to access irq_cfg data structure associated with IRQ,
instead of accessing irq_data->chip_data directly. Later we can
rewrite those helpers to support hierarchy irqdomain.
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Grant Likely <grant.likely@linaro.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Link: http://lkml.kernel.org/r/1414397531-28254-17-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Clean up code by moving IOAPIC related declarations from hw_irq.h into
io_apic.h.
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: H. Peter Anvin <hpa@linux.intel.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Grant Likely <grant.likely@linaro.org>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Matt Fleming <matt.fleming@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Christian Gmeiner <christian.gmeiner@gmail.com>
Cc: Aubrey <aubrey.li@linux.intel.com>
Cc: Ryan Desfosses <ryan@desfo.org>
Cc: Quentin Lambert <lambert.quentin@gmail.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: http://lkml.kernel.org/r/1414397531-28254-14-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Create arch/x86/kernel/apic/vector.c to host local APIC related code,
prepare for making MSI/HT_IRQ independent of IOAPIC.
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Grant Likely <grant.likely@linaro.org>
Link: http://lkml.kernel.org/r/1414397531-28254-10-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Change irq_cfg() from static to extern, also introduce helper function
irqd_cfg(). Later we can rewrite these two helpers when enabling
hierarchy irqdomain.
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Grant Likely <grant.likely@linaro.org>
Link: http://lkml.kernel.org/r/1414397531-28254-9-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Function __clear_irq_vector() accesses vector related data structure,
so protect it with vector_lock. Also rename it as clear_irq_vector().
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Grant Likely <grant.likely@linaro.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Link: http://lkml.kernel.org/r/1414397531-28254-8-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Rename local APIC related functions in io_apic.c as apic_xxx() instead
of ioapic_xxx(), later they will be moved into separate file.
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Grant Likely <grant.likely@linaro.org>
Link: http://lkml.kernel.org/r/1414397531-28254-7-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Use generic list to replace private list implementation so we can use
the existing helper functions.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Grant Likely <grant.likely@linaro.org>
Link: http://lkml.kernel.org/r/1414397531-28254-5-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Cc: Joerg Roedel <joro@8bytes.org>
None of the callers requires irq_attr to be filled
in. IO_APIC_get_PCI_irq_vector() does not do anything useful with it
either.
Remove the parameter and fixup the call sites.
[ tglx: Massaged changelog ]
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Grant Likely <grant.likely@linaro.org>
Cc: Ryan Desfosses <ryan@desfo.org>
Cc: Quentin Lambert <lambert.quentin@gmail.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: http://lkml.kernel.org/r/1414397531-28254-4-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The IOAPIC has logic to track IOAPIC pin status, so there's no need for
special treatment for GSI used by ACPI SCI in function mp_register_gsi()
and mp_unregister_gsi().
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Len Brown <len.brown@intel.com>
Cc: Pavel Machek <pavel@ucw.cz>
Link: http://lkml.kernel.org/r/1414397531-28254-2-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Introduce acpi_ioapic_registered() to check whether an IOAPIC has already
been registered, it will be used when enabling IOAPIC hotplug.
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Acked-by: Pavel Machek <pavel@ucw.cz>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Len Brown <len.brown@intel.com>
Cc: Grant Likely <grant.likely@linaro.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Link: http://lkml.kernel.org/r/1414387308-27148-18-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Implement acpi_unregister_ioapic() to support ACPI based IOAPIC hot-removal.
An IOAPIC could only be removed when all its pins are unused.
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Len Brown <len.brown@intel.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Grant Likely <grant.likely@linaro.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Link: http://lkml.kernel.org/r/1414387308-27148-17-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Implement acpi_register_ioapic() and enhance mp_register_ioapic()
to support ACPI based IOAPIC hot-addition.
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Len Brown <len.brown@intel.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Grant Likely <grant.likely@linaro.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Link: http://lkml.kernel.org/r/1414387308-27148-16-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
We are going to support ACPI based IOAPIC hotplug, so introduce a mutex
to protect IOAPIC data structures from IOAPIC hotplug. We choose to
serialize in ACPI instead of in the IOAPIC core because:
1) currently we only plan to support ACPI based IOAPIC hotplug
2) it's much more cleaner and easier
3) It does't affect IOAPIC discovered by devicetree, SFI and mpparse.
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Len Brown <len.brown@intel.com>
Cc: Pavel Machek <pavel@ucw.cz>
Link: http://lkml.kernel.org/r/1414908273-7552-15-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Refine mp_register_ioapic() to prepare for IOAPIC hotplug by:
1) change return value from void to int.
2) check for gsi range conflicts
3) check for IOAPIC physical address conflicts
4) enhance the way to allocate IOAPIC index
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Grant Likely <grant.likely@linaro.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Link: http://lkml.kernel.org/r/1414387308-27148-14-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Remove __init marker for functions which will be used by IOAPIC hotplug
at runtime.
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Grant Likely <grant.likely@linaro.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Link: http://lkml.kernel.org/r/1414387308-27148-12-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Perfer the assigned ID in APIC ID register for x86_64 if it's still
available.
[ tglx: Added comments ]
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Grant Likely <grant.likely@linaro.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Link: http://lkml.kernel.org/r/1414387308-27148-11-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Split out alloc_ioapic_save_registers() from early_irq_init(),
so it could be used for ioapic hotplug later.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Grant Likely <grant.likely@linaro.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Link: http://lkml.kernel.org/r/1414387308-27148-10-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
There is no reason to keep preemption disabled in this function.
We only have two other threads live: kthreadd and idle. Neither of
them is going to preempt. But that preempt_disable forces all the code
inside to do GFP_ATOMIC allocations which is just insane.
Remove it.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Borislav Petkov <bp@alien8.de>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: x86@kernel.org
Link: http://lkml.kernel.org/r/20141205084147.153643952@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
When X86_LOCAL_APIC (i.e. unconditionally on x86-64),
first_system_vector will never end up being higher than
LOCAL_TIMER_VECTOR (0xef), and hence building stubs for vectors
0xef...0xff is pointlessly reducing code density. Deal with this at
build time already.
Taking into consideration that X86_64 implies X86_LOCAL_APIC, also
simplify (and hence make easier to read and more consistent with the
change done here) some #if-s in arch/x86/kernel/irqinit.c.
While we could further improve the packing of the IRQ entry stubs (the
four ones now left in the last set could be fit into the four padding
bytes each of the final four sets have) this doesn't seem to provide
any real benefit: Both irq_entries_start and common_interrupt getting
cache line aligned, eliminating the 30th set would just produce 32
bytes of padding between the 29th and common_interrupt.
[ tglx: Folded lguest fix from Dan Carpenter ]
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: lguest@lists.ozlabs.org
Cc: Rusty Russell <rusty@rustcorp.com.au>
Link: http://lkml.kernel.org/r/54574D5F0200007800044389@mail.emea.novell.com
Link: http://lkml.kernel.org/r/20141115185718.GB6530@mwanda
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
While f3761db164 ("x86, irq: Fix build error caused by
9eabc99a635a77cbf09") addressed the original build problem,
declaration, inline stub, and definition still seem misplaced: It isn't
really IO-APIC related, and it's being used solely in arch/x86/pci/.
This also means stubbing it out when !CONFIG_X86_IO_APIC was at least
questionable.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Link: http://lkml.kernel.org/r/545747BE020000780004436E@mail.emea.novell.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Fix building warning when CONFIG_X86_LOCAL_APIC is disabled.
arch/x86/kernel/acpi/boot.c:699:20: warning: ‘acpi_set_irq_model_ioapic’ defined but not used [-Wunused-function]
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Len Brown <len.brown@intel.com>
Cc: Pavel Machek <pavel@ucw.cz>
Link: http://lkml.kernel.org/r/1414397531-28254-3-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Pull drm updates from Dave Airlie:
"Highlights:
- AMD KFD driver merge
This is the AMD HSA interface for exposing a lowlevel interface for
GPGPU use. They have an open source userspace built on top of this
interface, and the code looks as good as it was going to get out of
tree.
- Initial atomic modesetting work
The need for an atomic modesetting interface to allow userspace to
try and send a complete set of modesetting state to the driver has
arisen, and been suffering from neglect this past year. No more,
the start of the common code and changes for msm driver to use it
are in this tree. Ongoing work to get the userspace ioctl finished
and the code clean will probably wait until next kernel.
- DisplayID 1.3 and tiled monitor exposed to userspace.
Tiled monitor property is now exposed for userspace to make use of.
- Rockchip drm driver merged.
- imx gpu driver moved out of staging
Other stuff:
- core:
panel - MIPI DSI + new panels.
expose suggested x/y properties for virtual GPUs
- i915:
Initial Skylake (SKL) support
gen3/4 reset work
start of dri1/ums removal
infoframe tracking
fixes for lots of things.
- nouveau:
tegra k1 voltage support
GM204 modesetting support
GT21x memory reclocking work
- radeon:
CI dpm fixes
GPUVM improvements
Initial DPM fan control
- rcar-du:
HDMI support added
removed some support for old boards
slave encoder driver for Analog Devices adv7511
- exynos:
Exynos4415 SoC support
- msm:
a4xx gpu support
atomic helper conversion
- tegra:
iommu support
universal plane support
ganged-mode DSI support
- sti:
HDMI i2c improvements
- vmwgfx:
some late fixes.
- qxl:
use suggested x/y properties"
* 'drm-next' of git://people.freedesktop.org/~airlied/linux: (969 commits)
drm: sti: fix module compilation issue
drm/i915: save/restore GMBUS freq across suspend/resume on gen4
drm: sti: correctly cleanup CRTC and planes
drm: sti: add HQVDP plane
drm: sti: add cursor plane
drm: sti: enable auxiliary CRTC
drm: sti: fix delay in VTG programming
drm: sti: prepare sti_tvout to support auxiliary crtc
drm: sti: use drm_crtc_vblank_{on/off} instead of drm_vblank_{on/off}
drm: sti: fix hdmi avi infoframe
drm: sti: remove event lock while disabling vblank
drm: sti: simplify gdp code
drm: sti: clear all mixer control
drm: sti: remove gpio for HDMI hot plug detection
drm: sti: allow to change hdmi ddc i2c adapter
drm/doc: Document drm_add_modes_noedid() usage
drm/i915: Remove '& 0xffff' from the mask given to WA_REG()
drm/i915: Invert the mask and val arguments in wa_add() and WA_REG()
drm: Zero out DRM object memory upon cleanup
drm/i915/bdw: Fix the write setting up the WIZ hashing mode
...
Here's the set of driver core patches for 3.19-rc1.
They are dominated by the removal of the .owner field in platform
drivers. They touch a lot of files, but they are "simple" changes, just
removing a line in a structure.
Other than that, a few minor driver core and debugfs changes. There are
some ath9k patches coming in through this tree that have been acked by
the wireless maintainers as they relied on the debugfs changes.
Everything has been in linux-next for a while.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iEYEABECAAYFAlSOD20ACgkQMUfUDdst+ylLPACg2QrW1oHhdTMT9WI8jihlHVRM
53kAoLeteByQ3iVwWurwwseRPiWa8+MI
=OVRS
-----END PGP SIGNATURE-----
Merge tag 'driver-core-3.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core update from Greg KH:
"Here's the set of driver core patches for 3.19-rc1.
They are dominated by the removal of the .owner field in platform
drivers. They touch a lot of files, but they are "simple" changes,
just removing a line in a structure.
Other than that, a few minor driver core and debugfs changes. There
are some ath9k patches coming in through this tree that have been
acked by the wireless maintainers as they relied on the debugfs
changes.
Everything has been in linux-next for a while"
* tag 'driver-core-3.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (324 commits)
Revert "ath: ath9k: use debugfs_create_devm_seqfile() helper for seq_file entries"
fs: debugfs: add forward declaration for struct device type
firmware class: Deletion of an unnecessary check before the function call "vunmap"
firmware loader: fix hung task warning dump
devcoredump: provide a one-way disable function
device: Add dev_<level>_once variants
ath: ath9k: use debugfs_create_devm_seqfile() helper for seq_file entries
ath: use seq_file api for ath9k debugfs files
debugfs: add helper function to create device related seq_file
drivers/base: cacheinfo: remove noisy error boot message
Revert "core: platform: add warning if driver has no owner"
drivers: base: support cpu cache information interface to userspace via sysfs
drivers: base: add cpu_device_create to support per-cpu devices
topology: replace custom attribute macros with standard DEVICE_ATTR*
cpumask: factor out show_cpumap into separate helper function
driver core: Fix unbalanced device reference in drivers_probe
driver core: fix race with userland in device_add()
sysfs/kernfs: make read requests on pre-alloc files use the buffer.
sysfs/kernfs: allow attributes to request write buffer be pre-allocated.
fs: sysfs: return EGBIG on write if offset is larger than file size
...
Pull x86 fixes from Ingo Molnar:
"Misc fixes (mainly Andy's TLS fixes), plus a cleanup"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/tls: Disallow unusual TLS segments
x86/tls: Validate TLS entries to protect espfix
MAINTAINERS: Add me as x86 VDSO submaintainer
x86/asm: Unify segment selector defines
x86/asm: Guard against building the 32/64-bit versions of the asm-offsets*.c file directly
x86_64, switch_to(): Load TLS descriptors before switching DS and ES
x86/mm: Use min() instead of min_t() in the e820 printout code
x86/mm: Fix zone ranges boot printout
x86/doc: Update documentation after file shuffling
Users have no business installing custom code segments into the
GDT, and segments that are not present but are otherwise valid
are a historical source of interesting attacks.
For completeness, block attempts to set the L bit. (Prior to
this patch, the L bit would have been silently dropped.)
This is an ABI break. I've checked glibc, musl, and Wine, and
none of them look like they'll have any trouble.
Note to stable maintainers: this is a hardening patch that fixes
no known bugs. Given the possibility of ABI issues, this
probably shouldn't be backported quickly.
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Cc: stable@vger.kernel.org # optional
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: security@kernel.org <security@kernel.org>
Cc: Willy Tarreau <w@1wt.eu>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Installing a 16-bit RW data segment into the GDT defeats espfix.
AFAICT this will not affect glibc, Wine, or dosemu at all.
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Cc: stable@vger.kernel.org
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: security@kernel.org <security@kernel.org>
Cc: Willy Tarreau <w@1wt.eu>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
1) Discovered by Fengguang Wu's tests. I changed the parameters to
the function graph x86 prepare_ftrace_return call but forgot
to update the call from entry_32 (i386 version). This patch corrects
that.
2) I was tracing some code and found that the sched_switch tracepoint
was showing tasks in the INTERRUPTIBLE state as RUNNING. This was
due to the updates to convert preempt_count into a per_cpu variable.
The tracepoint logic was made to use the tasks saved_preempt_count
which could hold a stale "PREEMPT_ACTIVE", instead of using the
current preempt_count() call.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEbBAABAgAGBQJUi75HAAoJEEjnJuOKh9ldYVoH+LZsoT0m5UVQT8vQmahXABnY
A3l19KGUAL3qaWRf4Au50sK2NdBTfGjvt8KGNkskPsvVv5X3z0GoXIIA76SD/MtX
ysyLUXGCayNCqb3akuzznGZxE8CNKcU5aj3Hy+hIvRI6tgg2sEDt67QBAYsukIOR
MN3us7ezvh0r+8muyPdrpC2OtkwsC1QvX2My1km0UU67CcWxo8zIuzUeeMSe+1+4
6eCjLsVWAznaTo5W9i2DTKpw85hZjfAFgaGn21yRrAHbC+REpaigB9mxZg3Bb1Qb
SovdyGSSEspke4/0Pu7bVXW/lx7dlnEsNWqc0RHWY1nd5FINQY+tRfbmgSoPRA==
=DMsV
-----END PGP SIGNATURE-----
Merge tag 'trace-fixes-v3.18' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing fixes from Steven Rostedt:
"Here's two fixes:
1) Discovered by Fengguang Wu's tests. I changed the parameters to
the function graph x86 prepare_ftrace_return call but forgot to
update the call from entry_32 (i386 version). This patch corrects
that.
2) I was tracing some code and found that the sched_switch tracepoint
was showing tasks in the INTERRUPTIBLE state as RUNNING. This was
due to the updates to convert preempt_count into a per_cpu
variable. The tracepoint logic was made to use the tasks
saved_preempt_count which could hold a stale "PREEMPT_ACTIVE",
instead of using the current preempt_count() call"
* tag 'trace-fixes-v3.18' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing/sched: Check preempt_count() for current when reading task->state
ftrace/x86: Update i386 call to prepare_ftrace_return()
After commit b2b49ccbdd (PM: Kconfig: Set PM_RUNTIME if PM_SLEEP is
selected) PM_RUNTIME is always set if PM is set, so #ifdef blocks
depending on CONFIG_PM_RUNTIME may now be changed to depend on
CONFIG_PM.
Replace CONFIG_PM_RUNTIME with CONFIG_PM in x86/kernel/apic/io_apic.c.
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The parameters for prepare_ftrace_return() used by the function graph
tracer were swapped to simplify the code on x86_64. But i386 function
graph trampoline also calls this function, and it did not have its
parameters swapped.
Link: http://lkml.kernel.org/r/20141210231732.GA24163@wfg-t540p.sh.intel.com
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Tested-by: Fengguang Wu <fengguang.wu@intel.com>
Fixes: 6a06bdbf7f "ftrace/fgraph/x86: Have prepare_ftrace_return() take ip as first parameter"
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Sometimes it is helpful to build a kernel compilation unit
directly, i.e.:
make .../<filename>.i
in order to look at compiler output.
Since asm-offsets_{32,64}.c are included by asm-offsets.c and
building them directly doesn't evaluate the macros used (thus
making the preprocessor output not very useful), error out when
an attempt is made to build them. Issue a hint for the user to
build asm-offsets.c instead.
Suggested-by: Michael Matz <matz@suse.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Michal Marek <mmarek@suse.cz>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1418139917-12722-1-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The type of "MAX_DMA_PFN" and "xXx_pfn" are both unsigned long
now, so use min() instead of min_t().
Suggested-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Xishi Qiu <qiuxishi@huawei.com>
Cc: Linux MM <linux-mm@kvack.org>
Cc: <dave@sr71.net>
Cc: Rik van Riel <riel@redhat.com>
Link: http://lkml.kernel.org/r/5487AB3F.7050807@huawei.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The uncore_collect_events functions assumes that event group
might contain only uncore events which is wrong, because it
might contain any type of events.
This bug leads to uncore framework touching 'not' uncore events,
which could end up all sorts of bugs.
One was triggered by Vince's perf fuzzer, when the uncore code
touched breakpoint event private event space as if it was uncore
event and caused BUG:
BUG: unable to handle kernel paging request at ffffffff82822068
IP: [<ffffffff81020338>] uncore_assign_events+0x188/0x250
...
The code in uncore_assign_events() function was looking for
event->hw.idx data while the event was initialized as a
breakpoint with different members in event->hw union.
This patch forces uncore_collect_events() to collect only uncore
events.
Reported-by: Vince Weaver <vince@deater.net>
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Yan, Zheng <zheng.z.yan@intel.com>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/1418243031-20367-2-git-send-email-jolsa@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This time we have some more new material than we used to have during
the last couple of development cycles.
The most important part of it to me is the introduction of a unified
interface for accessing device properties provided by platform
firmware. It works with Device Trees and ACPI in a uniform way and
drivers using it need not worry about where the properties come
from as long as the platform firmware (either DT or ACPI) makes
them available. It covers both devices and "bare" device node
objects without struct device representation as that turns out to
be necessary in some cases. This has been in the works for quite
a few months (and development cycles) and has been approved by
all of the relevant maintainers.
On top of that, some drivers are switched over to the new interface
(at25, leds-gpio, gpio_keys_polled) and some additional changes are
made to the core GPIO subsystem to allow device drivers to manipulate
GPIOs in the "canonical" way on platforms that provide GPIO information
in their ACPI tables, but don't assign names to GPIO lines (in which
case the driver needs to do that on the basis of what it knows about
the device in question). That also has been approved by the GPIO
core maintainers and the rfkill driver is now going to use it.
Second is support for hardware P-states in the intel_pstate driver.
It uses CPUID to detect whether or not the feature is supported by
the processor in which case it will be enabled by default. However,
it can be disabled entirely from the kernel command line if necessary.
Next is support for a platform firmware interface based on ACPI
operation regions used by the PMIC (Power Management Integrated
Circuit) chips on the Intel Baytrail-T and Baytrail-T-CR platforms.
That interface is used for manipulating power resources and for
thermal management: sensor temperature reporting, trip point setting
and so on.
Also the ACPI core is now going to support the _DEP configuration
information in a limited way. Basically, _DEP it supposed to reflect
off-the-hierarchy dependencies between devices which may be very
indirect, like when AML for one device accesses locations in an
operation region handled by another device's driver (usually, the
device depended on this way is a serial bus or GPIO controller).
The support added this time is sufficient to make the ACPI battery
driver work on Asus T100A, but it is general enough to be able to
cover some other use cases in the future.
Finally, we have a new cpufreq driver for the Loongson1B processor.
In addition to the above, there are fixes and cleanups all over the
place as usual and a traditional ACPICA update to a recent upstream
release.
As far as the fixes go, the ACPI LPSS (Low-power Subsystem) driver
for Intel platforms should be able to handle power management of
the DMA engine correctly, the cpufreq-dt driver should interact
with the thermal subsystem in a better way and the ACPI backlight
driver should handle some more corner cases, among other things.
On top of the ACPICA update there are fixes for race conditions
in the ACPICA's interrupt handling code which might lead to some
random and strange looking failures on some systems.
In the cleanups department the most visible part is the series
of commits targeted at getting rid of the CONFIG_PM_RUNTIME
configuration option. That was triggered by a discussion
regarding the generic power domains code during which we realized
that trying to support certain combinations of PM config options
was painful and not really worth it, because nobody would use them
in production anyway. For this reason, we decided to make
CONFIG_PM_SLEEP select CONFIG_PM_RUNTIME and that lead to the
conclusion that the latter became redundant and CONFIG_PM could
be used instead of it. The material here makes that replacement
in a major part of the tree, but there will be at least one more
batch of that in the second part of the merge window.
Specifics:
- Support for retrieving device properties information from ACPI
_DSD device configuration objects and a unified device properties
interface for device drivers (and subsystems) on top of that.
As stated above, this works with Device Trees and ACPI and allows
device drivers to be written in a platform firmware (DT or ACPI)
agnostic way. The at25, leds-gpio and gpio_keys_polled drivers
are now going to use this new interface and the GPIO subsystem
is additionally modified to allow device drivers to assign names
to GPIO resources returned by ACPI _CRS objects (in case _DSD is
not present or does not provide the expected data). The changes
in this set are mostly from Mika Westerberg, Rafael J Wysocki,
Aaron Lu, and Darren Hart with some fixes from others (Fabio Estevam,
Geert Uytterhoeven).
- Support for Hardware Managed Performance States (HWP) as described
in Volume 3, section 14.4, of the Intel SDM in the intel_pstate
driver. CPUID is used to detect whether or not the feature is
supported by the processor. If supported, it will be enabled
automatically unless the intel_pstate=no_hwp switch is present in
the kernel command line. From Dirk Brandewie.
- New Intel Broadwell-H ID for intel_pstate (Dirk Brandewie).
- Support for firmware interface based on ACPI operation regions
used by the PMIC chips on the Intel Baytrail-T and Baytrail-T-CR
platforms for power resource control and thermal management
(Aaron Lu).
- Limited support for retrieving off-the-hierarchy dependencies
between devices from ACPI _DEP device configuration objects
and deferred probing support for the ACPI battery driver based
on the _DEP information to make that driver work on Asus T100A
(Lan Tianyu).
- New cpufreq driver for the Loongson1B processor (Kelvin Cheung).
- ACPICA update to upstream revision 20141107 which only affects
tools (Bob Moore).
- Fixes for race conditions in the ACPICA's interrupt handling
code and in the ACPI code related to system suspend and resume
(Lv Zheng and Rafael J Wysocki).
- ACPI core fix for an RCU-related issue in the ioremap() regions
management code that slowed down significantly after CPUs had
been allowed to enter idle states even if they'd had RCU callbakcs
queued and triggered some problems in certain proprietary graphics
driver (and elsewhere). The fix replaces synchronize_rcu() in
that code with synchronize_rcu_expedited() which makes the issue
go away. From Konstantin Khlebnikov.
- ACPI LPSS (Low-Power Subsystem) driver fix to handle power
management of the DMA engine included into the LPSS correctly.
The problem is that the DMA engine doesn't have ACPI PM support
of its own and it simply is turned off when the last LPSS device
having ACPI PM support goes into D3cold. To work around that,
the PM domain used by the ACPI LPSS driver is redesigned so at
least one device with ACPI PM support will be on as long as the
DMA engine is in use. From Andy Shevchenko.
- ACPI backlight driver fix to avoid using it on "Win8-compatible"
systems where it doesn't work and where it was used by default by
mistake (Aaron Lu).
- Assorted minor ACPI core fixes and cleanups from Tomasz Nowicki,
Sudeep Holla, Huang Rui, Hanjun Guo, Fabian Frederick, and
Ashwin Chaugule (mostly related to the upcoming ARM64 support).
- Intel RAPL (Running Average Power Limit) power capping driver
fixes and improvements including new processor IDs (Jacob Pan).
- Generic power domains modification to power up domains after
attaching devices to them to meet the expectations of device
drivers and bus types assuming devices to be accessible at
probe time (Ulf Hansson).
- Preliminary support for controlling device clocks from the
generic power domains core code and modifications of the
ARM/shmobile platform to use that feature (Ulf Hansson).
- Assorted minor fixes and cleanups of the generic power
domains core code (Ulf Hansson, Geert Uytterhoeven).
- Assorted minor fixes and cleanups of the device clocks control
code in the PM core (Geert Uytterhoeven, Grygorii Strashko).
- Consolidation of device power management Kconfig options by making
CONFIG_PM_SLEEP select CONFIG_PM_RUNTIME and removing the latter
which is now redundant (Rafael J Wysocki and Kevin Hilman). That
is the first batch of the changes needed for this purpose.
- Core device runtime power management support code cleanup related
to the execution of callbacks (Andrzej Hajda).
- cpuidle ARM support improvements (Lorenzo Pieralisi).
- cpuidle cleanup related to the CPUIDLE_FLAG_TIME_VALID flag and
a new MAINTAINERS entry for ARM Exynos cpuidle (Daniel Lezcano and
Bartlomiej Zolnierkiewicz).
- New cpufreq driver callback (->ready) to be executed when the
cpufreq core is ready to use a given policy object and cpufreq-dt
driver modification to use that callback for cooling device
registration (Viresh Kumar).
- cpufreq core fixes and cleanups (Viresh Kumar, Vince Hsu,
James Geboski, Tomeu Vizoso).
- Assorted fixes and cleanups in the cpufreq-pcc, intel_pstate,
cpufreq-dt, pxa2xx cpufreq drivers (Lenny Szubowicz, Ethan Zhao,
Stefan Wahren, Petr Cvek).
- OPP (Operating Performance Points) framework modification to
allow OPPs to be removed too and update of a few cpufreq drivers
(cpufreq-dt, exynos5440, imx6q, cpufreq) to remove OPPs (added
during initialization) on driver removal (Viresh Kumar).
- Hibernation core fixes and cleanups (Tina Ruchandani and
Markus Elfring).
- PM Kconfig fix related to CPU power management (Pankaj Dubey).
- cpupower tool fix (Prarit Bhargava).
/
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQIcBAABCAAGBQJUhj6JAAoJEILEb/54YlRxTM4P/j5g5SfqvY0QKsn7sR7MGZ6v
nsgCBhJAqTw3ocNC7EAs8z9h2GWy1KbKpakKYWAh9Fs1yZoey7tFSlcv/Rgjlp70
uU5sDQHtpE9mHKiymdsowiQuWgpl962L4k+k8hUslhlvgk1PvVbpajR6OqG8G+pD
asuIW9eh1APNkLyXmRJ3ZPomzs0VmRdZJ0NEs0lKX9mJskqEvxPIwdaxq3iaJq9B
Fo0J345zUDcJnxWblDRdHlOigCimglElfN5qJwaC4KpwUKuBvLRKbp4f69+wfT0c
kYFiR29X5KjJ2kLfP/wKsLyuDCYYXRq3tCia5M1tAqOjZ+UA89H/GDftx/5lntmv
qUlBa35VfdS1SX4HyApZitOHiLgo+It/hl8Z9bJnhyVw66NxmMQ8JYN2imb8Lhqh
XCLR7BxLTah82AapLJuQ0ZDHPzZqMPG2veC2vAzRMYzVijict/p4Y2+qBqONltER
4rs9uRVn+hamX33lCLg8BEN8zqlnT3rJFIgGaKjq/wXHAU/zpE9CjOrKMQcAg9+s
t51XMNPwypHMAYyGVhEL89ImjXnXxBkLRuquhlmEpvQchIhR+mR3dLsarGn7da44
WPIQJXzcsojXczcwwfqsJCR4I1FTFyQIW+UNh02GkDRgRovQqo+Jk762U7vQwqH+
LBdhvVaS1VW4v+FWXEoZ
=5dox
-----END PGP SIGNATURE-----
Merge tag 'pm+acpi-3.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI and power management updates from Rafael Wysocki:
"This time we have some more new material than we used to have during
the last couple of development cycles.
The most important part of it to me is the introduction of a unified
interface for accessing device properties provided by platform
firmware. It works with Device Trees and ACPI in a uniform way and
drivers using it need not worry about where the properties come from
as long as the platform firmware (either DT or ACPI) makes them
available. It covers both devices and "bare" device node objects
without struct device representation as that turns out to be necessary
in some cases. This has been in the works for quite a few months (and
development cycles) and has been approved by all of the relevant
maintainers.
On top of that, some drivers are switched over to the new interface
(at25, leds-gpio, gpio_keys_polled) and some additional changes are
made to the core GPIO subsystem to allow device drivers to manipulate
GPIOs in the "canonical" way on platforms that provide GPIO
information in their ACPI tables, but don't assign names to GPIO lines
(in which case the driver needs to do that on the basis of what it
knows about the device in question). That also has been approved by
the GPIO core maintainers and the rfkill driver is now going to use
it.
Second is support for hardware P-states in the intel_pstate driver.
It uses CPUID to detect whether or not the feature is supported by the
processor in which case it will be enabled by default. However, it
can be disabled entirely from the kernel command line if necessary.
Next is support for a platform firmware interface based on ACPI
operation regions used by the PMIC (Power Management Integrated
Circuit) chips on the Intel Baytrail-T and Baytrail-T-CR platforms.
That interface is used for manipulating power resources and for
thermal management: sensor temperature reporting, trip point setting
and so on.
Also the ACPI core is now going to support the _DEP configuration
information in a limited way. Basically, _DEP it supposed to reflect
off-the-hierarchy dependencies between devices which may be very
indirect, like when AML for one device accesses locations in an
operation region handled by another device's driver (usually, the
device depended on this way is a serial bus or GPIO controller). The
support added this time is sufficient to make the ACPI battery driver
work on Asus T100A, but it is general enough to be able to cover some
other use cases in the future.
Finally, we have a new cpufreq driver for the Loongson1B processor.
In addition to the above, there are fixes and cleanups all over the
place as usual and a traditional ACPICA update to a recent upstream
release.
As far as the fixes go, the ACPI LPSS (Low-power Subsystem) driver for
Intel platforms should be able to handle power management of the DMA
engine correctly, the cpufreq-dt driver should interact with the
thermal subsystem in a better way and the ACPI backlight driver should
handle some more corner cases, among other things.
On top of the ACPICA update there are fixes for race conditions in the
ACPICA's interrupt handling code which might lead to some random and
strange looking failures on some systems.
In the cleanups department the most visible part is the series of
commits targeted at getting rid of the CONFIG_PM_RUNTIME configuration
option. That was triggered by a discussion regarding the generic
power domains code during which we realized that trying to support
certain combinations of PM config options was painful and not really
worth it, because nobody would use them in production anyway. For
this reason, we decided to make CONFIG_PM_SLEEP select
CONFIG_PM_RUNTIME and that lead to the conclusion that the latter
became redundant and CONFIG_PM could be used instead of it. The
material here makes that replacement in a major part of the tree, but
there will be at least one more batch of that in the second part of
the merge window.
Specifics:
- Support for retrieving device properties information from ACPI _DSD
device configuration objects and a unified device properties
interface for device drivers (and subsystems) on top of that. As
stated above, this works with Device Trees and ACPI and allows
device drivers to be written in a platform firmware (DT or ACPI)
agnostic way. The at25, leds-gpio and gpio_keys_polled drivers are
now going to use this new interface and the GPIO subsystem is
additionally modified to allow device drivers to assign names to
GPIO resources returned by ACPI _CRS objects (in case _DSD is not
present or does not provide the expected data). The changes in
this set are mostly from Mika Westerberg, Rafael J Wysocki, Aaron
Lu, and Darren Hart with some fixes from others (Fabio Estevam,
Geert Uytterhoeven).
- Support for Hardware Managed Performance States (HWP) as described
in Volume 3, section 14.4, of the Intel SDM in the intel_pstate
driver. CPUID is used to detect whether or not the feature is
supported by the processor. If supported, it will be enabled
automatically unless the intel_pstate=no_hwp switch is present in
the kernel command line. From Dirk Brandewie.
- New Intel Broadwell-H ID for intel_pstate (Dirk Brandewie).
- Support for firmware interface based on ACPI operation regions used
by the PMIC chips on the Intel Baytrail-T and Baytrail-T-CR
platforms for power resource control and thermal management (Aaron
Lu).
- Limited support for retrieving off-the-hierarchy dependencies
between devices from ACPI _DEP device configuration objects and
deferred probing support for the ACPI battery driver based on the
_DEP information to make that driver work on Asus T100A (Lan
Tianyu).
- New cpufreq driver for the Loongson1B processor (Kelvin Cheung).
- ACPICA update to upstream revision 20141107 which only affects
tools (Bob Moore).
- Fixes for race conditions in the ACPICA's interrupt handling code
and in the ACPI code related to system suspend and resume (Lv Zheng
and Rafael J Wysocki).
- ACPI core fix for an RCU-related issue in the ioremap() regions
management code that slowed down significantly after CPUs had been
allowed to enter idle states even if they'd had RCU callbakcs
queued and triggered some problems in certain proprietary graphics
driver (and elsewhere). The fix replaces synchronize_rcu() in that
code with synchronize_rcu_expedited() which makes the issue go
away. From Konstantin Khlebnikov.
- ACPI LPSS (Low-Power Subsystem) driver fix to handle power
management of the DMA engine included into the LPSS correctly. The
problem is that the DMA engine doesn't have ACPI PM support of its
own and it simply is turned off when the last LPSS device having
ACPI PM support goes into D3cold. To work around that, the PM
domain used by the ACPI LPSS driver is redesigned so at least one
device with ACPI PM support will be on as long as the DMA engine is
in use. From Andy Shevchenko.
- ACPI backlight driver fix to avoid using it on "Win8-compatible"
systems where it doesn't work and where it was used by default by
mistake (Aaron Lu).
- Assorted minor ACPI core fixes and cleanups from Tomasz Nowicki,
Sudeep Holla, Huang Rui, Hanjun Guo, Fabian Frederick, and Ashwin
Chaugule (mostly related to the upcoming ARM64 support).
- Intel RAPL (Running Average Power Limit) power capping driver fixes
and improvements including new processor IDs (Jacob Pan).
- Generic power domains modification to power up domains after
attaching devices to them to meet the expectations of device
drivers and bus types assuming devices to be accessible at probe
time (Ulf Hansson).
- Preliminary support for controlling device clocks from the generic
power domains core code and modifications of the ARM/shmobile
platform to use that feature (Ulf Hansson).
- Assorted minor fixes and cleanups of the generic power domains core
code (Ulf Hansson, Geert Uytterhoeven).
- Assorted minor fixes and cleanups of the device clocks control code
in the PM core (Geert Uytterhoeven, Grygorii Strashko).
- Consolidation of device power management Kconfig options by making
CONFIG_PM_SLEEP select CONFIG_PM_RUNTIME and removing the latter
which is now redundant (Rafael J Wysocki and Kevin Hilman). That
is the first batch of the changes needed for this purpose.
- Core device runtime power management support code cleanup related
to the execution of callbacks (Andrzej Hajda).
- cpuidle ARM support improvements (Lorenzo Pieralisi).
- cpuidle cleanup related to the CPUIDLE_FLAG_TIME_VALID flag and a
new MAINTAINERS entry for ARM Exynos cpuidle (Daniel Lezcano and
Bartlomiej Zolnierkiewicz).
- New cpufreq driver callback (->ready) to be executed when the
cpufreq core is ready to use a given policy object and cpufreq-dt
driver modification to use that callback for cooling device
registration (Viresh Kumar).
- cpufreq core fixes and cleanups (Viresh Kumar, Vince Hsu, James
Geboski, Tomeu Vizoso).
- Assorted fixes and cleanups in the cpufreq-pcc, intel_pstate,
cpufreq-dt, pxa2xx cpufreq drivers (Lenny Szubowicz, Ethan Zhao,
Stefan Wahren, Petr Cvek).
- OPP (Operating Performance Points) framework modification to allow
OPPs to be removed too and update of a few cpufreq drivers
(cpufreq-dt, exynos5440, imx6q, cpufreq) to remove OPPs (added
during initialization) on driver removal (Viresh Kumar).
- Hibernation core fixes and cleanups (Tina Ruchandani and Markus
Elfring).
- PM Kconfig fix related to CPU power management (Pankaj Dubey).
- cpupower tool fix (Prarit Bhargava)"
* tag 'pm+acpi-3.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (120 commits)
i2c-omap / PM: Drop CONFIG_PM_RUNTIME from i2c-omap.c
dmaengine / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM
tools: cpupower: fix return checks for sysfs_get_idlestate_count()
drivers: sh / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM
e1000e / igb / PM: Eliminate CONFIG_PM_RUNTIME
MMC / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM
MFD / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM
misc / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM
media / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM
input / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM
leds: leds-gpio: Fix multiple instances registration without 'label' property
iio / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM
hsi / OMAP / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM
i2c-hid / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM
drm / exynos / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM
gpio / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM
hwrandom / exynos / PM: Use CONFIG_PM in #ifdef
block / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM
USB / PM: Drop CONFIG_PM_RUNTIME from the USB core
PM: Merge the SET*_RUNTIME_PM_OPS() macros
...
clean ups from that branch.
This code solves the issue of performing stack dumps from NMI context.
The issue is that printk() is not safe from NMI context as if the NMI
were to trigger when a printk() was being performed, the NMI could
deadlock from the printk() internal locks. This has been seen in practice.
With lots of review from Petr Mladek, this code went through several
iterations, and we feel that it is now at a point of quality to be
accepted into mainline.
Here's what is contained in this patch set:
o Creates a "seq_buf" generic buffer utility that allows a descriptor
to be passed around where functions can write their own "printk()"
formatted strings into it. The generic version was pulled out of
the trace_seq() code that was made specifically for tracing.
o The seq_buf code was change to model the seq_file code. I have
a patch (not included for 3.19) that converts the seq_file.c code
over to use seq_buf.c like the trace_seq.c code does. This was done
to make sure that seq_buf.c is compatible with seq_file.c. I may
try to get that patch in for 3.20.
o The seq_buf.c file was moved to lib/ to remove it from being dependent
on CONFIG_TRACING.
o The printk() was updated to allow for a per_cpu "override" of
the internal calls. That is, instead of writing to the console, a call
to printk() may do something else. This made it easier to allow the
NMI to change what printk() does in order to call dump_stack() without
needing to update that code as well.
o Finally, the dump_stack from all CPUs via NMI code was converted to
use the seq_buf code. The caller to trigger the NMI code would wait
till all the NMIs finished, and then it would print the seq_buf
data to the console safely from a non NMI context.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJUhbrnAAoJEEjnJuOKh9ldsCoIAJ3sKIJ5B3jxJJTCHPAx/lZD
GVbV1J1mu4kTAZuhJZOAxW8D6PZGZMyEjg0y6ScDEnBGcjAZ9gTiWCdakPktf9EX
GfaPPqwiL9dZ18J9Qc6uR+7M1Ffpzzwbcc6lJrpoTcjRgkoH9wCiLS9ozFQyYzWb
/7m5UbUM/PIk9WAjLYXPW6UUVtPTPT0RdEQKofMGTeah+vgqj4TXCOROdlxsXXWF
77vqBvPd5TUPWFH9ftzJGDtZS8SroXVKCu3fZIqHgzAU0yqwVtH/JzDTy9u2UYhX
GzDEPeAIdp6m6Uyc406VuIf1QW0gfBgmA0ir80vFoP27uFMM6j5HlF7azgQfx34=
=YBgA
-----END PGP SIGNATURE-----
Merge tag 'trace-seq-buf-3.19' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull nmi-safe seq_buf printk update from Steven Rostedt:
"This code is a fork from the trace-3.19 pull as it needed the
trace_seq clean ups from that branch.
This code solves the issue of performing stack dumps from NMI context.
The issue is that printk() is not safe from NMI context as if the NMI
were to trigger when a printk() was being performed, the NMI could
deadlock from the printk() internal locks. This has been seen in
practice.
With lots of review from Petr Mladek, this code went through several
iterations, and we feel that it is now at a point of quality to be
accepted into mainline.
Here's what is contained in this patch set:
- Creates a "seq_buf" generic buffer utility that allows a descriptor
to be passed around where functions can write their own "printk()"
formatted strings into it. The generic version was pulled out of
the trace_seq() code that was made specifically for tracing.
- The seq_buf code was change to model the seq_file code. I have a
patch (not included for 3.19) that converts the seq_file.c code
over to use seq_buf.c like the trace_seq.c code does. This was
done to make sure that seq_buf.c is compatible with seq_file.c. I
may try to get that patch in for 3.20.
- The seq_buf.c file was moved to lib/ to remove it from being
dependent on CONFIG_TRACING.
- The printk() was updated to allow for a per_cpu "override" of the
internal calls. That is, instead of writing to the console, a call
to printk() may do something else. This made it easier to allow
the NMI to change what printk() does in order to call dump_stack()
without needing to update that code as well.
- Finally, the dump_stack from all CPUs via NMI code was converted to
use the seq_buf code. The caller to trigger the NMI code would
wait till all the NMIs finished, and then it would print the
seq_buf data to the console safely from a non NMI context
One added bonus is that this code also makes the NMI dump stack work
on PREEMPT_RT kernels. As printk() includes sleeping locks on
PREEMPT_RT, printk() only writes to console if the console does not
use any rt_mutex converted spin locks. Which a lot do"
* tag 'trace-seq-buf-3.19' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
x86/nmi: Fix use of unallocated cpumask_var_t
printk/percpu: Define printk_func when printk is not defined
x86/nmi: Perform a safe NMI stack trace on all CPUs
printk: Add per_cpu printk func to allow printk to be diverted
seq_buf: Move the seq_buf code to lib/
seq-buf: Make seq_buf_bprintf() conditional on CONFIG_BINARY_PRINTF
tracing: Add seq_buf_get_buf() and seq_buf_commit() helper functions
tracing: Have seq_buf use full buffer
seq_buf: Add seq_buf_can_fit() helper function
tracing: Add paranoid size check in trace_printk_seq()
tracing: Use trace_seq_used() and seq_buf_used() instead of len
tracing: Clean up tracing_fill_pipe_page()
seq_buf: Create seq_buf_used() to find out how much was written
tracing: Add a seq_buf_clear() helper and clear len and readpos in init
tracing: Convert seq_buf fields to be like seq_file fields
tracing: Convert seq_buf_path() to be like seq_path()
tracing: Create seq_buf layer in trace_seq
to the trace_seq code. It also removed the return values to the
trace_seq_*() functions and use trace_seq_has_overflowed() to see if
the buffer filled up or not. This is similar to work being done to the
seq_file code as well in another tree.
Some of the other goodies include:
o Added some "!" (NOT) logic to the tracing filter.
o Fixed the frame pointer logic to the x86_64 mcount trampolines
o Added the logic for dynamic trampolines on !CONFIG_PREEMPT systems.
That is, the ftrace trampoline can be dynamically allocated
and be called directly by functions that only have a single hook
to them.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJUhbLGAAoJEEjnJuOKh9ldRV4H/3NcLbgGB2iu96la1zdYE6pG
Q7cDJMxXK80YIIL70h9G0IItcD4t62LMb72lfBnMGRj3msgFb3AgISW57EuI0Pxk
xk24wuIPoTG2S7v9sc3SboNFwO8qbtIjxD2OBmqIUrGo2sZIiGjyj3gX7mCY3uzL
WB2bUOSFz/22OgaANinR5EELHA3pZZCf54Vz1K9ndmtK0xp0j1a7xJShD6TrMdYv
mZ3zH5ViIhW4A3mdcMceh6fy2JLQAiEKF0uPTvcMMz7NlVul0mxyL/+10P7AE/3R
Ehw4fzmm4NDshPDtBOkKH0LsppgXzuItFuQUTpact3JlqTg++bV6onSsrkt1hlY=
=Z7Cm
-----END PGP SIGNATURE-----
Merge tag 'trace-3.19' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing updates from Steven Rostedt:
"There was a lot of clean ups and minor fixes. One of those clean ups
was to the trace_seq code. It also removed the return values to the
trace_seq_*() functions and use trace_seq_has_overflowed() to see if
the buffer filled up or not. This is similar to work being done to
the seq_file code as well in another tree.
Some of the other goodies include:
- Added some "!" (NOT) logic to the tracing filter.
- Fixed the frame pointer logic to the x86_64 mcount trampolines
- Added the logic for dynamic trampolines on !CONFIG_PREEMPT systems.
That is, the ftrace trampoline can be dynamically allocated and be
called directly by functions that only have a single hook to them"
* tag 'trace-3.19' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (55 commits)
tracing: Truncated output is better than nothing
tracing: Add additional marks to signal very large time deltas
Documentation: describe trace_buf_size parameter more accurately
tracing: Allow NOT to filter AND and OR clauses
tracing: Add NOT to filtering logic
ftrace/fgraph/x86: Have prepare_ftrace_return() take ip as first parameter
ftrace/x86: Get rid of ftrace_caller_setup
ftrace/x86: Have save_mcount_regs macro also save stack frames if needed
ftrace/x86: Add macro MCOUNT_REG_SIZE for amount of stack used to save mcount regs
ftrace/x86: Simplify save_mcount_regs on getting RIP
ftrace/x86: Have save_mcount_regs store RIP in %rdi for first parameter
ftrace/x86: Rename MCOUNT_SAVE_FRAME and add more detailed comments
ftrace/x86: Move MCOUNT_SAVE_FRAME out of header file
ftrace/x86: Have static tracing also use ftrace_caller_setup
ftrace/x86: Have static function tracing always test for function graph
kprobes: Add IPMODIFY flag to kprobe_ftrace_ops
ftrace, kprobes: Support IPMODIFY flag to find IP modify conflict
kprobes/ftrace: Recover original IP if pre_handler doesn't change it
tracing/trivial: Fix typos and make an int into a bool
tracing: Deletion of an unnecessary check before iput()
...
Pull x86 microcode loading updates from Ingo Molnar:
"The main changes in this cycle are:
- Reload microcode when resuming and the case when only the early
loader has been utilized. (Borislav Petkov)
- Also, do not load the driver on paravirt guests. (Boris
Ostrovsky)"
* 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/microcode/intel: Fish out the stashed microcode for the BSP
x86, microcode: Reload microcode on resume
x86, microcode: Don't initialize microcode code on paravirt
x86, microcode, intel: Drop unused parameter
x86, microcode, AMD: Do not use smp_processor_id() in preemtible context
Pull x86 vdso updates from Ingo Molnar:
"Various vDSO updates from Andy Lutomirski, mostly cleanups and
reorganization to improve maintainability, but also some
micro-optimizations and robustization changes"
* 'x86-vdso-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86_64/vsyscall: Restore orig_ax after vsyscall seccomp
x86_64: Add a comment explaining the TASK_SIZE_MAX guard page
x86_64,vsyscall: Make vsyscall emulation configurable
x86_64, vsyscall: Rewrite comment and clean up headers in vsyscall code
x86_64, vsyscall: Turn vsyscalls all the way off when vsyscall==none
x86,vdso: Use LSL unconditionally for vgetcpu
x86: vdso: Fix build with older gcc
x86_64/vdso: Clean up vgetcpu init and merge the vdso initcalls
x86_64/vdso: Remove jiffies from the vvar page
x86/vdso: Make the PER_CPU segment 32 bits
x86/vdso: Make the PER_CPU segment start out accessed
x86/vdso: Change the PER_CPU segment to use struct desc_struct
x86_64/vdso: Move getcpu code from vsyscall_64.c to vdso/vma.c
x86_64/vsyscall: Move all of the gate_area code to vsyscall_64.c
Pull x86 RAS update from Ingo Molnar:
"The biggest change in this cycle is better support for UCNA
(UnCorrected No Action) events:
"Handle all uncorrected error reports in the same way (soft
offline the page). We used to only do that for SRAO
(software recoverable action optional) machine checks, but
it makes sense to also do it for UCNA (UnCorrected No
Action) logs found by CMCI or polling."
plus various x86 MCE handling updates and fixes"
* 'x86-ras-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mce: Spell "panicked" correctly
x86, mce: Support memory error recovery for both UCNA and Deferred error in machine_check_poll
x86, mce, severity: Extend the the mce_severity mechanism to handle UCNA/DEFERRED error
x86, MCE, AMD: Assign interrupt handler only when bank supports it
x86, MCE, AMD: Drop software-defined bank in error thresholding
x86, MCE, AMD: Move invariant code out from loop body
x86, MCE, AMD: Correct thresholding error logging
x86, MCE, AMD: Use macros to compute bank MSRs
RAS, HWPOISON: Fix wrong error recovery status
GHES: Make ghes_estatus_caches static
APEI, GHES: Cleanup unnecessary function for lockless list
Pull x86 platform changes from Ingo Molnar:
"A handful of numachip APIC driver updates/fixes, and two small SGI/UV
fixes"
* 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86: numachip: APIC driver cleanups
x86: numachip: Elide self-IPI ICR polling
x86: numachip: Fix 16-bit APIC ID truncation
* 'x86-uv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86: UV BAU: Increase maximum CPUs per socket/hub
x86: UV BAU: Avoid NULL pointer reference in ptc_seq_show
Pull x86 build, cleanup and defconfig updates from Ingo Molnar:
"A single minor build change to suppress a repetitive build messages,
misc cleanups and a defconfig update"
* 'x86-build-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/purgatory, build: Suppress kexec-purgatory.c is up to date message
* 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86, CPU, AMD: Move K8 TLB flush filter workaround to K8 code
x86, espfix: Remove stale ptemask
x86, msr: Use seek definitions instead of hard-coded values
x86, msr: Convert printk to pr_foo()
x86, msr: Use PTR_ERR_OR_ZERO
x86/simplefb: Use PTR_ERR_OR_ZERO
x86/sysfb: Use PTR_ERR_OR_ZERO
x86, cpuid: Use PTR_ERR_OR_ZERO
* 'x86-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/kconfig/defconfig: Enable CONFIG_FHANDLE=y
Pull x86 boot and percpu updates from Ingo Molnar:
"This tree contains a bootable images documentation update plus three
slightly misplaced x86/asm percpu changes/optimizations"
* 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86-64: Use RIP-relative addressing for most per-CPU accesses
x86-64: Handle PC-relative relocations on per-CPU data
x86: Convert a few more per-CPU items to read-mostly ones
x86, boot: Document intermediates more clearly
Pull x86 MPX support from Thomas Gleixner:
"This enables support for x86 MPX.
MPX is a new debug feature for bound checking in user space. It
requires kernel support to handle the bound tables and decode the
bound violating instruction in the trap handler"
* 'x86-mpx-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
asm-generic: Remove asm-generic arch_bprm_mm_init()
mm: Make arch_unmap()/bprm_mm_init() available to all architectures
x86: Cleanly separate use of asm-generic/mm_hooks.h
x86 mpx: Change return type of get_reg_offset()
fs: Do not include mpx.h in exec.c
x86, mpx: Add documentation on Intel MPX
x86, mpx: Cleanup unused bound tables
x86, mpx: On-demand kernel allocation of bounds tables
x86, mpx: Decode MPX instruction to get bound violation information
x86, mpx: Add MPX-specific mmap interface
x86, mpx: Introduce VM_MPX to indicate that a VMA is MPX specific
x86, mpx: Add MPX to disabled features
ia64: Sync struct siginfo with general version
mips: Sync struct siginfo with general version
mpx: Extend siginfo structure to include bound violation information
x86, mpx: Rename cfg_reg_u and status_reg
x86: mpx: Give bndX registers actual names
x86: Remove arbitrary instruction size limit in instruction decoder
Pull irq domain updates from Thomas Gleixner:
"The real interesting irq updates:
- Support for hierarchical irq domains:
For complex interrupt routing scenarios where more than one
interrupt related chip is involved we had no proper representation
in the generic interrupt infrastructure so far. That made people
implement rather ugly constructs in their nested irq chip
implementations. The main offenders are x86 and arm/gic.
To distangle that mess we have now hierarchical irqdomains which
seperate the various interrupt chips and connect them via the
hierarchical domains. That keeps the domain specific details
internal to the particular hierarchy level and removes the
criss/cross referencing of chip internals. The resulting hierarchy
for a complex x86 system will look like this:
vector mapped: 74
msi-0 mapped: 2
dmar-ir-1 mapped: 69
ioapic-1 mapped: 4
ioapic-0 mapped: 20
pci-msi-2 mapped: 45
dmar-ir-0 mapped: 3
ioapic-2 mapped: 1
pci-msi-1 mapped: 2
htirq mapped: 0
Neither ioapic nor pci-msi know about the dmar interrupt remapping
between themself and the vector domain. If interrupt remapping is
disabled ioapic and pci-msi become direct childs of the vector
domain.
In hindsight we should have done that years ago, but in hindsight
we always know better :)
- Support for generic MSI interrupt domain handling
We have more and more non PCI related MSI interrupts, so providing
a generic infrastructure for this is better than having all
affected architectures implementing their own private hacks.
- Support for PCI-MSI interrupt domain handling, based on the generic
MSI support.
This part carries the pci/msi branch from Bjorn Helgaas pci tree to
avoid a massive conflict. The PCI/MSI parts are acked by Bjorn.
I have two more branches on top of this. The full conversion of x86
to hierarchical domains and a partial conversion of arm/gic"
* 'irq-irqdomain-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (41 commits)
genirq: Move irq_chip_write_msi_msg() helper to core
PCI/MSI: Allow an msi_controller to be associated to an irq domain
PCI/MSI: Provide mechanism to alloc/free MSI/MSIX interrupt from irqdomain
PCI/MSI: Enhance core to support hierarchy irqdomain
PCI/MSI: Move cached entry functions to irq core
genirq: Provide default callbacks for msi_domain_ops
genirq: Introduce msi_domain_alloc/free_irqs()
asm-generic: Add msi.h
genirq: Add generic msi irq domain support
genirq: Introduce callback irq_chip.irq_write_msi_msg
genirq: Work around __irq_set_handler vs stacked domains ordering issues
irqdomain: Introduce helper function irq_domain_add_hierarchy()
irqdomain: Implement a method to automatically call parent domains alloc/free
genirq: Introduce helper irq_domain_set_info() to reduce duplicated code
genirq: Split out flow handler typedefs into seperate header file
genirq: Add IRQ_SET_MASK_OK_DONE to support stacked irqchip
genirq: Introduce irq_chip.irq_compose_msi_msg() to support stacked irqchip
genirq: Add more helper functions to support stacked irq_chip
genirq: Introduce helper functions to support stacked irq_chip
irqdomain: Do irq_find_mapping and set_type for hierarchy irqdomain in case OF
...
paravirt_enabled has the following effects:
- Disables the F00F bug workaround warning. There is no F00F bug
workaround any more because Linux's standard IDT handling already
works around the F00F bug, but the warning still exists. This
is only cosmetic, and, in any event, there is no such thing as
KVM on a CPU with the F00F bug.
- Disables 32-bit APM BIOS detection. On a KVM paravirt system,
there should be no APM BIOS anyway.
- Disables tboot. I think that the tboot code should check the
CPUID hypervisor bit directly if it matters.
- paravirt_enabled disables espfix32. espfix32 should *not* be
disabled under KVM paravirt.
The last point is the purpose of this patch. It fixes a leak of the
high 16 bits of the kernel stack address on 32-bit KVM paravirt
guests. Fixes CVE-2014-8134.
Cc: stable@vger.kernel.org
Suggested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
I'm such a moron! The simple solution of saving the BSP patch
for use on resume was too simple (and wrong!), hint:
sizeof(struct microcode_intel).
What needs to be done instead is to fish out the microcode patch
we have stashed previously and apply that on the BSP in case the
late loader hasn't been utilized.
So do that instead.
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20141208110820.GB20057@pd.tnic
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull leftover perf fixes from Ingo Molnar:
"Two perf fixes left over from the previous cycle"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf session: Do not fail on processing out of order event
x86/asm/traps: Disable tracing and kprobes in fixup_bad_iret and sync_regs
Pull perf events update from Ingo Molnar:
"On the kernel side there's few changes, the one that stands out is
PEBS machine state sampling support on x86, by Stephane Eranian.
On the tooling side:
User visible tooling changes:
- Don't open the DWARF info multiple times, keeping instead a dwfl
handle in struct dso, greatly speeding up 'perf report' on powerpc.
(Sukadev Bhattiprolu)
- Introduce PARSE_OPT_DISABLED option flag and use it to avoid
showing undersired options in tools that provides frontends to
'perf record', like sched, kvm, etc (Namhyung Kim)
- Fallback to kallsyms when using the minimal 'ELF' loader (Arnaldo
Carvalho de Melo)
- Fix annotation with kcore (Adrian Hunter)
- Support source line numbers in annotate using a hotkey (Andi Kleen)
- Callchain improvements including:
* Enable printing the srcline in the history
* Make get_srcline fall back to sym+offset (Andi Kleen)
- TUI hist_entry browser fixes, including showing missing overhead
value for first level callchain. Detected comparing the output of
--stdio/--gui (that matched) with --tui, that had this problem.
(Namhyung Kim)
- Support handling complete branch stacks as histograms (Andi Kleen)
Tooling infrastructure changes:
- Prep work for supporting per-pkg and snapshot counters in 'perf
stat' (Jiri Olsa)
- 'perf stat' refactorings, moving stuff from it to evsel.c to use in
per-pkg/snapshot format changes (Jiri Olsa)
- Add per-pkg format file parsing (Matt Fleming)
- Clean up libelf feature support code (Namhyung Kim)
- Add gzip decompression support for kernel modules (Namhyung Kim)
- More prep patches for Intel PT, including a a thread stack and more
stuff made available via the database export mechanism (Adrian
Hunter)
- More Intel PT work, including a facility to export sample data
(comms, threads, symbol names, etc) in a database friendly way,
with an script to use this to create a postgresql database.
(Adrian Hunter)
- Make sure that thread->mg->machine points to the machine where the
thread exists (it was being set only for the kmaps kernel modules
case, do it as well for the mmaps) and use it to shorten function
signatures (Arnaldo Carvalho de Melo)
... and lots of other fixes and smaller improvements"
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (91 commits)
perf report: In branch stack mode use address history sorting
perf report: Add --branch-history option
perf callchain: Support handling complete branch stacks as histograms
perf stat: Add support for snapshot counters
perf stat: Add support for per-pkg counters
perf tools: Remove perf_evsel__read interface
perf stat: Use read_counter in read_counter_aggr
perf stat: Make read_counter work over the thread dimension
perf stat: Use perf_evsel__read_cb in read_counter
perf tools: Add snapshot format file parsing
perf tools: Add per-pkg format file parsing
perf evsel: Introduce perf_evsel__read_cb function
perf evsel: Introduce perf_counts_values__scale function
perf evsel: Introduce perf_evsel__compute_deltas function
perf tools: Allow to force redirect pr_debug to stderr.
perf tools: Fix segfault due to invalid kernel dso access
perf callchain: Make get_srcline fall back to sym+offset
perf symbols: Move bfd_demangle stubbing to its only user
perf callchain: Enable printing the srcline in the history
perf tools: Collapse first level callchain entry if it has sibling
...
* Enablement for AMD F15h models 0x60 CPUs. Most notably DDR4 RAM
support. Out of tree stuff is
arch/x86/kernel/amd_nb.c | 2 +
include/linux/pci_ids.h | 2 +
adding the required PCI IDs. From Aravind Gopalakrishnan.
* Enable amd64_edac for 32-bit due to popular demand. From Tomasz Pala.
* Convert the AMD MCE injection module to debugfs, where it belongs.
* Misc EDAC cleanups
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJUhZbcAAoJEBLB8Bhh3lVKd5kQAK77qsB4ebpke1rEBerQl9jQ
YCVrByCKu7QTRt/xvlqU9Vyp7EvcnpNxFbRCCqzIpBcJjre9v17QVRA2/zFS0q81
QRDTOWf9uhMWbssI2Zu1hbjDNMWYiEb9+aeZHjScVtkzPDmsgYuKGdWfTSLw9dkS
AG/UUZ3ojwyc2dK3i0W3pCjakKYLUsCmijyTZBfb37+u4rRGuAgrQ9G8fBn2lL+l
huptgV2BsTCQqdL554zTs64Yt912PnidsJWYCPCjMubgEPSeNcWRzTTBYDUf9NIn
RxFXYHnOQxetPSQqfLVXlX3V/cGNQg0yEXFZ9S0tCt5uLNbbN/D8Uumtst0rq9x3
XkJ/EGHXBFP+KwHdV/i9j6OYM5rq4z+4ql4OqbWzsvPrEDbh/4p5gRbhqd1Hhy9U
zgHJjVPpD/l2t82Tpz0jIOscQruZ6VqGMDSYo3LiLnNt724pcrmr3DiN9mc6ljzJ
rsNsemMH0IoH8KbBHKGtMLnBVO6HbnrtC6iKFfocNBisvo1PKKzn9s2O1pdjmsCs
jHwz5njoM7Ki/ygkJhbKiSDMXPs67eggwoGIGzpNMoY4RWxrcQzYE9yKfKzRNxET
Qb3xUwDWDyL8ErwHtL3xMxGwyfkhb+SZdMd5aKYA5Rdbf+TN8P6iAv05nrnfpkyk
lPTv5o9EQYvr8/Tc9FZW
=ULmb
-----END PGP SIGNATURE-----
Merge tag 'edac_for_3.19' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp
Pull EDAC updates from Borislav Petkov:
"EDAC updates all over the place:
- Enablement for AMD F15h models 0x60 CPUs. Most notably DDR4 RAM
support. Out of tree stuff is adding the required PCI IDs. From
Aravind Gopalakrishnan.
- Enable amd64_edac for 32-bit due to popular demand. From Tomasz
Pala.
- Convert the AMD MCE injection module to debugfs, where it belongs.
- Misc EDAC cleanups"
* tag 'edac_for_3.19' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp:
EDAC, MCE, AMD: Correct formatting of decoded text
EDAC, mce_amd_inj: Add an injector function
EDAC, mce_amd_inj: Add hw-injection attributes
EDAC, mce_amd_inj: Enable direct writes to MCE MSRs
EDAC, mce_amd_inj: Convert mce_amd_inj module to debugfs
EDAC: Delete unnecessary check before calling pci_dev_put()
EDAC, pci_sysfs: remove unneccessary ifdef around entire file
ghes_edac: Use snprintf() to silence a static checker warning
amd64_edac: Build module on x86-32
EDAC, MCE, AMD: Add decoding table for MC6 xec
amd64_edac: Add F15h M60h support
{mv64x60,ppc4xx}_edac,: Remove deprecated IRQF_DISABLED
EDAC: Sync memory types and names
EDAC: Add DDR3 LRDIMM entries to edac_mem_types
x86, amd_nb: Add device IDs to NB tables for F15h M60h
pci_ids: Add PCI device IDs for F15h M60h
* pm-cpufreq: (21 commits)
intel_pstate: skip this driver if Sun server has _PPC method
cpufreq: arm_big_little: free OPP table created during ->init()
imx6q: free OPP table created during ->init()
exynos5440: free OPP table created during ->init()
cpufreq-dt: free OPP table created during ->init()
cpufreq-dt: register cooling device from ->ready() callback
cpufreq: Introduce ->ready() callback for cpufreq drivers
cpufreq-dt: pass 'policy->related_cpus' to of_cpufreq_cooling_register()
cpufreq: Fix formatting issues in 'struct cpufreq_driver'
cpufreq: pxa2xx: Add Kconfig entry
cpufreq: Ref the policy object sooner
cpufreq: Kconfig: Remove architecture specific menu entries
cpufreq: pcc: Enable autoload of pcc-cpufreq for ACPI processors
intel_pstate: Add CPUID for BDW-H CPU
intel_pstate: Add support for HWP
x86: Add support for Intel HWP feature detection.
cpufreq: respect the min/max settings from user space
cpufreq: cpufreq-dt: Handle regulator_get_voltage() failure
cpufreq: cpufreq-dt: Improve debug about matching OPP
cpufreq: Loongson1: Add cpufreq driver for Loongson1B
...
* pm-cpuidle:
cpuidle: add MAINTAINERS entry for ARM Exynos cpuidle driver
drivers: cpuidle: Remove cpuidle-arm64 duplicate error messages
drivers: cpuidle: Add idle-state-name description to ARM idle states
drivers: cpuidle: Add status property to ARM idle states
cpuidle: Invert CPUIDLE_FLAG_TIME_VALID logic
Pull AMD range breakpoints support from Frederic Weisbecker:
" - Extend breakpoint tools and core to support address range through perf
event with initial backend support for AMD extended breakpoints.
Syntax is:
perf record -e mem:addr/len:type
For example set write breakpoint from 0x1000 to 0x1200 (0x1000 + 512)
perf record -e mem:0x1000/512:w
- Clean up a bit breakpoint code validation
It has been acked by Jiri and Oleg. "
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJUhNLZAAoJEHm+PkMAQRiGAEcH/iclYDW7k2GKemMqboy+Ohmh
+ELbQothNhlGZlS1wWdD69LBiiXkkQ+ufVYFh/hC0oy0gUdfPMt5t+bOHy6cjn6w
9zOcACtpDKnqbOwRqXZjZgNmIabk7lRjbn7GK4GQqpIaW4oO0FWcT91FFhtGSPDa
tjtmGRqDmbNsqfzr18h0WPEpUZmT6MxIdv17AYDliPB1MaaRuAv1Kss05TJrXdfL
Oucv+C0uwnybD9UWAz6pLJ3H/HR9VJFdkaJ4Y0pbCHAuxdd1+swoTpicluHlsJA1
EkK5iWQRMpcmGwKvB0unCAQljNpaJiq4/Tlmmv8JlYpMlmIiVLT0D8BZx5q05QQ=
=oGNw
-----END PGP SIGNATURE-----
Merge tag 'v3.18' into drm-next
Linux 3.18
Backmerge Linus tree into -next as we had conflicts in i915/radeon/nouveau,
and everyone was solving them individually.
* tag 'v3.18': (57 commits)
Linux 3.18
watchdog: s3c2410_wdt: Fix the mask bit offset for Exynos7
uapi: fix to export linux/vm_sockets.h
i2c: cadence: Set the hardware time-out register to maximum value
i2c: davinci: generate STP always when NACK is received
ahci: disable MSI on SAMSUNG 0xa800 SSD
context_tracking: Restore previous state in schedule_user
slab: fix nodeid bounds check for non-contiguous node IDs
lib/genalloc.c: export devm_gen_pool_create() for modules
mm: fix anon_vma_clone() error treatment
mm: fix swapoff hang after page migration and fork
fat: fix oops on corrupted vfat fs
ipc/sem.c: fully initialize sem_array before making it visible
drivers/input/evdev.c: don't kfree() a vmalloc address
cxgb4: Fill in supported link mode for SFP modules
xen-netfront: Remove BUGs on paged skb data which crosses a page boundary
mm/vmpressure.c: fix race in vmpressure_work_fn()
mm: frontswap: invalidate expired data on a dup-store failure
mm: do not overwrite reserved pages counter at show_mem()
drm/radeon: kernel panic in drm_calc_vbltimestamp_from_scanoutpos with 3.18.0-rc6
...
Conflicts:
drivers/gpu/drm/i915/intel_display.c
drivers/gpu/drm/nouveau/nouveau_drm.c
drivers/gpu/drm/radeon/radeon_cs.c
Normally, we do reapply microcode on resume. However, in the cases where
that microcode comes from the early loader and the late loader hasn't
been utilized yet, there's no easy way for us to go and apply the patch
applied during boot by the early loader.
Thus, reuse the patch stashed by the early loader for the BSP.
Signed-off-by: Borislav Petkov <bp@suse.de>
Paravirtual guests are not expected to load microcode into processors
and therefore it is not necessary to initialize microcode loading
logic.
In fact, under certain circumstances initializing this logic may cause
the guest to crash. Specifically, 32-bit kernels use __pa_nodebug()
macro which does not work in Xen (the code path that leads to this macro
happens during resume when we call mc_bp_resume()->load_ucode_ap()
->check_loader_disabled_ap())
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Link: http://lkml.kernel.org/r/1417469264-31470-1-git-send-email-boris.ostrovsky@oracle.com
Signed-off-by: Borislav Petkov <bp@suse.de>
Pull x86 fixes from Thomas Gleixner:
"Two final fixlets for 3.18:
- Prevent microcode reload wreckage on 32bit
- Unbreak cross compilation"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86, microcode: Limit the microcode reloading to 64-bit for now
x86: Use $(OBJDUMP) instead of plain objdump
get_xsave_addr is the API to access XSAVE states, and KVM would
like to use it. Export it.
Cc: stable@vger.kernel.org
Cc: x86@kernel.org
Cc: H. Peter Anvin <hpa@linux.intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The function graph helper function prepare_ftrace_return() which does the work
to hijack the parent pointer has that parent pointer as its first parameter.
Instead, if we make it the second parameter and have ip as the first parameter
(self_addr), then it can use the %rdi from save_mcount_regs that loads it
already.
Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1411262304010.3961@nanos
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
The save_mcount_regs macro saves and restores the required mcount regs that
need to be saved before calling C code. It is done for all the function hook
utilities (static tracing, dynamic tracing, regs, function graph).
When frame pointers are enabled, the ftrace trampolines need to set up
frames and pointers such that a back trace (dump stack) can continue passed
them. Currently, a separate macro is used (create_frame) to do this, but
it's only done for the ftrace_caller and ftrace_reg_caller functions. It
is not done for the static tracer or function graph tracing.
Instead of having a separate macro doing the recording of the frames,
have the save_mcount_regs perform this task. This also has all tracers
saving the frame pointers when needed.
Link: http://lkml.kernel.org/r/CA+55aFwF+qCGSKdGaEgW4p6N65GZ5_XTV=1NbtWDvxnd5yYLiw@mail.gmail.com
Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1411262304010.3961@nanos
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
The macro save_mcount_regs saves regs onto the stack. But to uncouple the
amount of stack used in that macro from the users of the macro, we need
to have a define that tells all the users how much stack is used by that
macro. This way we can change the amount of stack the macro uses without
breaking its users.
Also remove some dead code that was left over from commit fdc841b58c
"ftrace: x86: Remove check of obsolete variable function_trace_stop".
Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1411262304010.3961@nanos
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Currently save_mcount_regs is passed a "skip" parameter to know how much
stack updated the pt_regs, as it tries to keep the saved pt_regs in the
same location for all users. This is rather stupid, especially since the
part stored on the pt_regs has nothing to do with what is suppose to be
in that location.
Instead of doing that, just pass in an "added" parameter that lets that
macro know how much stack was added before it was called so that it
can get to the RIP. But the difference is that it will now offset the
pt_regs by that "added" count. The caller now needs to take care of
the offset of the pt_regs.
This will make it easier to simplify the code later.
Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1411262304010.3961@nanos
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
The name MCOUNT_SAVE_FRAME is rather confusing as it really isn't a
function frame that is saved, but just the required mcount registers
that are needed to be saved before C code may be called. The word
"frame" confuses it as being a function frame which it is not.
Rename MCOUNT_SAVE_FRAME and MCOUNT_RESTORE_FRAME to save_mcount_regs
and restore_mcount_regs respectively. Noticed the lower case, which
keeps it from screaming at the reviewers.
Link: http://lkml.kernel.org/r/CA+55aFwF+qCGSKdGaEgW4p6N65GZ5_XTV=1NbtWDvxnd5yYLiw@mail.gmail.com
Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1411262304010.3961@nanos
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Linus pointed out that there were locations that did the hard coded
update of the parent and rip parameters. One of them was the static tracer
which could also use the ftrace_caller_setup to do that work. In fact,
because it did not use it, it is prone to bugs, and since the static
tracer is hardly ever used (who wants function tracing code always being
called?) it doesn't get tested very often. I only run a few "does it still
work" tests on it. But I do not run stress tests on that code. Although,
since it is never turned off, just having it on should be stressful enough.
(especially for the performance folks)
There's no reason that the static tracer can't also use ftrace_caller_setup.
Have it do so.
Link: http://lkml.kernel.org/r/CA+55aFwF+qCGSKdGaEgW4p6N65GZ5_XTV=1NbtWDvxnd5yYLiw@mail.gmail.com
Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1411262304010.3961@nanos
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Hand down the cpu number instead, otherwise lockdep screams when doing
echo 1 > /sys/devices/system/cpu/microcode/reload.
BUG: using smp_processor_id() in preemptible [00000000] code: amd64-microcode/2470
caller is debug_smp_processor_id+0x12/0x20
CPU: 1 PID: 2470 Comm: amd64-microcode Not tainted 3.18.0-rc6+ #26
...
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1417428741-4501-1-git-send-email-bp@alien8.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
First, there was this: https://bugzilla.kernel.org/show_bug.cgi?id=88001
The problem there was that microcode patches are not being reapplied
after suspend-to-ram. It was important to reapply them, though, because
of for example Haswell's TSX erratum which disabled TSX instructions
with a microcode patch.
A simple fix was fb86b97300 ("x86, microcode: Update BSPs microcode
on resume") but, as it is often the case, simple fixes are too
simple. This one causes 32-bit resume to fail:
https://bugzilla.kernel.org/show_bug.cgi?id=88391
Properly fixing this would require more involved changes for which it
is too late now, right before the merge window. Thus, limit this to
64-bit only temporarily.
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1417353999-32236-1-git-send-email-bp@alien8.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
These functions can be executed on the int3 stack, so kprobes
are dangerous. Tracing is probably a bad idea, too.
Fixes: b645af2d59 ("x86_64, traps: Rework bad_iret")
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: <stable@vger.kernel.org> # Backport as far back as it would apply
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/50e33d26adca60816f3ba968875801652507d0c4.1416870125.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
New updates to the ftrace generic code had ftrace_stub not always being
called when ftrace is off. This causes the static tracer to always save
and restore functions. But it also showed that when function tracing is
running, the function graph tracer can not. We should always check to see
if function graph tracing is running even if the function tracer is running
too. The function tracer code is not the only one that uses the hook to
function mcount.
Cc: Markos Chandras <Markos.Chandras@imgtec.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Merge x86-64 iret fixes from Andy Lutomirski:
"This addresses the following issues:
- an unrecoverable double-fault triggerable with modify_ldt.
- invalid stack usage in espfix64 failed IRET recovery from IST
context.
- invalid stack usage in non-espfix64 failed IRET recovery from IST
context.
It also makes a good but IMO scary change: non-espfix64 failed IRET
will now report the correct error. Hopefully nothing depended on the
old incorrect behavior, but maybe Wine will get confused in some
obscure corner case"
* emailed patches from Andy Lutomirski <luto@amacapital.net>:
x86_64, traps: Rework bad_iret
x86_64, traps: Stop using IST for #SS
x86_64, traps: Fix the espfix64 #DF fixup and rewrite it in C
It's possible for iretq to userspace to fail. This can happen because
of a bad CS, SS, or RIP.
Historically, we've handled it by fixing up an exception from iretq to
land at bad_iret, which pretends that the failed iret frame was really
the hardware part of #GP(0) from userspace. To make this work, there's
an extra fixup to fudge the gs base into a usable state.
This is suboptimal because it loses the original exception. It's also
buggy because there's no guarantee that we were on the kernel stack to
begin with. For example, if the failing iret happened on return from an
NMI, then we'll end up executing general_protection on the NMI stack.
This is bad for several reasons, the most immediate of which is that
general_protection, as a non-paranoid idtentry, will try to deliver
signals and/or schedule from the wrong stack.
This patch throws out bad_iret entirely. As a replacement, it augments
the existing swapgs fudge into a full-blown iret fixup, mostly written
in C. It's should be clearer and more correct.
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
On a 32-bit kernel, this has no effect, since there are no IST stacks.
On a 64-bit kernel, #SS can only happen in user code, on a failed iret
to user space, a canonical violation on access via RSP or RBP, or a
genuine stack segment violation in 32-bit kernel code. The first two
cases don't need IST, and the latter two cases are unlikely fatal bugs,
and promoting them to double faults would be fine.
This fixes a bug in which the espfix64 code mishandles a stack segment
violation.
This saves 4k of memory per CPU and a tiny bit of code.
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There's nothing special enough about the espfix64 double fault fixup to
justify writing it in assembly. Move it to C.
This also fixes a bug: if the double fault came from an IST stack, the
old asm code would return to a partially uninitialized stack frame.
Fixes: 3891a04aaf
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The PCI/MSI irq chip callbacks mask/unmask_msi_irq have been renamed
to pci_msi_mask/unmask_irq to mark them PCI specific. Rename all usage
sites. The conversion helper functions are kept around to avoid
conflicts in next and will be removed after merging into mainline.
Coccinelle assisted conversion. No functional change.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: x86@kernel.org
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Jason Cooper <jason@lakedaemon.net>
Cc: Murali Karicheri <m-karicheri2@ti.com>
Cc: Thierry Reding <thierry.reding@gmail.com>
Cc: Mohit Kumar <mohit.kumar@st.com>
Cc: Simon Horman <horms@verge.net.au>
Cc: Michal Simek <michal.simek@xilinx.com>
Cc: Yijing Wang <wangyijing@huawei.com>
Rename write_msi_msg() to pci_write_msi_msg() to mark it as PCI
specific.
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Grant Likely <grant.likely@linaro.org>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Yingjoe Chen <yingjoe.chen@mediatek.com>
Cc: Yijing Wang <wangyijing@huawei.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Pull x86 fixes from Thomas Gleixner:
"Misc fixes:
- gold linker build fix
- noxsave command line parsing fix
- bugfix for NX setup
- microcode resume path bug fix
- _TIF_NOHZ versus TIF_NOHZ bugfix as discussed in the mysterious
lockup thread"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86, syscall: Fix _TIF_NOHZ handling in syscall_trace_enter_phase1
x86, kaslr: Handle Gold linker for finding bss/brk
x86, mm: Set NX across entire PMD at boot
x86, microcode: Update BSPs microcode on resume
x86: Require exact match for 'noxsave' command line option
Pull perf fixes from Ingo Molnar:
"Misc fixes: two Intel uncore driver fixes, a CPU-hotplug fix and a
build dependencies fix"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/intel/uncore: Fix boot crash on SBOX PMU on Haswell-EP
perf/x86/intel/uncore: Fix IRP uncore register offsets on Haswell EP
perf: Fix corruption of sibling list with hotplug
perf/x86: Fix embarrasing typo
TIF_NOHZ is 19 (i.e. _TIF_SYSCALL_TRACE | _TIF_NOTIFY_RESUME |
_TIF_SINGLESTEP), not (1<<19).
This code is involved in Dave's trinity lockup, but I don't see why
it would cause any of the problems he's seeing, except inadvertently
by causing a different path through entry_64.S's syscall handling.
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Dave Jones <davej@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/a6cd3b60a3f53afb6e1c8081b0ec30ff19003dd7.1416434075.git.luto@amacapital.net
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Recover original IP register if the pre_handler doesn't change it.
Since current kprobes doesn't expect that another ftrace handler
may change regs->ip, it sets kprobe.addr + MCOUNT_INSN_SIZE to
regs->ip and returns to ftrace.
This seems wrong behavior since kprobes can recover regs->ip
and safely pass it to another handler.
This adds code which recovers original regs->ip passed from
ftrace right before returning to ftrace, so that another ftrace
user can change regs->ip.
Link: http://lkml.kernel.org/r/20141009130106.4698.26362.stgit@kbuild-f20.novalocal
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
When trigger_all_cpu_backtrace() is called on x86, it will trigger an
NMI on each CPU and call show_regs(). But this can lead to a hard lock
up if the NMI comes in on another printk().
In order to avoid this, when the NMI triggers, it switches the printk
routine for that CPU to call a NMI safe printk function that records the
printk in a per_cpu seq_buf descriptor. After all NMIs have finished
recording its data, the seq_bufs are printed in a safe context.
Link: http://lkml.kernel.org/p/20140619213952.360076309@goodmis.org
Link: http://lkml.kernel.org/r/20141115050605.055232587@goodmis.org
Tested-by: Jiri Kosina <jkosina@suse.cz>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Petr Mladek <pmladek@suse.cz>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Stack traces that happen from function tracing check if the address
on the stack is a __kernel_text_address(). That is, is the address
kernel code. This calls core_kernel_text() which returns true
if the address is part of the builtin kernel code. It also calls
is_module_text_address() which returns true if the address belongs
to module code.
But what is missing is ftrace dynamically allocated trampolines.
These trampolines are allocated for individual ftrace_ops that
call the ftrace_ops callback functions directly. But if they do a
stack trace, the code checking the stack wont detect them as they
are neither core kernel code nor module address space.
Adding another field to ftrace_ops that also stores the size of
the trampoline assigned to it we can create a new function called
is_ftrace_trampoline() that returns true if the address is a
dynamically allocate ftrace trampoline. Note, it ignores trampolines
that are not dynamically allocated as they will return true with
the core_kernel_text() function.
Link: http://lkml.kernel.org/r/20141119034829.497125839@goodmis.org
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
When CONFIG_FRAME_POINTERS are enabled, it is required that the
ftrace_caller and ftrace_regs_caller trampolines set up frame pointers
otherwise a stack trace from a function call wont print the functions
that called the trampoline. This is due to a check in
__save_stack_address():
#ifdef CONFIG_FRAME_POINTER
if (!reliable)
return;
#endif
The "reliable" variable is only set if the function address is equal to
contents of the address before the address the frame pointer register
points to. If the frame pointer is not set up for the ftrace caller
then this will fail the reliable test. It will miss the function that
called the trampoline. Worse yet, if fentry is used (gcc 4.6 and
beyond), it will also miss the parent, as the fentry is called before
the stack frame is set up. That means the bp frame pointer points
to the stack of just before the parent function was called.
Link: http://lkml.kernel.org/r/20141119034829.355440340@goodmis.org
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Cc: stable@vger.kernel.org # 3.7+
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Uncorrected no action required (UCNA) - is a uncorrected recoverable
machine check error that is not signaled via a machine check exception
and, instead, is reported to system software as a corrected machine
check error. UCNA errors indicate that some data in the system is
corrupted, but the data has not been consumed and the processor state
is valid and you may continue execution on this processor. UCNA errors
require no action from system software to continue execution. Note that
UCNA errors are supported by the processor only when IA32_MCG_CAP[24]
(MCG_SER_P) is set.
-- Intel SDM Volume 3B
Deferred errors are errors that cannot be corrected by hardware, but
do not cause an immediate interruption in program flow, loss of data
integrity, or corruption of processor state. These errors indicate
that data has been corrupted but not consumed. Hardware writes information
to the status and address registers in the corresponding bank that
identifies the source of the error if deferred errors are enabled for
logging. Deferred errors are not reported via machine check exceptions;
they can be seen by polling the MCi_STATUS registers.
-- AMD64 APM Volume 2
Above two items, both UCNA and Deferred errors belong to detected
errors, but they can't be corrected by hardware, and this is very
similar to Software Recoverable Action Optional (SRAO) errors.
Therefore, we can take some actions that have been used for handling
SRAO errors to handle UCNA and Deferred errors.
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Chen Yucong <slaoub@gmail.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Until now, the mce_severity mechanism can only identify the severity
of UCNA error as MCE_KEEP_SEVERITY. Meanwhile, it is not able to filter
out DEFERRED error for AMD platform.
This patch extends the mce_severity mechanism for handling
UCNA/DEFERRED error. In order to do this, the patch introduces a new
severity level - MCE_UCNA/DEFERRED_SEVERITY.
In addition, mce_severity is specific to machine check exception,
and it will check MCIP/EIPV/RIPV bits. In order to use mce_severity
mechanism in non-exception context, the patch also introduces a new
argument (is_excp) for mce_severity. `is_excp' is used to explicitly
specify the calling context of mce_severity.
Reviewed-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Signed-off-by: Chen Yucong <slaoub@gmail.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
In the situation when we apply early microcode but do *not* apply late
microcode, we fail to update the BSP's microcode on resume because we
haven't initialized the uci->mc microcode pointer. So, in order to
alleviate that, we go and dig out the stashed microcode patch during
early boot. It is basically the same thing that is done on the APs early
during boot so do that too here.
Tested-by: alex.schnaidt@gmail.com
Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=88001
Cc: Henrique de Moraes Holschuh <hmh@hmh.eng.br>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: <stable@vger.kernel.org> # v3.9
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/20141118094657.GA6635@pd.tnic
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This is really the meat of the MPX patch set. If there is one patch to
review in the entire series, this is the one. There is a new ABI here
and this kernel code also interacts with userspace memory in a
relatively unusual manner. (small FAQ below).
Long Description:
This patch adds two prctl() commands to provide enable or disable the
management of bounds tables in kernel, including on-demand kernel
allocation (See the patch "on-demand kernel allocation of bounds tables")
and cleanup (See the patch "cleanup unused bound tables"). Applications
do not strictly need the kernel to manage bounds tables and we expect
some applications to use MPX without taking advantage of this kernel
support. This means the kernel can not simply infer whether an application
needs bounds table management from the MPX registers. The prctl() is an
explicit signal from userspace.
PR_MPX_ENABLE_MANAGEMENT is meant to be a signal from userspace to
require kernel's help in managing bounds tables.
PR_MPX_DISABLE_MANAGEMENT is the opposite, meaning that userspace don't
want kernel's help any more. With PR_MPX_DISABLE_MANAGEMENT, the kernel
won't allocate and free bounds tables even if the CPU supports MPX.
PR_MPX_ENABLE_MANAGEMENT will fetch the base address of the bounds
directory out of a userspace register (bndcfgu) and then cache it into
a new field (->bd_addr) in the 'mm_struct'. PR_MPX_DISABLE_MANAGEMENT
will set "bd_addr" to an invalid address. Using this scheme, we can
use "bd_addr" to determine whether the management of bounds tables in
kernel is enabled.
Also, the only way to access that bndcfgu register is via an xsaves,
which can be expensive. Caching "bd_addr" like this also helps reduce
the cost of those xsaves when doing table cleanup at munmap() time.
Unfortunately, we can not apply this optimization to #BR fault time
because we need an xsave to get the value of BNDSTATUS.
==== Why does the hardware even have these Bounds Tables? ====
MPX only has 4 hardware registers for storing bounds information.
If MPX-enabled code needs more than these 4 registers, it needs to
spill them somewhere. It has two special instructions for this
which allow the bounds to be moved between the bounds registers
and some new "bounds tables".
They are similar conceptually to a page fault and will be raised by
the MPX hardware during both bounds violations or when the tables
are not present. This patch handles those #BR exceptions for
not-present tables by carving the space out of the normal processes
address space (essentially calling the new mmap() interface indroduced
earlier in this patch set.) and then pointing the bounds-directory
over to it.
The tables *need* to be accessed and controlled by userspace because
the instructions for moving bounds in and out of them are extremely
frequent. They potentially happen every time a register pointing to
memory is dereferenced. Any direct kernel involvement (like a syscall)
to access the tables would obviously destroy performance.
==== Why not do this in userspace? ====
This patch is obviously doing this allocation in the kernel.
However, MPX does not strictly *require* anything in the kernel.
It can theoretically be done completely from userspace. Here are
a few ways this *could* be done. I don't think any of them are
practical in the real-world, but here they are.
Q: Can virtual space simply be reserved for the bounds tables so
that we never have to allocate them?
A: As noted earlier, these tables are *HUGE*. An X-GB virtual
area needs 4*X GB of virtual space, plus 2GB for the bounds
directory. If we were to preallocate them for the 128TB of
user virtual address space, we would need to reserve 512TB+2GB,
which is larger than the entire virtual address space today.
This means they can not be reserved ahead of time. Also, a
single process's pre-popualated bounds directory consumes 2GB
of virtual *AND* physical memory. IOW, it's completely
infeasible to prepopulate bounds directories.
Q: Can we preallocate bounds table space at the same time memory
is allocated which might contain pointers that might eventually
need bounds tables?
A: This would work if we could hook the site of each and every
memory allocation syscall. This can be done for small,
constrained applications. But, it isn't practical at a larger
scale since a given app has no way of controlling how all the
parts of the app might allocate memory (think libraries). The
kernel is really the only place to intercept these calls.
Q: Could a bounds fault be handed to userspace and the tables
allocated there in a signal handler instead of in the kernel?
A: (thanks to tglx) mmap() is not on the list of safe async
handler functions and even if mmap() would work it still
requires locking or nasty tricks to keep track of the
allocation state there.
Having ruled out all of the userspace-only approaches for managing
bounds tables that we could think of, we create them on demand in
the kernel.
Based-on-patch-by: Qiaowei Ren <qiaowei.ren@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: linux-mm@kvack.org
Cc: linux-mips@linux-mips.org
Cc: Dave Hansen <dave@sr71.net>
Link: http://lkml.kernel.org/r/20141114151829.AD4310DE@viggo.jf.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The current x86 instruction decoder steps along through the
instruction stream but always ensures that it never steps farther
than the largest possible instruction size (MAX_INSN_SIZE).
The MPX code is now going to be doing some decoding of userspace
instructions. We copy those from userspace in to the kernel and
they're obviously completely untrusted coming from userspace. In
addition to the constraint that instructions can only be so long,
we also have to be aware of how long the buffer is that came in
from userspace. This _looks_ to be similar to what the perf and
kprobes is doing, but it's unclear to me whether they are
affected.
The whole reason we need this is that it is perfectly valid to be
executing an instruction within MAX_INSN_SIZE bytes of an
unreadable page. We should be able to gracefully handle short
reads in those cases.
This adds support to the decoder to record how long the buffer
being decoded is and to refuse to "validate" the instruction if
we would have gone over the end of the buffer to decode it.
The kprobes code probably needs to be looked at here a bit more
carefully. This patch still respects the MAX_INSN_SIZE limit
there but the kprobes code does look like it might be able to
be a bit more strict than it currently is.
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Jim Keniston <jkenisto@us.ibm.com>
Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: x86@kernel.org
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Link: http://lkml.kernel.org/r/20141114153957.E6B01535@viggo.jf.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
We have some very similarly named command-line options:
arch/x86/kernel/cpu/common.c:__setup("noxsave", x86_xsave_setup);
arch/x86/kernel/cpu/common.c:__setup("noxsaveopt", x86_xsaveopt_setup);
arch/x86/kernel/cpu/common.c:__setup("noxsaves", x86_xsaves_setup);
__setup() is designed to match options that take arguments, like
"foo=bar" where you would have:
__setup("foo", x86_foo_func...);
The problem is that "noxsave" actually _matches_ "noxsaves" in
the same way that "foo" matches "foo=bar". If you boot an old
kernel that does not know about "noxsaves" with "noxsaves" on the
command line, it will interpret the argument as "noxsave", which
is not what you want at all.
This makes the "noxsave" handler only return success when it finds
an *exact* match.
[ tglx: We really need to make __setup() more robust. ]
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: x86@kernel.org
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/20141111220133.FE053984@viggo.jf.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
PEBS can capture machine state regs at retiremnt of the sampled
instructions. When precise sampling is enabled on an event, PEBS
is used, so substitute the interrupted state with the PEBS state.
Note that not all registers are captured by PEBS. Those missing
are replaced by the interrupt state counter-parts.
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1411559322-16548-3-git-send-email-eranian@google.com
Cc: cebbert.lkml@gmail.com
Cc: jolsa@redhat.com
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Disallow setting inv/cmask/etc. flags for all PEBS events
on these CPUs, except for the UOPS_RETIRED.* events on Nehalem/Westmere,
which are needed for cycles:p. This avoids an undefined situation
strongly discouraged by the Intle SDM. The PLD_* events were already
covered. This follows the earlier changes for Sandy Bridge and alter.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Link: http://lkml.kernel.org/r/1411569288-5627-3-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
My earlier commit:
86a04461a9 ("perf/x86: Revamp PEBS event selection")
made nearly all PEBS on Sandy/IvyBridge/Haswell to reject non zero flags.
However this wasn't done for the INST_RETIRED.PREC_DIST event
because no suitable macro existed. Now that we have
INTEL_FLAGS_UEVENT_CONSTRAINT enforce zero flags for
INST_RETIRED.PREC_DIST too.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Link: http://lkml.kernel.org/r/1411569288-5627-2-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Add a FLAGS_UEVENT_CONSTRAINT macro that allows us to
match on event+umask, and in additional all flags.
This is needed to ensure the INV and CMASK fields
are zero for specific events, as this can cause undefined
behavior.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Maria Dimakopoulou <maria.n.dimakopoulou@gmail.com>
Cc: Mark Davies <junk@eslaf.co.uk>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1411569288-5627-1-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Add scaling to MB/s to the memory controller read/write
events for Sandy/IvyBridge/Haswell-EP similar to how the client
does. This makes the events easier to use from the
standard perf tool.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Link: http://lkml.kernel.org/r/1415062828-19759-2-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
There were several reports that on some systems writing the SBOX0 PMU
initialization MSR would #GP at boot. This did not happen on all
systems -- my two test systems booted fine.
Writing the three initialization bits bit-by-bit seems to avoid the
problem. So add a special callback to do just that.
This replaces an earlier patch that disabled the SBOX.
Reported-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Reported-and-Tested-by: Patrick Lu <patrick.lu@intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Link: http://lkml.kernel.org/r/1415062828-19759-4-git-send-email-andi@firstfloor.org
[ Fixed a whitespace error and added attribution tags that were left out inexplicably. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The counter register offsets for the IRP box PMU for Haswell-EP
were incorrect. The offsets actually changed over IvyBridge EP.
Fix them to the correct values. For this we need to fork the read
function from the IVB and use an own counter array.
Tested-by: patrick.lu@intel.com
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Link: http://lkml.kernel.org/r/1415062828-19759-3-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The only place where the time is invalid is when the ACPI_CSTATE_FFH entry
method is not set. Otherwise for all the drivers, the time can be correctly
measured.
Instead of duplicating the CPUIDLE_FLAG_TIME_VALID flag in all the drivers
for all the states, just invert the logic by replacing it by the flag
CPUIDLE_FLAG_TIME_INVALID, hence we can set this flag only for the acpi idle
driver, remove the former flag from all the drivers and invert the logic with
this flag in the different governor.
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
New Fam15h models carry extra feature bits and extend
the MSR register space for IBS ops. Adding them here.
While at it, add functionality to read IbsBrTarget and
OpData4 depending on their availability if user wants a
PERF_SAMPLE_RAW.
Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Acked-by: Borislav Petkov <bp@suse.de>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Jan Kiszka <jan.kiszka@siemens.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: <paulus@samba.org>
Cc: <acme@kernel.org>
Link: http://lkml.kernel.org/r/1415651066-13523-1-git-send-email-Aravind.Gopalakrishnan@amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJUX/DqAAoJEHm+PkMAQRiGLtQH/iAt3fRHlYDXjaJian/KG1Cb
wVP0I+HWZmvVmmd0PzyaxCZLgRNwdmmYHEH4QLy2JwZ3jZfFHlxhy+hDWCgz+67t
bIzkLs0Pf1T4kJ2+r8qW2kBEz9PWJHGTQw7NTqZ++Ts3rPptBA6Fg4mEJ6fQigXy
qRIY68DpipUkXV9BWBWijnTmrvP5tt7JtPzBr4DC8frMjvWct8+XwYhc2k2tEv2j
LwLYb1OW6PUpPv2BQBfWjqqH77vYNQVhJwuwGcDe2YZdI0UFkDheL24+RbbPcZ4f
OnrLjJSSgzv6lBWkAaXZK7/WJ/JZbXxEqHzWZQ3xXoQov97bm7lEYJqqi5gDasQ=
=6Qpa
-----END PGP SIGNATURE-----
Merge tag 'v3.18-rc4' into x86/cleanups, to refresh the tree before pulling new changes.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJUX/DqAAoJEHm+PkMAQRiGLtQH/iAt3fRHlYDXjaJian/KG1Cb
wVP0I+HWZmvVmmd0PzyaxCZLgRNwdmmYHEH4QLy2JwZ3jZfFHlxhy+hDWCgz+67t
bIzkLs0Pf1T4kJ2+r8qW2kBEz9PWJHGTQw7NTqZ++Ts3rPptBA6Fg4mEJ6fQigXy
qRIY68DpipUkXV9BWBWijnTmrvP5tt7JtPzBr4DC8frMjvWct8+XwYhc2k2tEv2j
LwLYb1OW6PUpPv2BQBfWjqqH77vYNQVhJwuwGcDe2YZdI0UFkDheL24+RbbPcZ4f
OnrLjJSSgzv6lBWkAaXZK7/WJ/JZbXxEqHzWZQ3xXoQov97bm7lEYJqqi5gDasQ=
=6Qpa
-----END PGP SIGNATURE-----
Merge tag 'v3.18-rc4' into drm-next
backmerge to get vmwgfx locking changes into next as the
conflict with per-plane locking.
Add support of Hardware Managed Performance States (HWP) described in Volume 3
section 14.4 of the SDM.
One bit CPUID.06H:EAX[bit 7] expresses the presence of the HWP feature on
the processor. The remaining bits CPUID.06H:EAX[bit 8-11] denote the
presense of various HWP features.
Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The problem fixed by 0e4ccb1505 ("PCI: Add x86_msi.msi_mask_irq() and
msix_mask_irq()") has been fixed in a simpler way by a previous commit
("PCI/MSI: Add pci_msi_ignore_mask to prevent writes to MSI/MSI-X Mask
Bits").
The msi_mask_irq() and msix_mask_irq() x86_msi_ops added by 0e4ccb1505
are no longer needed, so revert the commit.
default_msi_mask_irq() and default_msix_mask_irq() were added by
0e4ccb1505 and are still used by s390, so keep them for now.
[bhelgaas: changelog]
Signed-off-by: Yijing Wang <wangyijing@huawei.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: David Vrabel <david.vrabel@citrix.com>
CC: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
CC: xen-devel@lists.xenproject.org
With the introduction of the dynamic trampolines, it is useful that if
things go wrong that ftrace_bug() produces more information about what
the current state is. This can help debug issues that may arise.
Ftrace has lots of checks to make sure that the state of the system it
touchs is exactly what it expects it to be. When it detects an abnormality
it calls ftrace_bug() and disables itself to prevent any further damage.
It is crucial that ftrace_bug() produces sufficient information that
can be used to debug the situation.
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Borislav Petkov <bp@suse.de>
Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Tested-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
When the static ftrace_ops (like function tracer) enables tracing, and it
is the only callback that is referencing a function, a trampoline is
dynamically allocated to the function that calls the callback directly
instead of calling a loop function that iterates over all the registered
ftrace ops (if more than one ops is registered).
But when it comes to dynamically allocated ftrace_ops, where they may be
freed, on a CONFIG_PREEMPT kernel there's no way to know when it is safe
to free the trampoline. If a task was preempted while executing on the
trampoline, there's currently no way to know when it will be off that
trampoline.
But this is not true when it comes to !CONFIG_PREEMPT. The current method
of calling schedule_on_each_cpu() will force tasks off the trampoline,
becaues they can not schedule while on it (kernel preemption is not
configured). That means it is safe to free a dynamically allocated
ftrace ops trampoline when CONFIG_PREEMPT is not configured.
Cc: H. Peter Anvin <hpa@linux.intel.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Borislav Petkov <bp@suse.de>
Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Tested-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Previous versions of the espfix had a single function which did setup
the pagetables. It was later split into BSP and AP version. Drop unused
leftovers after that split.
Signed-off-by: Borislav Petkov <bp@suse.de>
* access the dis_ucode_ldr chicken bit properly
* fix patch stashing on AMD on 32-bit
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJUYNWUAAoJEBLB8Bhh3lVKU1sQAKIj1LVBtNAeaMaC9O8AUkUN
SWfskslf0uU2OS4RvV0QjDbr/chivIKMs7rbeMb521lHqWULRV/ZSR0kReB1JL45
yF7Dnz/YZX4VXx7O1lUSBhczN+Xp2jlPGuaeV1Q7iE0S1Focwxe8B24n6ye3dyto
o3dOH9tSna1U5KZqzHSaXWI4LJg3VrVNmf70IbYQFYyINHEtxI3oEtRWUlfFBA6C
+RbA3cUksBhYkNLfpkoA9o9ODbdSh5oSNkKFV8R26GCYw+pBQp27FhSECaEDEYIe
sdMTLgQd3ZWo5zh2zm3U12j8hf0hsfz4TjpDuozXmBlHRJSi/cLbFyEUOAbaCHpQ
Coaxgs8iiGcFVcZnMGmis9WGM41Q4O3UyxYVVpVEyMYLcrOxysKB0j1L2ycMGHV1
YHVL6Ex2MYxxqbK6NoC2ZK0OWWm1KNl4O2NAYsT4ICBxsDyxc9JzA6vidKM7VBU6
VYtOo21fYYbDgxogF6N/C95PA6nRxCm5coJ6X2QENg9DWSQHWkQ/q4Jp3yTrW4Dn
h/vY+Y5FkmVGoPBITg6BjtG9Sl3wrsqpIz2umWEeRmNCbcQm+KNQWSctvzzmOWDW
yYHyPQUgwxVX5qK5VVrTEvtDBn7E0gLEnwJLy4AdwkHf7YESxwbnYv+xXkiAubLH
dDlDNEEv1Fi3wzwc4/6g
=BamU
-----END PGP SIGNATURE-----
Merge tag 'microcode_fixes_for_3.18' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp into x86/urgent
Pull two fixes for early microcode loader on 32-bit from Borislav Petkov:
- access the dis_ucode_ldr chicken bit properly
- fix patch stashing on AMD on 32-bit
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Save the patch while we're running on the BSP instead of later, before
the initrd has been jettisoned. More importantly, on 32-bit we need to
access the physical address instead of the virtual.
This way we actually do find it on the APs instead of having to go
through the initrd each time.
Tested-by: Richard Hendershot <rshendershot@mchsi.com>
Fixes: 5335ba5cf4 ("x86, microcode, AMD: Fix early ucode loading")
Cc: <stable@vger.kernel.org> # v3.13+
Signed-off-by: Borislav Petkov <bp@suse.de>
Commit 2ed53c0d6c ("x86/smpboot: Speed up suspend/resume by
avoiding 100ms sleep for CPU offline during S3") introduced
completions to CPU offlining process. These completions are not
initialized on Xen kernels causing a panic in
play_dead_common().
Move handling of die_complete into common routines to make them
available to Xen guests.
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Cc: tianyu.lan@intel.com
Cc: konrad.wilk@oracle.com
Cc: xen-devel@lists.xenproject.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1414770572-7950-1-git-send-email-boris.ostrovsky@oracle.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The vsyscall emulation code sets orig_ax for seccomp's benefit,
but it forgot to set it back.
I'm not sure that this is observable at all, but it could cause
confusion to various /proc or ptrace users, and it's possible
that it could cause minor artifacts if a signal were to be
delivered on return from vsyscall emulation.
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/cdc6a564517a4df09235572ee5f530ccdcf933f7.1415144089.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Many sysfs *_show function use cpu{list,mask}_scnprintf to copy cpumap
to the buffer aligned to PAGE_SIZE, append '\n' and '\0' to return null
terminated buffer with newline.
This patch creates a new helper function cpumap_print_to_pagebuf in
cpumask.h using newly added bitmap_print_to_pagebuf and consolidates
most of those sysfs functions using the new helper function.
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Suggested-by: Stephen Boyd <sboyd@codeaurora.org>
Tested-by: Stephen Boyd <sboyd@codeaurora.org>
Acked-by: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: x86@kernel.org
Cc: linux-acpi@vger.kernel.org
Cc: linux-pci@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
We should be accessing it through a pointer, like on the BSP.
Tested-by: Richard Hendershot <rshendershot@mchsi.com>
Fixes: 65cef1311d ("x86, microcode: Add a disable chicken bit")
Cc: <stable@vger.kernel.org> # v3.15+
Signed-off-by: Borislav Petkov <bp@suse.de>
Observing that per-CPU data (in the SMP case) is reachable by
exploiting 64-bit address wraparound (building on the default kernel
load address being at 16Mb), the one byte shorter RIP-relative
addressing form can be used for most per-CPU accesses. The one
exception are the "stable" reads, where the use of the "P" operand
modifier prevents the compiler from using RIP-relative addressing, but
is unavoidable due to the use of the "p" constraint (side note: with
gcc 4.9.x the intended effect of this isn't being achieved anymore,
see gcc bug 63637).
With the dependency on the minimum kernel load address, arbitrarily
low values for CONFIG_PHYSICAL_START are now no longer possible. A
link time assertion is being added, directing to the need to increase
that value when it triggers.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Link: http://lkml.kernel.org/r/5458A1780200007800044A9D@mail.emea.novell.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Both this_cpu_off and cpu_info aren't getting modified post boot, yet
are being accessed on enough code paths that grouping them with other
frequently read items seems desirable. For cpu_info this at the same
time implies removing the cache line alignment (which afaict became
pointless when it got converted to per-CPU data years ago).
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Link: http://lkml.kernel.org/r/54589BD20200007800044A84@mail.emea.novell.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Drop printing that serves no purpose, as it's printing fixed or known
values, and mark constant structure appropriately.
Signed-off-by: Daniel J Blueman <daniel@numascale.com>
Cc: Steffen Persvold <sp@numascale.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Link: http://lkml.kernel.org/r/1415089784-28779-3-git-send-email-daniel@numascale.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The default self-IPI path polls the ICR to delay sending the IPI until
there is no IPI in progress. This is redundant on x86-86 APICs, since
IPIs are queued. See the AMD64 Architecture Programmer's Manual, vol 2,
p525.
Signed-off-by: Daniel J Blueman <daniel@numascale.com>
Cc: Steffen Persvold <sp@numascale.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Link: http://lkml.kernel.org/r/1415089784-28779-2-git-send-email-daniel@numascale.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Prevent 16-bit APIC IDs being truncated by using correct mask. This fixes
booting large systems, where the wrong core would receive the startup and
init IPIs, causing hanging.
Signed-off-by: Daniel J Blueman <daniel@numascale.com>
Cc: Steffen Persvold <sp@numascale.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Link: http://lkml.kernel.org/r/1415089784-28779-1-git-send-email-daniel@numascale.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This adds CONFIG_X86_VSYSCALL_EMULATION, guarded by CONFIG_EXPERT.
Turning it off completely disables vsyscall emulation, saving ~3.5k
for vsyscall_64.c, 4k for vsyscall_emu_64.S (the fake vsyscall
page), some tiny amount of core mm code that supports a gate area,
and possibly 4k for a wasted pagetable. The latter is because the
vsyscall addresses are misaligned and fit poorly in the fixmap.
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Link: http://lkml.kernel.org/r/406db88b8dd5f0cbbf38216d11be34bbb43c7eae.1414618407.git.luto@amacapital.net
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
vsyscall_64.c is just vsyscall emulation. Tidy it up accordingly.
[ tglx: Preserved the original copyright notices ]
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Link: http://lkml.kernel.org/r/9c448d5643d0fdb618f8cde9a54c21d2bcd486ce.1414618407.git.luto@amacapital.net
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
I see no point in having an unusable read-only page sitting at
0xffffffffff600000 when vsyscall=none. Instead, skip mapping it and
remove it from /proc/PID/maps.
I kept the ratelimited warning when programs try to use a vsyscall
in this mode, since it may help admins avoid confusion.
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Link: http://lkml.kernel.org/r/0dddbadc1d4e3bfbaf887938ff42afc97a7cc1f2.1414618407.git.luto@amacapital.net
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
We can use get_cpu() and put_cpu() to replace
preempt_disable()/cpu = smp_processor_id() and
preempt_enable() for slightly better code.
Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Konrad triggered the following splat below in a 32-bit guest on an AMD
box. As it turns out, in save_microcode_in_initrd_amd() we're using the
*physical* address of the container *after* we have enabled paging and
thus we #PF in load_microcode_amd() when trying to access the microcode
container in the ramdisk range.
Because the ramdisk is exactly there:
[ 0.000000] RAMDISK: [mem 0x35e04000-0x36ef9fff]
and we fault at 0x35e04304.
And since this guest doesn't relocate the ramdisk, we don't do the
computation which will give us the correct virtual address and we end up
with the PA.
So, we should actually be using virtual addresses on 32-bit too by the
time we're freeing the initrd. Do that then!
Unpacking initramfs...
BUG: unable to handle kernel paging request at 35d4e304
IP: [<c042e905>] load_microcode_amd+0x25/0x4a0
*pde = 00000000
Oops: 0000 [#1] SMP
Modules linked in:
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.17.1-302.fc21.i686 #1
Hardware name: Xen HVM domU, BIOS 4.4.1 10/01/2014
task: f5098000 ti: f50d0000 task.ti: f50d0000
EIP: 0060:[<c042e905>] EFLAGS: 00010246 CPU: 0
EIP is at load_microcode_amd+0x25/0x4a0
EAX: 00000000 EBX: f6e9ec4c ECX: 00001ec4 EDX: 00000000
ESI: f5d4e000 EDI: 35d4e2fc EBP: f50d1ed0 ESP: f50d1e94
DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
CR0: 8005003b CR2: 35d4e304 CR3: 00e33000 CR4: 000406d0
Stack:
00000000 00000000 f50d1ebc f50d1ec4 f5d4e000 c0d7735a f50d1ed0 15a3d17f
f50d1ec4 00600f20 00001ec4 bfb83203 f6e9ec4c f5d4e000 c0d7735a f50d1ed8
c0d80861 f50d1ee0 c0d80429 f50d1ef0 c0d889a9 f5d4e000 c0000000 f50d1f04
Call Trace:
? unpack_to_rootfs
? unpack_to_rootfs
save_microcode_in_initrd_amd
save_microcode_in_initrd
free_initrd_mem
populate_rootfs
? unpack_to_rootfs
do_one_initcall
? unpack_to_rootfs
? repair_env_string
? proc_mkdir
kernel_init_freeable
kernel_init
ret_from_kernel_thread
? rest_init
Reported-and-tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
References: https://bugzilla.redhat.com/show_bug.cgi?id=1158204
Fixes: 75a1ba5b2c ("x86, microcode, AMD: Unify valid container checks")
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: <stable@vger.kernel.org> # v3.14+
Link: http://lkml.kernel.org/r/20141101100100.GA4462@pd.tnic
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
There are some AMD CPU models which have thresholding banks but which
cannot generate a thresholding interrupt. This is denoted by the bit
MCi_MISC[IntP]. Make sure to check that bit before assigning the
thresholding interrupt handler.
Signed-off-by: Chen Yucong <slaoub@gmail.com>
[ Boris: save an indentation level and rewrite commit message. ]
Link: http://lkml.kernel.org/r/1412662128.28440.18.camel@debian
Signed-off-by: Borislav Petkov <bp@suse.de>
Pull x86 fixes from Ingo Molnar:
"Fixes from all around the place:
- hyper-V 32-bit PAE guest kernel fix
- two IRQ allocation fixes on certain x86 boards
- intel-mid boot crash fix
- intel-quark quirk
- /proc/interrupts duplicate irq chip name fix
- cma boot crash fix
- syscall audit fix
- boot crash fix with certain TSC configurations (seen on Qemu)
- smpboot.c build warning fix"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86, pageattr: Prevent overflow in slow_virt_to_phys() for X86_PAE
ACPI, irq, x86: Return IRQ instead of GSI in mp_register_gsi()
x86, intel-mid: Create IRQs for APB timers and RTC timers
x86: Don't enable F00F workaround on Intel Quark processors
x86/irq: Fix XT-PIC-XT-PIC in /proc/interrupts
x86, cma: Reserve DMA contiguous area after initmem_init()
i386/audit: stop scribbling on the stack frame
x86, apic: Handle a bad TSC more gracefully
x86: ACPI: Do not translate GSI number if IOAPIC is disabled
x86/smpboot: Move data structure to its primary usage scope
The file /sys/kernel/debug/tracing/eneabled_functions is used to debug
ftrace function hooks. Add to the output what function is being called
by the trampoline if the arch supports it.
Add support for this feature in x86_64.
Cc: H. Peter Anvin <hpa@linux.intel.com>
Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Tested-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
The current method of handling multiple function callbacks is to register
a list function callback that calls all the other callbacks based on
their hash tables and compare it to the function that the callback was
called on. But this is very inefficient.
For example, if you are tracing all functions in the kernel and then
add a kprobe to a function such that the kprobe uses ftrace, the
mcount trampoline will switch from calling the function trace callback
to calling the list callback that will iterate over all registered
ftrace_ops (in this case, the function tracer and the kprobes callback).
That means for every function being traced it checks the hash of the
ftrace_ops for function tracing and kprobes, even though the kprobes
is only set at a single function. The kprobes ftrace_ops is checked
for every function being traced!
Instead of calling the list function for functions that are only being
traced by a single callback, we can call a dynamically allocated
trampoline that calls the callback directly. The function graph tracer
already uses a direct call trampoline when it is being traced by itself
but it is not dynamically allocated. It's trampoline is static in the
kernel core. The infrastructure that called the function graph trampoline
can also be used to call a dynamically allocated one.
For now, only ftrace_ops that are not dynamically allocated can have
a trampoline. That is, users such as function tracer or stack tracer.
kprobes and perf allocate their ftrace_ops, and until there's a safe
way to free the trampoline, it can not be used. The dynamically allocated
ftrace_ops may, although, use the trampoline if the kernel is not
compiled with CONFIG_PREEMPT. But that will come later.
Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Tested-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
These patches:
86a349a28b ("perf/x86/intel: Add Broadwell core support")
c46e665f03 ("perf/x86: Add INST_RETIRED.ALL workarounds")
fdda3c4aac ("perf/x86/intel: Use Broadwell cache event list for Haswell")
introduced magic constants and unexplained changes:
https://lkml.org/lkml/2014/10/28/1128https://lkml.org/lkml/2014/10/27/325https://lkml.org/lkml/2014/8/27/546https://lkml.org/lkml/2014/10/28/546
Peter Zijlstra has attempted to help out, to clean up the mess:
https://lkml.org/lkml/2014/10/28/543
But has not received helpful and constructive replies which makes
me doubt wether it can all be finished in time until v3.18 is
released.
Despite various review feedback the author (Andi Kleen) has answered
only few of the review questions and has generally been uncooperative,
only giving replies when prompted repeatedly, and only giving minimal
answers instead of constructively explaining and helping along the effort.
That kind of behavior is not acceptable.
There's also a boot crash on Intel E5-1630 v3 CPUs reported for another
commit from Andi Kleen:
e735b9db12 ("perf/x86/intel/uncore: Add Haswell-EP uncore support")
https://lkml.org/lkml/2014/10/22/730
Which is not yet resolved. The uncore driver is independent in theory,
but the crash makes me worry about how well all these patches were
tested and makes me uneasy about the level of interminging that the
Broadwell and Haswell code has received by the commits above.
As a first step to resolve the mess revert the Broadwell client commits
back to the v3.17 version, before we run out of time and problematic
code hits a stable upstream kernel.
( If the Haswell-EP crash is not resolved via a simple fix then we'll have
to revert the Haswell-EP uncore driver as well. )
The Broadwell client series has to be submitted in a clean fashion, with
single, well documented changes per patch. If they are submitted in time
and are accepted during review then they can possibly go into v3.19 but
will need additional scrutiny due to the rocky history of this patch set.
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: eranian@google.com
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1409683455-29168-3-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Function mp_register_gsi() returns blindly the GSI number for the ACPI
SCI interrupt. That causes a regression when the GSI for ACPI SCI is
shared with other devices.
The regression was caused by commit 84245af729 "x86, irq, ACPI:
Change __acpi_register_gsi to return IRQ number instead of GSI" and
exposed on a SuperMicro system, which shares one GSI between ACPI SCI
and PCI device, with following failure:
http://sourceforge.net/p/linux1394/mailman/linux1394-user/?viewmonth=201410
[ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 20 low
level)
[ 2.699224] firewire_ohci 0000:06:00.0: failed to allocate interrupt
20
Return mp_map_gsi_to_irq(gsi, 0) instead of the GSI number.
Reported-and-Tested-by: Daniel Robbins <drobbins@funtoo.org>
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Len Brown <len.brown@intel.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: <stable@vger.kernel.org> # 3.17
Link: http://lkml.kernel.org/r/1414387308-27148-4-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Intel MID platforms has no legacy interrupts, so no IRQ descriptors
preallocated. We need to call mp_map_gsi_to_irq() to create IRQ
descriptors for APB timers and RTC timers, otherwise it may cause
invalid memory access as:
[ 0.116839] BUG: unable to handle kernel NULL pointer dereference at
0000003a
[ 0.123803] IP: [<c1071c0e>] setup_irq+0xf/0x4d
Tested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: H. Peter Anvin <hpa@linux.intel.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: David Cohen <david.a.cohen@linux.intel.com>
Cc: <stable@vger.kernel.org> # 3.17
Link: http://lkml.kernel.org/r/1414387308-27148-3-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The Intel Quark processor is a part of family 5, but does not have the
F00F bug present in Pentiums of the same family.
Pentiums were models 0 through 8, Quark is model 9.
Signed-off-by: Dave Jones <davej@redhat.com>
Cc: Bryan O'Donoghue <pure.logic@nexus-software.ie>
Link: http://lkml.kernel.org/r/20141028175753.GA12743@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Fix duplicate XT-PIC seen in /proc/interrupts on x86 systems
that make use of 8259A Programmable Interrupt Controllers.
Specifically convert output like this:
CPU0
0: 76573 XT-PIC-XT-PIC timer
1: 11 XT-PIC-XT-PIC i8042
2: 0 XT-PIC-XT-PIC cascade
4: 8 XT-PIC-XT-PIC serial
6: 3 XT-PIC-XT-PIC floppy
7: 0 XT-PIC-XT-PIC parport0
8: 1 XT-PIC-XT-PIC rtc0
10: 448 XT-PIC-XT-PIC fddi0
12: 23 XT-PIC-XT-PIC eth0
14: 2464 XT-PIC-XT-PIC ide0
NMI: 0 Non-maskable interrupts
ERR: 0
to one like this:
CPU0
0: 122033 XT-PIC timer
1: 11 XT-PIC i8042
2: 0 XT-PIC cascade
4: 8 XT-PIC serial
6: 3 XT-PIC floppy
7: 0 XT-PIC parport0
8: 1 XT-PIC rtc0
10: 145 XT-PIC fddi0
12: 31 XT-PIC eth0
14: 2245 XT-PIC ide0
NMI: 0 Non-maskable interrupts
ERR: 0
that is one like we used to have from ~2.2 till it was changed
sometime.
The rationale is there is no value in this duplicate
information, it merely clutters output and looks ugly. We only
have one handler for 8259A interrupts so there is no need to
give it a name separate from the name already given to
irq_chip.
We could define meaningful names for handlers based on bits in
the ELCR register on systems that have it or the value of the
LTIM bit we use in ICW1 otherwise (hardcoded to 0 though with
MCA support gone), to tell edge-triggered and level-triggered
inputs apart. While that information does not affect 8259A
interrupt handlers it could help people determine which lines
are shareable and which are not. That is material for a
separate change though.
Any tools that parse /proc/interrupts are supposed not to be
affected since it was many years we used the format this change
converts back to.
Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/alpine.LFD.2.11.1410260147190.21390@eddie.linux-mips.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Drop double entry for pt_regs_bx.
This seems to be a typo - resulting in a double entry in the
generated include/generated/asm-offsets.h, which is not necessary.
Build-tested and booted on x86 64 box to make sure it was not
doing any strange magic.... after all it was in the kernel in
this form for almost 10 years.
Signed-off-by: Nicholas Mc Guire <der.herr@hofr.at>
Cc: Jan Beulich <JBeulich@suse.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20141027172805.GA19760@opentech.at
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Andy spotted the fail in what was intended as a conditional printk level.
Reported-by: Andy Lutomirski <luto@amacapital.net>
Fixes: cc6cd47e73 ("perf/x86: Tone down kernel messages when the PMU check fails in a virtual environment")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20141007124757.GH19379@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Fengguang Wu reported a boot crash on the x86 platform
via the 0-day Linux Kernel Performance Test:
cma: dma_contiguous_reserve: reserving 31 MiB for global area
BUG: Int 6: CR2 (null)
[<41850786>] dump_stack+0x16/0x18
[<41d2b1db>] early_idt_handler+0x6b/0x6b
[<41072227>] ? __phys_addr+0x2e/0xca
[<41d4ee4d>] cma_declare_contiguous+0x3c/0x2d7
[<41d6d359>] dma_contiguous_reserve_area+0x27/0x47
[<41d6d4d1>] dma_contiguous_reserve+0x158/0x163
[<41d33e0f>] setup_arch+0x79b/0xc68
[<41d2b7cf>] start_kernel+0x9c/0x456
[<41d2b2ca>] i386_start_kernel+0x79/0x7d
(See details at: https://lkml.org/lkml/2014/10/8/708)
It is because dma_contiguous_reserve() is called before
initmem_init() in x86, the variable high_memory is not
initialized but accessed by __pa(high_memory) in
dma_contiguous_reserve().
This patch moves dma_contiguous_reserve() after initmem_init()
so that high_memory is initialized before accessed.
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Weijie Yang <weijie.yang@samsung.com>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Marek Szyprowski <m.szyprowski@samsung.com>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Cc: iamjoonsoo.kim@lge.com
Cc: 'Linux-MM' <linux-mm@kvack.org>
Cc: 'Weijie Yang' <weijie.yang.kh@gmail.com>
Link: http://lkml.kernel.org/r/000101cfef69%2431e528a0%2495af79e0%24%25yang@samsung.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Ok, new attempt, this time around with full ppgtt disabled again.
drm-intel-next-2014-10-03:
- first batch of skl stage 1 enabling
- fixes from Rodrigo to the PSR, fbc and sink crc code
- kerneldoc for the frontbuffer tracking code, runtime pm code and the basic
interrupt enable/disable functions
- smaller stuff all over
drm-intel-next-2014-09-19:
- bunch more i830M fixes from Ville
- full ppgtt now again enabled by default
- more ppgtt fixes from Michel Thierry and Chris Wilson
- plane config work from Gustavo Padovan
- spinlock clarifications
- piles of smaller improvements all over, as usual
* tag 'drm-intel-next-2014-10-03-no-ppgtt' of git://anongit.freedesktop.org/drm-intel: (114 commits)
Revert "drm/i915: Enable full PPGTT on gen7"
drm/i915: Update DRIVER_DATE to 20141003
drm/i915: Remove the duplicated logic between the two shrink phases
drm/i915: kerneldoc for interrupt enable/disable functions
drm/i915: Use dev_priv instead of dev in irq setup functions
drm/i915: s/pm._irqs_disabled/pm.irqs_enabled/
drm/i915: Clear TX FIFO reset master override bits on chv
drm/i915: Make sure hardware uses the correct swing margin/deemph bits on chv
drm/i915: make sink_crc return -EIO on aux read/write failure
drm/i915: Constify send buffer for intel_dp_aux_ch
drm/i915: De-magic the PSR AUX message
drm/i915: Reinstate error level message for non-simulated gpu hangs
drm/i915: Kerneldoc for intel_runtime_pm.c
drm/i915: Call runtime_pm_disable directly
drm/i915: Move intel_display_set_init_power to intel_runtime_pm.c
drm/i915: Bikeshed rpm functions name a bit.
drm/i915: Extract intel_runtime_pm.c
drm/i915: Remove intel_modeset_suspend_hw
drm/i915: spelling fixes for frontbuffer tracking kerneldoc
drm/i915: Tighting frontbuffer tracking around flips
...
git commit b4f0d3755c was very very dumb.
It was writing over %esp/pt_regs semi-randomly on i686 with the expected
"system can't boot" results. As noted in:
https://bugs.freedesktop.org/show_bug.cgi?id=85277
This patch stops fscking with pt_regs. Instead it sets up the registers
for the call to __audit_syscall_entry in the most obvious conceivable
way. It then does just a tiny tiny touch of magic. We need to get what
started in PT_EDX into 0(%esp) and PT_ESI into 4(%esp). This is as easy
as a pair of pushes.
After the call to __audit_syscall_entry all we need to do is get that
now useless junk off the stack (pair of pops) and reload %eax with the
original syscall so other stuff can keep going about it's business.
Reported-by: Paulo Zanoni <przanoni@gmail.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
Link: http://lkml.kernel.org/r/1414037043-30647-1-git-send-email-eparis@redhat.com
Cc: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJURGCfAAoJEHm+PkMAQRiG6toH/RUazjqZxqMvLlm1y+O6+7s9
OpFdcDl4ZQtrvymBRYipu46pbDUoAAsVbxQJllaLNtHE0UrvaQE76WihBQYM8qW/
WoESLsZRbNQqQYQixf55pOozX7uIuG+9LKHagC8JNfD1Bw/nQ+RleSXqFsBCdpMW
i7SzcZBu2Iv+LnVmjvoGMOQa+loKzO6Pj1MpoHxxJQmeypH3dZR7mLVeBJNZQtLE
BGY47gYraVzb9EjKnSbjrIKjpM9o0MIihoanrrjnq0JMrfm4pi6W5GgaGDUiaBVH
w7Vmr5S2pjzrS41gKSVK9/XO1CrDG8tsp3QwA2+iIbjdR3wBDynyeG3UfnLABec=
=hwbG
-----END PGP SIGNATURE-----
Merge tag 'v3.18-rc1' into x86/urgent
Reason:
Need to apply audit patch on top of v3.18-rc1.
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
If the TSC is unusable or disabled, then this patch fixes:
- Confusion while trying to clear old APIC interrupts.
- Division by zero and incorrect programming of the TSC deadline
timer.
This fixes boot if the CPU has a TSC deadline timer but a missing or
broken TSC. The failure to boot can be observed with qemu using
-cpu qemu64,-tsc,+tsc-deadline
This also happens to me in nested KVM for unknown reasons.
With this patch, I can boot cleanly (although without a TSC).
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Bandan Das <bsd@redhat.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/e2fa274e498c33988efac0ba8b7e3120f7f92d78.1413393027.git.luto@amacapital.net
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Aravind had the good question about why we're assigning a
software-defined bank when reporting error thresholding errors instead
of simply using the bank which reports the last error causing the
overflow.
Digging through git history, it pointed to
9526866439 ("[PATCH] x86_64: mce_amd support for family 0x10 processors")
which added that functionality. The problem with this, however, is that
tools don't know about software-defined banks and get puzzled. So drop
that K8_MCE_THRESHOLD_BASE and simply use the hw bank reporting the
thresholding interrupt.
Save us a couple of MSR reads while at it.
Reported-by: Aravind Gopalakrishnan <aravind.gopalakrishnan@amd.com>
Link: https://lkml.kernel.org/r/5435B206.60402@amd.com
Signed-off-by: Borislav Petkov <bp@suse.de>
Assigning to mce_threshold_vector is loop-invariant code in
mce_amd_feature_init(). So do it only once, out of loop body.
Signed-off-by: Chen Yucong <slaoub@gmail.com>
Link: http://lkml.kernel.org/r/1412263212.8085.6.camel@debian
[ Boris: commit message corrections. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
mce_setup() does not gather the content of IA32_MCG_STATUS, so it
should be read explicitly. Moreover, we need to clear IA32_MCx_STATUS
to avoid that mce_log() logs the processed threshold event again
at next time.
But we do the logging ourselves and machine_check_poll() is completely
useless there. So kill it.
Signed-off-by: Chen Yucong <slaoub@gmail.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Avoid open coded calculations for bank MSRs by hiding the index
of higher bank MSRs in well-defined macros.
No semantic changes.
Signed-off-by: Chen Yucong <slaoub@gmail.com>
Link: http://lkml.kernel.org/r/1411438561-24319-1-git-send-email-slaoub@gmail.com
Signed-off-by: Borislav Petkov <bp@suse.de>
When IOAPIC is disabled, acpi_gsi_to_irq() should return gsi directly
instead of calling mp_map_gsi_to_irq() to translate gsi to IRQ by IOAPIC.
It fixes https://bugzilla.kernel.org/show_bug.cgi?id=84381.
This regression was introduced with commit 6b9fb70824 "x86, ACPI,
irq: Consolidate algorithm of mapping (ioapic, pin) to IRQ number"
Reported-and-Tested-by: Thomas Richter <thor@math.tu-berlin.de>
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Thomas Richter <thor@math.tu-berlin.de>
Cc: rui.zhang@intel.com
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: <stable@vger.kernel.org> # 3.17
Link: http://lkml.kernel.org/r/1413816327-12850-1-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Pull audit updates from Eric Paris:
"So this change across a whole bunch of arches really solves one basic
problem. We want to audit when seccomp is killing a process. seccomp
hooks in before the audit syscall entry code. audit_syscall_entry
took as an argument the arch of the given syscall. Since the arch is
part of what makes a syscall number meaningful it's an important part
of the record, but it isn't available when seccomp shoots the
syscall...
For most arch's we have a better way to get the arch (syscall_get_arch)
So the solution was two fold: Implement syscall_get_arch() everywhere
there is audit which didn't have it. Use syscall_get_arch() in the
seccomp audit code. Having syscall_get_arch() everywhere meant it was
a useless flag on the stack and we could get rid of it for the typical
syscall entry.
The other changes inside the audit system aren't grand, fixed some
records that had invalid spaces. Better locking around the task comm
field. Removing some dead functions and structs. Make some things
static. Really minor stuff"
* git://git.infradead.org/users/eparis/audit: (31 commits)
audit: rename audit_log_remove_rule to disambiguate for trees
audit: cull redundancy in audit_rule_change
audit: WARN if audit_rule_change called illegally
audit: put rule existence check in canonical order
next: openrisc: Fix build
audit: get comm using lock to avoid race in string printing
audit: remove open_arg() function that is never used
audit: correct AUDIT_GET_FEATURE return message type
audit: set nlmsg_len for multicast messages.
audit: use union for audit_field values since they are mutually exclusive
audit: invalid op= values for rules
audit: use atomic_t to simplify audit_serial()
kernel/audit.c: use ARRAY_SIZE instead of sizeof/sizeof[0]
audit: reduce scope of audit_log_fcaps
audit: reduce scope of audit_net_id
audit: arm64: Remove the audit arch argument to audit_syscall_entry
arm64: audit: Add audit hook in syscall_trace_enter/exit()
audit: x86: drop arch from __audit_syscall_entry() interface
sparc: implement is_32bit_task
sparc: properly conditionalize use of TIF_32BIT
...
Makes the code more readable by moving variable and usage closer
to each other, which also avoids this build warning in the
!CONFIG_HOTPLUG_CPU case:
arch/x86/kernel/smpboot.c:105:42: warning: ‘die_complete’ defined but not used [-Wunused-variable]
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Lan Tianyu <tianyu.lan@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: srostedt@redhat.com
Cc: toshi.kani@hp.com
Cc: imammedo@redhat.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1409039025-32310-1-git-send-email-tianyu.lan@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull percpu consistent-ops changes from Tejun Heo:
"Way back, before the current percpu allocator was implemented, static
and dynamic percpu memory areas were allocated and handled separately
and had their own accessors. The distinction has been gone for many
years now; however, the now duplicate two sets of accessors remained
with the pointer based ones - this_cpu_*() - evolving various other
operations over time. During the process, we also accumulated other
inconsistent operations.
This pull request contains Christoph's patches to clean up the
duplicate accessor situation. __get_cpu_var() uses are replaced with
with this_cpu_ptr() and __this_cpu_ptr() with raw_cpu_ptr().
Unfortunately, the former sometimes is tricky thanks to C being a bit
messy with the distinction between lvalues and pointers, which led to
a rather ugly solution for cpumask_var_t involving the introduction of
this_cpu_cpumask_var_ptr().
This converts most of the uses but not all. Christoph will follow up
with the remaining conversions in this merge window and hopefully
remove the obsolete accessors"
* 'for-3.18-consistent-ops' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (38 commits)
irqchip: Properly fetch the per cpu offset
percpu: Resolve ambiguities in __get_cpu_var/cpumask_var_t -fix
ia64: sn_nodepda cannot be assigned to after this_cpu conversion. Use __this_cpu_write.
percpu: Resolve ambiguities in __get_cpu_var/cpumask_var_t
Revert "powerpc: Replace __get_cpu_var uses"
percpu: Remove __this_cpu_ptr
clocksource: Replace __this_cpu_ptr with raw_cpu_ptr
sparc: Replace __get_cpu_var uses
avr32: Replace __get_cpu_var with __this_cpu_write
blackfin: Replace __get_cpu_var uses
tile: Use this_cpu_ptr() for hardware counters
tile: Replace __get_cpu_var uses
powerpc: Replace __get_cpu_var uses
alpha: Replace __get_cpu_var
ia64: Replace __get_cpu_var uses
s390: cio driver &__get_cpu_var replacements
s390: Replace __get_cpu_var uses
mips: Replace __get_cpu_var uses
MIPS: Replace __get_cpu_var uses in FPU emulator.
arm: Replace __this_cpu_ptr with raw_cpu_ptr
...
Merge second patch-bomb from Andrew Morton:
- a few hotfixes
- drivers/dma updates
- MAINTAINERS updates
- Quite a lot of lib/ updates
- checkpatch updates
- binfmt updates
- autofs4
- drivers/rtc/
- various small tweaks to less used filesystems
- ipc/ updates
- kernel/watchdog.c changes
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (135 commits)
mm: softdirty: enable write notifications on VMAs after VM_SOFTDIRTY cleared
kernel/param: consolidate __{start,stop}___param[] in <linux/moduleparam.h>
ia64: remove duplicate declarations of __per_cpu_start[] and __per_cpu_end[]
frv: remove unused declarations of __start___ex_table and __stop___ex_table
kvm: ensure hard lockup detection is disabled by default
kernel/watchdog.c: control hard lockup detection default
staging: rtl8192u: use %*pEn to escape buffer
staging: rtl8192e: use %*pEn to escape buffer
staging: wlan-ng: use %*pEhp to print SN
lib80211: remove unused print_ssid()
wireless: hostap: proc: print properly escaped SSID
wireless: ipw2x00: print SSID via %*pE
wireless: libertas: print esaped string via %*pE
lib/vsprintf: add %*pE[achnops] format specifier
lib / string_helpers: introduce string_escape_mem()
lib / string_helpers: refactoring the test suite
lib / string_helpers: move documentation to c-file
include/linux: remove strict_strto* definitions
arch/x86/mm/numa.c: fix boot failure when all nodes are hotpluggable
fs: check bh blocknr earlier when searching lru
...
Pull x86 ras, uv and vdso fixlets from Ingo Molnar:
"ras: tone down a kernel message to only occur during initial bootup,
not during suspend/resume cycles.
uv: a cleanup commit
vdso: a fix to error checking"
* 'x86-ras-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mce: Avoid showing repetitive message from intel_init_thermal()
* 'x86-uv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/apic/uv: Remove unnecessary #ifdef
* 'x86-vdso-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/vdso: Fix vdso2c's special_pages[] error checking
Pull x86 fixes from Ingo Molnar:
"Misc smaller fixes that missed the v3.17 cycle"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/build: Add arch/x86/purgatory/ make generated files to gitignore
x86: Fix section conflict for numachip
x86: Reject x32 executables if x32 ABI not supported
x86_64, entry: Filter RFLAGS.NT on entry from userspace
x86, boot, kaslr: Fix nuisance warning on 32-bit builds
Pull x86 seccomp changes from Ingo Molnar:
"This tree includes x86 seccomp filter speedups and related preparatory
work, which touches core seccomp facilities as well.
The main idea is to split seccomp into two phases, to be able to enter
a simple fast path for syscalls with ptrace side effects.
There's no substantial user-visible (and ABI) effects expected from
this, except a change in how we emit a better audit record for
SECCOMP_RET_TRACE events"
* 'x86-seccomp-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86_64, entry: Use split-phase syscall_trace_enter for 64-bit syscalls
x86_64, entry: Treat regs->ax the same in fastpath and slowpath syscalls
x86: Split syscall_trace_enter into two phases
x86, entry: Only call user_exit if TIF_NOHZ
x86, x32, audit: Fix x32's AUDIT_ARCH wrt audit
seccomp: Document two-phase seccomp and arch-provided seccomp_data
seccomp: Allow arch code to provide seccomp_data
seccomp: Refactor the filter callback and the API
seccomp,x86,arm,mips,s390: Remove nr parameter from secure_computing
Pull x86 platform updates from Ingo Molnar:
"The main changes in this tree are:
- fix and update Intel Quark [Galileo] SoC platform support
- update IOSF chipset side band interface and make it available via
debugfs
- enable HPETs on Soekris net6501 and other e6xx based systems"
* 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86: Add cpu_detect_cache_sizes to init_intel() add Quark legacy_cache()
x86: Quark: Comment setup_arch() to document TLB/PGE bug
x86/intel/quark: Switch off CR4.PGE so TLB flush uses CR3 instead
x86/platform/intel/iosf: Add debugfs config option for IOSF
x86/platform/intel/iosf: Add better description of IOSF driver in config
x86/platform/intel/iosf: Add Braswell PCI ID
x86/platform/pmc_atom: Fix warning when CONFIG_DEBUG_FS=n
x86: HPET force enable for e6xx based systems
x86/iosf: Add debugfs support
x86/iosf: Add Kconfig prompt for IOSF_MBI selection
Pull x86 mm updates from Ingo Molnar:
"This tree includes the following changes:
- fix memory hotplug
- fix hibernation bootup memory layout assumptions
- fix hyperv numa guest kernel messages
- remove dead code
- update documentation"
* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mm: Update memory map description to list hypervisor-reserved area
x86/mm, hibernate: Do not assume the first e820 area to be RAM
x86/mm/numa: Drop dead code and rename setup_node_data() to setup_alloc_data()
x86/mm/hotplug: Modify PGD entry when removing memory
x86/mm/hotplug: Pass sync_global_pgds() a correct argument in remove_pagetable()
x86: Remove set_pmd_pfn
Pull x86 FPU updates from Ingo Molnar:
"x86 FPU handling fixes, cleanups and enhancements from Oleg.
The signal handling race fix and the __restore_xstate_sig() preemption
fix for eager-mode is marked for -stable as well"
* 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86: copy_thread: Don't nullify ->ptrace_bps twice
x86, fpu: Shift "fpu_counter = 0" from copy_thread() to arch_dup_task_struct()
x86, fpu: copy_process: Sanitize fpu->last_cpu initialization
x86, fpu: copy_process: Avoid fpu_alloc/copy if !used_math()
x86, fpu: Change __thread_fpu_begin() to use use_eager_fpu()
x86, fpu: __restore_xstate_sig()->math_state_restore() needs preempt_disable()
x86, fpu: shift drop_init_fpu() from save_xstate_sig() to handle_signal()
Pull x86 cpufeature updates from Ingo Molnar:
"This tree includes the following changes:
- Introduce DISABLED_MASK to list disabled CPU features, to simplify
CPU feature handling and avoid excessive #ifdefs
- Remove the lightly used cpu_has_pae() primitive"
* 'x86-cpufeature-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86: Add more disabled features
x86: Introduce disabled-features
x86: Axe the lightly-used cpu_has_pae
Use watchdog_enable_hardlockup_detector() to set hard lockup detection's
default value to false. It's risky to run this detection in a guest, as
false positives are easy to trigger, especially if the host is
overcommitted.
Signed-off-by: Ulrich Obergfell <uobergfe@redhat.com>
Signed-off-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Don Zickus <dzickus@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
x86_64 allnoconfig:
arch/x86/kernel/cpu/common.c:968: warning: 'syscall32_cpu_init' defined but not used
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>