CPUs which should support the RAPL counters according to
Family/Model/Stepping may still issue #GP when attempting to access
the RAPL MSRs. This may happen when Linux is running under KVM and
we are passing-through host F/M/S data, for example. Use rdmsrl_safe
to first access the RAPL_POWER_UNIT MSR; if this fails, do not
attempt to use this PMU.
Signed-off-by: Venkatesh Srinivas <venkateshs@google.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1394739386-22260-1-git-send-email-venkateshs@google.com
Cc: zheng.z.yan@intel.com
Cc: eranian@google.com
Cc: ak@linux.intel.com
Cc: linux-kernel@vger.kernel.org
[ The patch also silently fixes another bug: rapl_pmu_init() didn't handle the memory alloc failure case previously. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Change branch_setup_xol_ops() to simply use opc1 = OPCODE2(insn) - 0x10
if OPCODE1() == 0x0f; this matches the "short" jmp which checks the same
condition.
Thanks to lib/insn.c, it does the rest correctly. branch->ilen/offs are
correct no matter if this jmp is "near" or "short".
Reported-by: Jonathan Lebon <jlebon@redhat.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
Teach branch_emulate_op() to emulate the conditional "short" jmp's which
check regs->flags.
Note: this doesn't support jcxz/jcexz, loope/loopz, and loopne/loopnz.
They all are rel8 and thus they can't trigger the problem, but perhaps
we will add the support in future just for completeness.
Reported-by: Jonathan Lebon <jlebon@redhat.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
See the previous "Emulate unconditional relative jmp's" which explains
why we can not execute "jmp" out-of-line, the same applies to "call".
Emulating of rip-relative call is trivial, we only need to additionally
push the ret-address. If this fails, we execute this instruction out of
line and this should trigger the trap, the probed application should die
or the same insn will be restarted if a signal handler expands the stack.
We do not even need ->post_xol() for this case.
But there is a corner (and almost theoretical) case: another thread can
expand the stack right before we execute this insn out of line. In this
case it hit the same problem we are trying to solve. So we simply turn
the probed insn into "call 1f; 1:" and add ->post_xol() which restores
->sp and restarts.
Many thanks to Jonathan who finally found the standalone reproducer,
otherwise I would never resolve the "random SIGSEGV's under systemtap"
bug-report. Now that the problem is clear we can write the simplified
test-case:
void probe_func(void), callee(void);
int failed = 1;
asm (
".text\n"
".align 4096\n"
".globl probe_func\n"
"probe_func:\n"
"call callee\n"
"ret"
);
/*
* This assumes that:
*
* - &probe_func = 0x401000 + a_bit, aligned = 0x402000
*
* - xol_vma->vm_start = TASK_SIZE_MAX - PAGE_SIZE = 0x7fffffffe000
* as xol_add_vma() asks; the 1st slot = 0x7fffffffe080
*
* so we can target the non-canonical address from xol_vma using
* the simple math below, 100 * 4096 is just the random offset
*/
asm (".org . + 0x800000000000 - 0x7fffffffe080 - 5 - 1 + 100 * 4096\n");
void callee(void)
{
failed = 0;
}
int main(void)
{
probe_func();
return failed;
}
It SIGSEGV's if you probe "probe_func" (although this is not very reliable,
randomize_va_space/etc can change the placement of xol area).
Note: as Denys Vlasenko pointed out, amd and intel treat "callw" (0x66 0xe8)
differently. This patch relies on lib/insn.c and thus implements the intel's
behaviour: 0x66 is simply ignored. Fortunately nothing sane should ever use
this insn, so we postpone the fix until we decide what should we do; emulate
or not, support or not, etc.
Reported-by: Jonathan Lebon <jlebon@redhat.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
Finally we can kill the ugly (and very limited) code in __skip_sstep().
Just change branch_setup_xol_ops() to treat "nop" as jmp to the next insn.
Thanks to lib/insn.c, it is clever enough. OPCODE1() == 0x90 includes
"(rep;)+ nop;" at least, and (afaics) much more.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
Currently we always execute all insns out-of-line, including relative
jmp's and call's. This assumes that even if regs->ip points to nowhere
after the single-step, default_post_xol_op(UPROBE_FIX_IP) logic will
update it correctly.
However, this doesn't work if this regs->ip == xol_vaddr + insn_offset
is not canonical. In this case CPU generates #GP and general_protection()
kills the task which tries to execute this insn out-of-line.
Now that we have uprobe_xol_ops we can teach uprobes to emulate these
insns and solve the problem. This patch adds branch_xol_ops which has
a single branch_emulate_op() hook, so far it can only handle rel8/32
relative jmp's.
TODO: move ->fixup into the union along with rip_rela_target_address.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reported-by: Jonathan Lebon <jlebon@redhat.com>
Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
1. Add the trivial sizeof_long() helper and change other callers of
is_ia32_task() to use it.
TODO: is_ia32_task() is not what we actually want, TS_COMPAT does
not necessarily mean 32bit. Fortunately syscall-like insns can't be
probed so it actually works, but it would be better to rename and
use is_ia32_frame().
2. As Jim pointed out "ncopied" in arch_uretprobe_hijack_return_addr()
and adjust_ret_addr() should be named "nleft". And in fact only the
last copy_to_user() in arch_uretprobe_hijack_return_addr() actually
needs to inspect the non-zero error code.
TODO: adjust_ret_addr() should die. We can always calculate the value
we need to write into *regs->sp, just UPROBE_FIX_CALL should record
insn->length.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
SIGILL after the failed arch_uprobe_post_xol() should only be used as
a last resort, we should try to restart the probed insn if possible.
Currently only adjust_ret_addr() can fail, and this can only happen if
another thread unmapped our stack after we executed "call" out-of-line.
Most probably the application if buggy, but even in this case it can
have a handler for SIGSEGV/etc. And in theory it can be even correct
and do something non-trivial with its memory.
Of course we can't restart unconditionally, so arch_uprobe_post_xol()
does this only if ->post_xol() returns -ERESTART even if currently this
is the only possible error.
default_post_xol_op(UPROBE_FIX_CALL) can always restart, but as Jim
pointed out it should not forget to pop off the return address pushed
by this insn executed out-of-line.
Note: this is not "perfect", we do not want the extra handler_chain()
after restart, but I think this is the best solution we can realistically
do without too much uglifications.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
Currently the error from arch_uprobe_post_xol() is silently ignored.
This doesn't look good and this can lead to the hard-to-debug problems.
1. Change handle_singlestep() to loudly complain and send SIGILL.
Note: this only affects x86, ppc/arm can't fail.
2. Change arch_uprobe_post_xol() to call arch_uprobe_abort_xol() and
avoid TF games if it is going to return an error.
This can help to to analyze the problem, if nothing else we should
not report ->ip = xol_slot in the core-file.
Note: this means that handle_riprel_post_xol() can be called twice,
but this is fine because it is idempotent.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
arch_uprobe_analyze_insn() calls handle_riprel_insn() at the start,
but only "0xff" and "default" cases need the UPROBE_FIX_RIP_ logic.
Move the callsite into "default" case and change the "0xff" case to
fall-through.
We are going to add the various hooks to handle the rip-relative
jmp/call instructions (and more), we need this change to enforce the
fact that the new code can not conflict with is_riprel_insn() logic
which, after this change, can only be used by default_xol_ops.
Note: arch_uprobe_abort_xol() still calls handle_riprel_post_xol()
directly. This is fine unless another _xol_ops we may add later will
need to reuse "UPROBE_FIX_RIP_AX|UPROBE_FIX_RIP_CX" bits in ->fixup.
In this case we can add uprobe_xol_ops->abort() hook, which (perhaps)
we will need anyway in the long term.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Introduce arch_uprobe->ops pointing to the "struct uprobe_xol_ops",
move the current UPROBE_FIX_{RIP*,IP,CALL} code into the default
set of methods and change arch_uprobe_pre/post_xol() accordingly.
This way we can add the new uprobe_xol_ops's to handle the insns
which need the special processing (rip-relative jmp/call at least).
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
No functional changes. Preparation to simplify the review of the next
change. Just reorder the code in arch_uprobe_pre/post_xol() functions
so that UPROBE_FIX_{RIP_*,IP,CALL} logic goes to the end.
Also change arch_uprobe_pre_xol() to use utask instead of autask, to
make the code more symmetrical with arch_uprobe_post_xol().
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cosmetic. Move pre_xol_rip_insn() and handle_riprel_post_xol() up to
the closely related handle_riprel_insn(). This way it is simpler to
read and understand this code, and this lessens the number of ifdef's.
While at it, update the comment in handle_riprel_post_xol() as Jim
suggested.
TODO: rename them somehow to make the naming consistent.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
Kill the "mm->context.ia32_compat" check in handle_riprel_insn(), if
it is true insn_rip_relative() must return false. validate_insn_bits()
passed "ia32_compat" as !x86_64 to insn_init(), and insn_rip_relative()
checks insn->x86_64.
Also, remove the no longer needed "struct mm_struct *mm" argument and
the unnecessary "return" at the end.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
No functional changes, preparation.
Shift the code from prepare_fixups() to arch_uprobe_analyze_insn()
with the following modifications:
- Do not call insn_get_opcode() again, it was already called
by validate_insn_bits().
- Move "case 0xea" up. This way "case 0xff" can fall through
to default case.
- change "case 0xff" to use the nested "switch (MODRM_REG)",
this way the code looks a bit simpler.
- Make the comments look consistent.
While at it, kill the initialization of rip_rela_target_address and
->fixups, we can rely on kzalloc(). We will add the new members into
arch_uprobe, it would be better to assume that everything is zero by
default.
TODO: cleanup/fix the mess in validate_insn_bits() paths:
- validate_insn_64bits() and validate_insn_32bits() should be
unified.
- "ifdef" is not used consistently; if good_insns_64 depends
on CONFIG_X86_64, then probably good_insns_32 should depend
on CONFIG_X86_32/EMULATION
- the usage of mm->context.ia32_compat looks wrong if the task
is TIF_X32.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Current kprobes in-kernel page fault handler doesn't
expect that its single-stepping can be interrupted by
an NMI handler which may cause a page fault(e.g. perf
with callback tracing).
In that case, the page-fault handled by kprobes and it
misunderstands the page-fault has been caused by the
single-stepping code and tries to recover IP address
to probed address.
But the truth is the page-fault has been caused by the
NMI handler, and do_page_fault failes to handle real
page fault because the IP address is modified and
causes Kernel BUGs like below.
----
[ 2264.726905] BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
[ 2264.727190] IP: [<ffffffff813c46e0>] copy_user_generic_string+0x0/0x40
To handle this correctly, I fixed the kprobes fault
handler to ensure the faulted ip address is its own
single-step buffer instead of checking current kprobe
state.
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Sandeepa Prabhu <sandeepa.prabhu@linaro.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: fche@redhat.com
Cc: systemtap@sourceware.org
Link: http://lkml.kernel.org/r/20140417081644.26341.52351.stgit@ltc230.yrl.intra.hitachi.co.jp
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull x86 fixes from Ingo Molnar:
"Various fixes:
- reboot regression fix
- build message spam fix
- GPU quirk fix
- 'make kvmconfig' fix
plus the wire-up of the renameat2() system call on i386"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86: Remove the PCI reboot method from the default chain
x86/build: Supress "Nothing to be done for ..." messages
x86/gpu: Fix sign extension issue in Intel graphics stolen memory quirks
x86/platform: Fix "make O=dir kvmconfig"
i386: Wire up the renameat2() syscall
Pull perf fixes from Ingo Molnar:
"Tooling fixes, plus a simple hardware-enablement patch for the Intel
RAPL PMU (energy use measurement) on Haswell CPUs, which I hope is
still fine at this stage"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf tools: Instead of redirecting flex output, use -o
perf tools: Fix double free in perf test 21 (code-reading.c)
perf stat: Initialize statistics correctly
perf bench: Set more defaults in the 'numa' suite
perf bench: Fix segfault at the end of an 'all' execution
perf bench: Update manpage to mention numa and futex
perf probe: Use dwarf_getcfi_elf() instead of dwarf_getcfi()
perf probe: Fix to handle errors in line_range searching
perf probe: Fix --line option behavior
perf tools: Pick up libdw without explicit LIBDW_DIR
MAINTAINERS: Change e-mail to kernel.org one
perf callchains: Disable unwind libraries when libelf isn't found
tools lib traceevent: Do not call warning() directly
tools lib traceevent: Print event name when show warning if possible
perf top: Fix documentation of invalid -s option
perf/x86: Enable DRAM RAPL support on Intel Haswell
- Fix a couple of barnsjukdomar on the Rockchip driver.
- Remove an idiotic debug print I happened to leave behind
in the Nomadik driver.
- Fixup the Qualcomm MSM interrupt handling code for the
TLMM v2.
- Three patches renaming the Broadcom Capri driver to
BCM28155. This has been falling between the chairs for
some time due to some cross-tree synchronization
misunderstandings, now I'm fed up with this and just
rename it in this -rc1 phase.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJTTPPzAAoJEEEQszewGV1zSBgP/iMS9y9LAouPmzkXseCeT4aM
eb2pEi1wtOSWCeGNMIx4X1GRkbHS+T+5Wk6dmh46RUn/8b1HY4gkLoJLnrQGML1l
zS9tdDyURGmGYuVAr0ghLq0LTruvcViCQageRzO8yllTkJb8Tf6rfKE2+y9BsGRH
CdBIE9/XYP2Z2Wwd0fLAPMFpa9wPz8eNeF7XGyQ20+DSuRzNMmDq6AUhmlCfIMnL
TxlJIT1vpAAt/e3wcRvmr2n/Nrlz28ajP/VmzSm2dSqTajy8ofWqgFwQLiItJJ3q
VuJto3eKy+xGT4IVO+ozXCZd0kDgaAeiz7PNWHpbFBec0y4QFmVFxtPpw/Zff7RH
136Vh0KahX47TaJ1GGvB5622OLjsQzwH2TY24Sn9WRzNT8VS0pv2F3RQk8tVrWrd
fquQksuMEaCay+MHcBhI1mJlQcgTFJNsenmY8KOIjFykeBz5x6bOtBkbOCjzuogr
yVgaOW/zV7b0zFQ6Vv9eZFf7WYhBXE7w1Im5D2EMR9mJpNBWbj9A8GQpCjc0+VXB
jN2hWmj5qQtQW6z67VEn3l8Mqzpazsu61zbcB3F4Ma0m247vakIkk5I8GMmLW3/c
UMr6RG2IHIaECNdS1Ir1UkBnDCFb7CQq3j0Bh2UynyU9jKGtRZU/P7SdD7LqHsi+
Q56kdNSbYdRPpjPWmZUA
=i5qY
-----END PGP SIGNATURE-----
Merge tag 'pinctrl-v3.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl
Pull pincontrol fixes from Linus Walleij:
"A first set of pin control fixes for the v3.15 series:
- Fix a couple of barnsjukdomar on the Rockchip driver.
- Remove an idiotic debug print I happened to leave behind in the
Nomadik driver.
- Fixup the Qualcomm MSM interrupt handling code for the TLMM v2.
- Three patches renaming the Broadcom Capri driver to BCM28155. This
has been falling between the chairs for some time due to some
cross-tree synchronization misunderstandings, now I'm fed up with
this and just rename it in this -rc1 phase"
* tag 'pinctrl-v3.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
pinctrl: fix typo in bindings documentation
Update bcm_defconfig with new pinctrl CONFIG
pinctrl: Rename Broadcom Capri pinctrl driver
pinctrl: msm: Correct interrupt code for TLMM v2
pinctrl: nomadik: delete stray debug print
pinctrl: rockchip: handle first half of rk3188-bank0 correctly
pinctrl: rockchip: add return value to rockchip_set_mux
pinctrl: rockchip: fix offset of mux registers for rk3188
Pull s390 patches from Martin Schwidefsky:
"An update to the oops output with additional information about the
crash. The renameat2 system call is enabled. Two patches in regard
to the PTR_ERR_OR_ZERO cleanup. And a bunch of bug fixes"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/sclp_cmd: replace PTR_RET with PTR_ERR_OR_ZERO
s390/sclp: replace PTR_RET with PTR_ERR_OR_ZERO
s390/sclp_vt220: Fix kernel panic due to early terminal input
s390/compat: fix typo
s390/uaccess: fix possible register corruption in strnlen_user_srst()
s390: add 31 bit warning message
s390: wire up sys_renameat2
s390: show_registers() should not map user space addresses to kernel symbols
s390/mm: print control registers and page table walk on crash
s390/smp: fix smp_stop_cpu() for !CONFIG_SMP
s390: fix control register update
April 2014 Itanium processor specification update:
http://www.intel.com/content/www/us/en/processors/itanium/itanium-specification-update.html
describes this erratum:
=========================================================================
237. Under a complex set of conditions, store to load forwarding for a
sub 8-byte load may complete incorrectly
Problem: A load instruction may complete incorrectly when a code sequence
using 4-byte or smaller load and store operations to the same address
is executed in combination with specific timing of all the following
concurrent conditions: store to load forwarding, alignment checking
enabled, a mis-predicted branch, and complex cache utilization activity.
Implication: The affected sub 8-byte instruction may complete
incorrectly resulting in unpredictable system behavior. There is an
extremely low probability of exposure due to the significant number of
complex microarchitectural concurrent conditions required to encounter
the erratum.
Workaround: Set PSR.ac = 0 to completely avoid the erratum. Disabling
Hyper-Threading will significantly reduce exposure to the conditions
that contribute to encountering the erratum.
Status: See the Summary Table of Changes for the affected steppings.
=========================================================================
[Table of changes essentially lists all models from McKinley to Tukwila]
The PSR.ac bit controls whether the processor will always generate
an unaligned reference trap (0x5a00) for a misaligned data access
(when PSR.ac=1) or if it will let the access succeed when running
on a cpu that implements logic to handle some unaligned accesses.
Way back in 2008 in commit b704882e70
[IA64] Rationalize kernel mode alignment checking
we made the decision to always enable strict checking. We were
already doing so in trap/interrupt context because the common
preamble code set this bit - but the rest of supervisor code
(and by inheritance user code) ran with PSR.ac=0.
We now reverse that decision and set PSR.ac=0 everywhere in the
kernel (also inherited by user processes). This will avoid the
erratum using the method described in the Itanium specification
update. Net effect for users is that the processor will handle
unaligned access when it can (typically with a tiny performance
bubble in the pipeline ... but much less invasive than taking a
trap and having the OS perform the access).
Signed-off-by: Tony Luck <tony.luck@intel.com>
Steve reported a reboot hang and bisected it back to this commit:
a4f1987e4c x86, reboot: Add EFI and CF9 reboot methods into the default list
He heroically tested all reboot methods and found the following:
reboot=t # triple fault ok
reboot=k # keyboard ctrl FAIL
reboot=b # BIOS ok
reboot=a # ACPI FAIL
reboot=e # EFI FAIL [system has no EFI]
reboot=p # PCI 0xcf9 FAIL
And I think it's pretty obvious that we should only try PCI 0xcf9 as a
last resort - if at all.
The other observation is that (on this box) we should never try
the PCI reboot method, but close with either the 'triple fault'
or the 'BIOS' (terminal!) reboot methods.
Thirdly, CF9_COND is a total misnomer - it should be something like
CF9_SAFE or CF9_CAREFUL, and 'CF9' should be 'CF9_FORCE' ...
So this patch fixes the worst problems:
- it orders the actual reboot logic to follow the reboot ordering
pattern - it was in a pretty random order before for no good
reason.
- it fixes the CF9 misnomers and uses BOOT_CF9_FORCE and
BOOT_CF9_SAFE flags to make the code more obvious.
- it tries the BIOS reboot method before the PCI reboot method.
(Since 'BIOS' is a terminal reboot method resulting in a hang
if it does not work, this is essentially equivalent to removing
the PCI reboot method from the default reboot chain.)
- just for the miraculous possibility of terminal (resulting
in hang) reboot methods of triple fault or BIOS returning
without having done their job, there's an ordering between
them as well.
Reported-and-bisected-and-tested-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Li Aubrey <aubrey.li@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Link: http://lkml.kernel.org/r/20140404064120.GB11877@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull KVM fixes from Marcelo Tosatti:
- Fix for guest triggerable BUG_ON (CVE-2014-0155)
- CR4.SMAP support
- Spurious WARN_ON() fix
* git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: x86: remove WARN_ON from get_kernel_ns()
KVM: Rename variable smep to cr4_smep
KVM: expose SMAP feature to guest
KVM: Disable SMAP for guests in EPT realmode and EPT unpaging mode
KVM: Add SMAP support when setting CR4
KVM: Remove SMAP bit from CR4_RESERVED_BITS
KVM: ioapic: try to recover if pending_eoi goes out of range
KVM: ioapic: fix assignment of ioapic->rtc_status.pending_eoi (CVE-2014-0155)
Rename variable smep to cr4_smep, which can better reflect the
meaning of the variable.
Signed-off-by: Feng Wu <feng.wu@intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
SMAP is disabled if CPU is in non-paging mode in hardware.
However KVM always uses paging mode to emulate guest non-paging
mode with TDP. To emulate this behavior, SMAP needs to be
manually disabled when guest switches to non-paging mode.
Signed-off-by: Feng Wu <feng.wu@intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
This patch adds SMAP handling logic when setting CR4 for guests
Thanks a lot to Paolo Bonzini for his suggestion to use the branchless
way to detect SMAP violation.
Signed-off-by: Feng Wu <feng.wu@intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
When we build an already built kernel again, arch/x86/syscalls/Makefile
and arch/x86/tools/Makefile emits "Nothing to be done for ..."
messages.
Here is the command log:
$ make defconfig
[ snip ]
$ make
[ snip ]
$ make
make[1]: Nothing to be done for `all'. <-----
make[1]: Nothing to be done for `relocs'. <-----
CHK include/config/kernel.release
CHK include/generated/uapi/linux/version.h
Besides not emitting those, "all" and "relocs" should be added to PHONY as well.
Signed-off-by: Masahiro Yamada <yamada.m@jp.panasonic.com>
Acked-by: Peter Foley <pefoley2@pefoley.com>
Acked-by: Michal Marek <mmarek@suse.cz>
Link: http://lkml.kernel.org/r/1397093742-11144-1-git-send-email-yamada.m@jp.panasonic.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
To be consistent with other Broadcom drivers, the Broadcom Capri pinctrl
driver and its related CONFIG option are renamed to bcm281xx.
This commit updates the defconfig that enables the pinctrl driver.
Signed-off-by: Sherman Yin <syin@broadcom.com>
Reviewed-by: Matt Porter <mporter@linaro.org>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Have the KB(),MB(),GB() macros produce unsigned longs to avoid
unintended sign extension issues with the gen2 memory size
detection.
What happens is first the uint8_t returned by
read_pci_config_byte() gets promoted to an int which gets
multiplied by another int from the MB() macro, and finally the
result gets sign extended to size_t.
Although this shouldn't be a problem in practice as all affected
gen2 platforms are 32bit AFAIK, so size_t will be 32 bits.
Reported-by: Bjorn Helgaas <bhelgaas@google.com>
Suggested-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/1397382303-17525-1-git-send-email-ville.syrjala@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Running:
make O=dir x86_64_defconfig
make O=dir kvmconfig
the second command dirties the source tree with file ".config",
symlink "source" and objects in folder "scripts".
Fixed by using properly prefixed paths in the arch Makefile.
Signed-off-by: Antonio Borneo <borneo.antonio@gmail.com>
Acked-by: Borislav Petkov <bp@suse.de>
Cc: Pekka Enberg <penberg@kernel.org>
Link: http://lkml.kernel.org/r/1397377568-8375-1-git-send-email-borneo.antonio@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Commit 8f619b5429 ("powerpc/ppc64: Do not turn AIL (reloc-on
interrupts) too early") added code to set the AIL bit in the LPCR
without checking whether the kernel is running in hypervisor mode. The
result is that when the kernel is running as a guest (i.e., under
PowerKVM or PowerVM), the processor takes a privileged instruction
interrupt at that point, causing a panic. The visible result is that
the kernel hangs after printing "returning from prom_init".
This fixes it by checking for hypervisor mode being available before
setting LPCR. If we are not in hypervisor mode, we enable relocation-on
interrupts later in pSeries_setup_arch using the H_SET_MODE hcall.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.14 (GNU/Linux)
iEYEABECAAYFAlNFtEkACgkQuseO5dulBZVcPwCdFWY81hKqQaKHaSPFh9m+n1lt
yY0An2VZZGrFSkj162POFy1P2sPpGw5p
=LRFZ
-----END PGP SIGNATURE-----
Merge tag 'llvmlinux-for-v3.15' of git://git.linuxfoundation.org/llvmlinux/kernel
Pull llvm patches from Behan Webster:
"These are some initial updates to support compiling the kernel with
clang.
These patches have been through the proper reviews to the best of my
ability, and have been soaking in linux-next for a few weeks. These
patches by themselves still do not completely allow clang to be used
with the kernel code, but lay the foundation for other patches which
are still under review.
Several other of the LLVMLinux patches have been already added via
maintainer trees"
* tag 'llvmlinux-for-v3.15' of git://git.linuxfoundation.org/llvmlinux/kernel:
x86: LLVMLinux: Fix "incomplete type const struct x86cpu_device_id"
x86 kbuild: LLVMLinux: More cc-options added for clang
x86, acpi: LLVMLinux: Remove nested functions from Thinkpad ACPI
LLVMLinux: Add support for clang to compiler.h and new compiler-clang.h
LLVMLinux: Remove warning about returning an uninitialized variable
kbuild: LLVMLinux: Fix LINUX_COMPILER definition script for compilation with clang
Documentation: LLVMLinux: Update Documentation/dontdiff
kbuild: LLVMLinux: Adapt warnings for compilation with clang
kbuild: LLVMLinux: Add Kbuild support for building kernel with Clang
Pull vfs updates from Al Viro:
"The first vfs pile, with deep apologies for being very late in this
window.
Assorted cleanups and fixes, plus a large preparatory part of iov_iter
work. There's a lot more of that, but it'll probably go into the next
merge window - it *does* shape up nicely, removes a lot of
boilerplate, gets rid of locking inconsistencie between aio_write and
splice_write and I hope to get Kent's direct-io rewrite merged into
the same queue, but some of the stuff after this point is having
(mostly trivial) conflicts with the things already merged into
mainline and with some I want more testing.
This one passes LTP and xfstests without regressions, in addition to
usual beating. BTW, readahead02 in ltp syscalls testsuite has started
giving failures since "mm/readahead.c: fix readahead failure for
memoryless NUMA nodes and limit readahead pages" - might be a false
positive, might be a real regression..."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (63 commits)
missing bits of "splice: fix racy pipe->buffers uses"
cifs: fix the race in cifs_writev()
ceph_sync_{,direct_}write: fix an oops on ceph_osdc_new_request() failure
kill generic_file_buffered_write()
ocfs2_file_aio_write(): switch to generic_perform_write()
ceph_aio_write(): switch to generic_perform_write()
xfs_file_buffered_aio_write(): switch to generic_perform_write()
export generic_perform_write(), start getting rid of generic_file_buffer_write()
generic_file_direct_write(): get rid of ppos argument
btrfs_file_aio_write(): get rid of ppos
kill the 5th argument of generic_file_buffered_write()
kill the 4th argument of __generic_file_aio_write()
lustre: don't open-code kernel_recvmsg()
ocfs2: don't open-code kernel_recvmsg()
drbd: don't open-code kernel_recvmsg()
constify blk_rq_map_user_iov() and friends
lustre: switch to kernel_sendmsg()
ocfs2: don't open-code kernel_sendmsg()
take iov_iter stuff to mm/iov_iter.c
process_vm_access: tidy up a bit
...
Pull audit updates from Eric Paris.
* git://git.infradead.org/users/eparis/audit: (28 commits)
AUDIT: make audit_is_compat depend on CONFIG_AUDIT_COMPAT_GENERIC
audit: renumber AUDIT_FEATURE_CHANGE into the 1300 range
audit: do not cast audit_rule_data pointers pointlesly
AUDIT: Allow login in non-init namespaces
audit: define audit_is_compat in kernel internal header
kernel: Use RCU_INIT_POINTER(x, NULL) in audit.c
sched: declare pid_alive as inline
audit: use uapi/linux/audit.h for AUDIT_ARCH declarations
syscall_get_arch: remove useless function arguments
audit: remove stray newline from audit_log_execve_info() audit_panic() call
audit: remove stray newlines from audit_log_lost messages
audit: include subject in login records
audit: remove superfluous new- prefix in AUDIT_LOGIN messages
audit: allow user processes to log from another PID namespace
audit: anchor all pid references in the initial pid namespace
audit: convert PPIDs to the inital PID namespace.
pid: get pid_t ppid of task in init_pid_ns
audit: rename the misleading audit_get_context() to audit_take_context()
audit: Add generic compat syscall support
audit: Add CONFIG_HAVE_ARCH_AUDITSYSCALL
...
In v3.2 the Analog Devices ADT75 temperature sensor driver was removed
as an IIO driver and support for it was added to the LM75 HWMON driver.
But it was apparently overlooked to rename one reference to CONFIG_ADT75
to CONFIG_SENSORS_LM75. Do so now. Use the IS_ENABLED() macro, while
we're at it.
Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
In v3.2 the Analog Devices AD7314 temperature sensor driver was removed
as an IIO driver and added as a HWMON driver. But it was apparently
overlooked to rename two references to CONFIG_AD7314 to
CONFIG_SENSORS_AD7314. Do so now. Use the IS_ENABLED() macro, while
we're at it.
Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
In v3.2 the Analog Devices ad2s1200/ad2s1205 driver was renamed from
ad2s120x to ad2s1200. But it apparently forgot to rename the references
to this driver in the BF537-STAMP code. Rename these now, and use the
IS_ENABLED() macro, while we're at it.
Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
There's a (rather subtle) typo in "CONFIG_SND_SOC_ADV80X_MODULE". Fix it
once and for all by using IS_ENABLED(), which is designed to avoid
issues like this.
Signed-off-by: Paul Bolle <pebolle@tiscali.nl>