Pull livepatching fixes from Jiri Kosina:
"Two tiny fixes for livepatching infrastructure:
- extending RCU critical section to cover all accessess to
RCU-protected variable, by Petr Mladek
- proper format string passing to kobject_init_and_add(), by Jiri
Kosina"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching:
livepatch: RCU protect struct klp_func all the time when used in klp_ftrace_handler()
livepatch: fix format string in kobject_init_and_add()
Commit 7800064ba5 ("ARM: dts: Add basic dm816x device tree
configuration") added basic devices for dm816x, but I was not able
to test the USB completely because of an unconfigured USB phy, and
I only tested it to make sure the Mentor chips are detected and
clocked without a phy.
After testing the USB with actual devices I noticed a few issues
that should be fixed to avoid confusion:
- The USB id pin on dm8168-evm is hardwired and can be changed
only by software. As there are two USB-A type connectors, let's
start both in host mode instead of otg.
- The Mentor core is configured in such a way on dm8168-evm that
it's not capable of multipoint at least on revision c board
that I have.
- We need ranges for the syscon to properly set up the phy as
children of the SCM syscon area.
- Let's not disable the second interface, the board specific
dts files can do that if really needed. Most boards should
just keep it enabled to ensure the device is idled properly.
Note that also a phy and several musb fixes are still needed to
make the USB to work properly in addition to this fix.
Cc: Brian Hutchinson <b.hutchman@gmail.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Pull HID fixes from Jiri Kosina:
- a few fixes to Sony driver (rmmod breakage, spinlock initialization),
by Antonio Ospite, Frank Praznik and Jiri Kosina
- fix for wMaxInputLength handling regression in i2c-hid, by Seth
Forshee
- IRQ safety spinlock fix in sensor hub driver, by Srinivas Pandruvada
- IRQ level sensitivity fix to i2c-hid to be compliant with the spec,
by Mika Westerberg
- a couple device ID additions piggy-backing on top of that
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid:
HID: microsoft: Add ID for NE7K wireless keyboard
HID: i2c-hid: Limit reads to wMaxInputLength bytes for input events
HID: sony: fix uninitialized per-controller spinlock
HID: sony: initialize sony_dev_list_lock properly
HID: sony: Fix a WARNING shown when rmmod-ing the driver
HID: sensor-hub: correct dyn_callback_lock IRQ-safe change
HID: hid-sensor-hub: Correct documentation
HID: saitek: add USB ID for older R.A.T. 7
HID: i2c-hid: The interrupt should be level sensitive
HID: wacom: Add missing ABS_MISC event and feature declaration for 27QHD
The sata_ref_clk is a reference clock to the SATA phy.
This fixes SATA malfunction across suspend/resume or when
SATA driver is used as a module.
Signed-off-by: Roger Quadros <rogerq@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
The sata_ref_clk is a reference clock to the SATA phy.
This fixes SATA malfunction across suspend/resume or when
SATA driver is used as a module.
Signed-off-by: Roger Quadros <rogerq@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Some bios really like to joke and start the planes at an offset ...
hooray!
Align start and end to fix this.
v2: Fixup calculation of size, spotted by Chris Wilson.
v3: Fix serious fumble I've just spotted.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=86883
Cc: stable@vger.kernel.org
Cc: Johannes W <jargon@molb.org>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Reported-and-tested-by: Johannes W <jargon@molb.org>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
[Jani: split WARN_ONs, rebase on v4.0-rc1]
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Atm, it's possible that the interrupt handler is called when the device
is in D3 or some other low-power state. It can be due to another device
that is still in D0 state and shares the interrupt line with i915, or on
some platforms there could be spurious interrupts even without sharing
the interrupt line. The latter case was reported by Klaus Ethgen using a
Lenovo x61p machine (gen 4). He noticed this issue via a system
suspend/resume hang and bisected it to the following commit:
commit e11aa36230
Author: Jesse Barnes <jbarnes@virtuousgeek.org>
Date: Wed Jun 18 09:52:55 2014 -0700
drm/i915: use runtime irq suspend/resume in freeze/thaw
This is a problem, since in low-power states IIR will always read
0xffffffff resulting in an endless IRQ servicing loop.
Fix this by handling interrupts only when the driver explicitly enables
them and so it's guaranteed that the interrupt registers return a valid
value.
Note that this issue existed even before the above commit, since during
runtime suspend/resume we never unregistered the handler.
v2:
- clarify the purpose of smp_mb() vs. synchronize_irq() in the
code comment (Chris)
v3:
- no need for an explicit smp_mb(), we can assume that synchronize_irq()
and the mmio read/writes in the install hooks provide for this (Daniel)
- remove code comment as the remaining synchronize_irq() is self
explanatory (Daniel)
v4:
- drm_irq_uninstall() implies synchronize_irq(), so no need to call it
explicitly (Daniel)
Reference: https://lkml.org/lkml/2015/2/11/205
Reported-and-bisected-by: Klaus Ethgen <Klaus@Ethgen.ch>
Cc: stable@vger.kernel.org
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
When we walk the list of vma, or even for protecting against concurrent
framebuffer creation, we must hold the struct_mutex or else a second
thread can corrupt the list as we walk it.
Fixes regression from
commit d7f46fc4e7
Author: Ben Widawsky <benjamin.widawsky@intel.com>
Date: Fri Dec 6 14:10:55 2013 -0800
drm/i915: Make pin count per VMA
References: https://bugs.freedesktop.org/show_bug.cgi?id=89085
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: stable@vger.kernel.org
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
When converting from implicitly tracked execlist queue items to ref counted
requests, not all frees of requests were replaced with unrefs, and extraneous
refs/unrefs of contexts were added.
Correct the unbalanced refcount & replace the frees.
Remove a noisy warning when hitting the request creation path.
drm_i915_gem_request and intel_context are both kref reference counted
structures. Upon allocation, drm_i915_gem_request's ref count should be
bumped using kref_init. When a context is assigned to the request,
the context's reference count should be bumped using i915_gem_context_reference.
i915_gem_request_reference will reduce the context reference count when
the request is freed.
Problem introduced in
commit 6d3d8274bc
Author: Nick Hoath <nicholas.hoath@intel.com>
AuthorDate: Thu Jan 15 13:10:39 2015 +0000
drm/i915: Subsume intel_ctx_submit_request in to drm_i915_gem_request
v2: Added comments explaining how the ctx pointer and the request object should
be ref-counted. Removed noisy warning.
v3: Cleaned up the language used in the commit & the header
description (Thanks David Gordon)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88652
Signed-off-by: Nick Hoath <nicholas.hoath@intel.com>
Reviewed-by: Thomas Daniel <thomas.daniel@intel.com>
Reviewed-by: Daniel Vetter <daniel@ffwll.ch>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
It is a bad idea to export static functions. GCC for some platforms
shows errors like:
error: __ksymtab_azx_get_response causes a section type conflict
Signed-off-by: Jaroslav Kysela <perex@perex.cz>
Cc: <stable@vger.kernel.org> # v3.15+
Signed-off-by: Takashi Iwai <tiwai@suse.de>
The KSTK_EIP() and KSTK_ESP() macros should return the user program
counter (PC) and stack pointer (A0StP) of the given task. These are used
to determine which VMA corresponds to the user stack in
/proc/<pid>/maps, and for the user PC & A0StP in /proc/<pid>/stat.
However for Meta the PC & A0StP from the task's kernel context are used,
resulting in broken output. For example in following /proc/<pid>/maps
output, the 3afff000-3b021000 VMA should be described as the stack:
# cat /proc/self/maps
...
100b0000-100b1000 rwxp 00000000 00:00 0 [heap]
3afff000-3b021000 rwxp 00000000 00:00 0
And in the following /proc/<pid>/stat output, the PC is in kernel code
(1074234964 = 0x40078654) and the A0StP is in the kernel heap
(1335981392 = 0x4fa17550):
# cat /proc/self/stat
51 (cat) R ... 1335981392 1074234964 ...
Fix the definitions of KSTK_EIP() and KSTK_ESP() to use
task_pt_regs(tsk)->ctx rather than (tsk)->thread.kernel_context. This
gets the registers from the user context stored after the thread info at
the base of the kernel stack, which is from the last entry into the
kernel from userland, regardless of where in the kernel the task may
have been interrupted, which results in the following more correct
/proc/<pid>/maps output:
# cat /proc/self/maps
...
0800b000-08070000 r-xp 00000000 00:02 207 /lib/libuClibc-0.9.34-git.so
...
100b0000-100b1000 rwxp 00000000 00:00 0 [heap]
3afff000-3b021000 rwxp 00000000 00:00 0 [stack]
And /proc/<pid>/stat now correctly reports the PC in libuClibc
(134320308 = 0x80190b4) and the A0StP in the [stack] region (989864576 =
0x3b002280):
# cat /proc/self/stat
51 (cat) R ... 989864576 134320308 ...
Reported-by: Alexey Brodkin <Alexey.Brodkin@synopsys.com>
Reported-by: Vineet Gupta <Vineet.Gupta1@synopsys.com>
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: linux-metag@vger.kernel.org
Cc: <stable@vger.kernel.org> # v3.9+
Remove a useless pm_runtime_put_sync leading to unbalanced
usage_count.
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Reported-by: Sylvain Rochet <sylvain.rochet@finsecur.com>
The A2Q (Add To Queue) and UPDATE bits are left in their previous state
when resetting the layer.
This lead to weird behavior when enabling the plane again: the framebuffer
previously queued is dequeued and we end up with access to an old memory
region.
Reset those bits when resetting the channel.
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
- Fix a bug that caused 15% CPU performance drop in Kaveri. This was caused
because we overwritten the initialization of the first pipe (out of eight),
which is dedicated to radeon operation. The fix was tested by Michel Dänzer.
This bug was introduced by a patch I prepared (yeah, my bad) and was merged
to 3.19-rc6. Therefore, I also marked it as Cc:stable.
- Fix sparse warning
* tag 'drm-amdkfd-fixes-2015-02-23' of git://people.freedesktop.org/~gabbayo/linux:
drm/amdkfd: don't set get_pipes_num() as inline
drm/amdkfd: Initialize only amdkfd's assigned pipelines
This fixes a bit of fallout that was caused by the atomic modesetting
driver conversion and some last-minute changes in the DRM atomic core.
It also fixes a bug exposed by recent changes in the clock framework
which results in non-working HDMI.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABAgAGBQJU5eTAAAoJEN0jrNd/PrOhyx0P/0oLSRfqddmgqtWYLG5Xprc5
xwI6N3Elil9dMdl+QnVkh0Bd2dN3QiSobpJlLLu0C4eQwSKNGPKrMITdoz2kxdLm
V2hswNN9iVf/g8ZvR/VoLBYaWSZER9OVgSKx6kqih4X1hJNyGpBlux3MPxWQ2MTC
s3fIq8gdSgxcNwno4R1nfx0SOPxVRPW72qfPsY2hZQFE/6jcZ5k6V6BPqcu69mKz
af8SKrEIXN57Lxq54+qlzVrFxKCzQmj9lLeX3yty9Hj+SBqm0ybQNbnCrJE2Kcsi
xkYhA0JxUerw30sb5HJkvJqmWltxoaf0ZDaQOPd01ZTxIOGpsObN2o3h0lBaVt6G
lSXXKdLF9AFtYHzVJq6L7KkpsOK40fM0tks+K/4lhPRIZmwG7A46hRZbVnJfiCUv
PEYdwzXvNrz6jEACw4Cu986556n3FCeR6Qb/4T3gyCNh9VbICxcOTaDwTalGhw44
eLFEvY1KqmAbQtrf6soRlVcMySZ5QEJAZtRxNsYcjhHCSQOmcx6YIRiOJ2aA+BFe
WjHily2N4g7afetc8TWDkFvf8niLVBiyXisEtX90Ef13LRbHVXAY2b4oKiJN6ljX
kSb1uAG1BbLChETluAAj4CN6QQigzbMYjkW5Zrv9xN/Aj9w52YQK2bW9ydNH3ULW
UofsqSV4zUdXxcY2NSCQ
=PQiO
-----END PGP SIGNATURE-----
Merge tag 'drm/tegra/for-3.20-rc1-fixes' of git://anongit.freedesktop.org/tegra/linux into drm-fixes
drm/tegra: Fixes for v3.20-rc1
This fixes a bit of fallout that was caused by the atomic modesetting
driver conversion and some last-minute changes in the DRM atomic core.
It also fixes a bug exposed by recent changes in the clock framework
which results in non-working HDMI.
* tag 'drm/tegra/for-3.20-rc1-fixes' of git://anongit.freedesktop.org/tegra/linux:
drm/tegra: dc: Move more code into ->init()
drm/tegra: dc: Wire up CRTC parent of atomic state
drm/tegra: dc: Reset state's active_changed field
drm/tegra: hdmi: Explicitly set clock rate
In commit ccfc08655d
Author: Rob Clark <robdclark@gmail.com>
Date: Thu Dec 18 16:01:48 2014 -0500
drm: tweak getconnector locking
We need to extend the locking to cover connector->state reading for
atomic drivers, but the above commit was a bit too eager and also
included the fill_modes callback. Which on i915 on old platforms using
load detection needs to acquire modeset locks, resulting in a deadlock
on output probing.
Reported-by: Marc Finet <m.dreadlock@gmail.com>
Cc: Marc Finet <m.dreadlock@gmail.com>
Cc: robdclark@gmail.com
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
If xfs_trans_reserve fails we don't cancel the transaction,
and we'll leak the allocated transaction pointer.
Spotted by Coverity.
Signed-off-by: Eric Sandeen <ssandeen@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
We shouldn't get here with RENAME_EXCHANGE set and no
target_ip, but let's be defensive, because xfs_cross_rename()
will dereference it.
Spotted by Coverity.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
If the default PM domain using PM_CLK is used for PM runtime, the real PM
domain(s) cannot be registered from DT later.
Hence do not enable it when running a multi-platform kernel with genpd
support on an r8a7740. The R-Mobile PM domain driver will take care of
PM runtime management of the module clocks.
The default PM domain is still needed for:
- platforms without genpd support,
- the legacy (non-DT) case, where genpd may take over later, except
for the C5 "always on" PM domain.
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Simon Horman <horms+renesas@verge.net.au>
Need to define and use appropriate functions for when BLK_DEV_INTEGRITY
is not set.
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Commit 1e02ce4ccc ("x86: Store a per-cpu shadow copy of CR4")
introduced CR4 shadows.
These shadows are initialized in early boot code. The commit missed
initialization for 64-bit PV(H) guests that this patch adds.
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
A request in the ring buffer mustn't be read after it has been marked
as consumed. Otherwise it might already have been reused by the
frontend without violating the ring protocol.
To avoid inconsistencies in the backend only work on a private copy
of the request. This will ensure a malicious guest not being able to
bypass consistency checks of the backend by modifying an active
request.
Signed-off-by: Juergen Gross <jgross@suse.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Hypercalls submitted by user space tools via the privcmd driver can
take a long time (potentially many 10s of seconds) if the hypercall
has many sub-operations.
A fully preemptible kernel may deschedule such as task in any upcall
called from a hypercall continuation.
However, in a kernel with voluntary or no preemption, hypercall
continuations in Xen allow event handlers to be run but the task
issuing the hypercall will not be descheduled until the hypercall is
complete and the ioctl returns to user space. These long running
tasks may also trigger the kernel's soft lockup detection.
Add xen_preemptible_hcall_begin() and xen_preemptible_hcall_end() to
bracket hypercalls that may be preempted. Use these in the privcmd
driver.
When returning from an upcall, call xen_maybe_preempt_hcall() which
adds a schedule point if if the current task was within a preemptible
hypercall.
Since _cond_resched() can move the task to a different CPU, clear and
set xen_in_preemptible_hcall around the call.
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Commit d524165cb8 ("x86/apic: Check x2apic early") tests X2APIC_ENABLE
bit of MSR_IA32_APICBASE when CONFIG_X86_X2APIC is off and panics
the kernel when this bit is set.
Xen's PV guests will pass this MSR read to the hypervisor which will
return its version of the MSR, where this bit might be set. Make sure
we clear it before returning MSR value to the caller.
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
When a PCM draining is performed to an empty stream that has been
already in PREPARED state, the current code just ignores and leaves as
it is, although the drain is supposed to set all such streams to SETUP
state. This patch covers that overlooked case.
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
The gpio_chip operations receive a pointer the gpio_chip struct which is
contained in the driver's private struct, yet the container_of call in those
functions point to the mfd struct defined in include/linux/mfd/tps65912.h.
Cc: Stable <stable@vger.kernel.org>
Signed-off-by: Nicolas Saenz Julienne <nicolassaenzj@gmail.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
The change:
7b8792bbdf
gpiolib: of: Correct error handling in of_get_named_gpiod_flags
assumed that only one gpio-chip is registred per of-node.
Some drivers register more than one chip per of-node, so
adjust the matching function of_gpiochip_find_and_xlate to
not stop looking for chips if a node-match is found and
the translation fails.
Cc: Stable <stable@vger.kernel.org>
Fixes: 7b8792bbdf ("gpiolib: of: Correct error handling in of_get_named_gpiod_flags")
Signed-off-by: Hans Holmberg <hans.holmberg@intel.com>
Acked-by: Alexandre Courbot <acourbot@nvidia.com>
Tested-by: Robert Jarzmik <robert.jarzmik@free.fr>
Tested-by: Tyler Hall <tylerwhall@gmail.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Microsoft Natural Wireless Ergonomic Keyboard 7000 has special My
Favorites 1..5 keys which are handled through a vendor-defined usage
page (0xff05).
Apply MS_ERGONOMY quirks handling to USB PID 0x071d (Microsoft Microsoft
2.4GHz Transceiver V1.0) so that the My Favorites 1..5 keys are reported
as KEY_F14..18 events.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=52841
Signed-off-by: Jakub Sitnicki <jsitnicki@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
d1c7e29e8d (HID: i2c-hid: prevent buffer overflow in early IRQ)
changed hid_get_input() to read ihid->bufsize bytes, which can be
more than wMaxInputLength. This is the case with the Dell XPS 13
9343, and it is causing events to be missed. In some cases the
missed events are releases, which can cause the cursor to jump or
freeze, among other problems. Limit the number of bytes read to
min(wMaxInputLength, ihid->bufsize) to prevent such problems.
Fixes: d1c7e29e8d "HID: i2c-hid: prevent buffer overflow in early IRQ"
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Reviewed-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
skylake_update_primary_plane() did not handle all pixel formats returned
by skl_format_to_fourcc(). Handle alpha similar to skl_update_plane().
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89052
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Make it execute the ERMS version if support is present and we're in the
forward memmove() part and remove the unfolded alternatives section
definition.
Signed-off-by: Borislav Petkov <bp@suse.de>
Make alternatives replace single JMPs instead of whole memset functions,
thus decreasing the amount of instructions copied during patching time
at boot.
While at it, make it use the REP_GOOD version by default which means
alternatives NOP out the JMP to the other versions, as REP_GOOD is set
by default on the majority of relevant x86 processors.
Signed-off-by: Borislav Petkov <bp@suse.de>
This is based on a patch originally by hpa.
With the current improvements to the alternatives, we can simply use %P1
as a mem8 operand constraint and rely on the toolchain to generate the
proper instruction sizes. For example, on 32-bit, where we use an empty
old instruction we get:
apply_alternatives: feat: 6*32+8, old: (c104648b, len: 4), repl: (c195566c, len: 4)
c104648b: alt_insn: 90 90 90 90
c195566c: rpl_insn: 0f 0d 4b 5c
...
apply_alternatives: feat: 6*32+8, old: (c18e09b4, len: 3), repl: (c1955948, len: 3)
c18e09b4: alt_insn: 90 90 90
c1955948: rpl_insn: 0f 0d 08
...
apply_alternatives: feat: 6*32+8, old: (c1190cf9, len: 7), repl: (c1955a79, len: 7)
c1190cf9: alt_insn: 90 90 90 90 90 90 90
c1955a79: rpl_insn: 0f 0d 0d a0 d4 85 c1
all with the proper padding done depending on the size of the
replacement instruction the compiler generates.
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: H. Peter Anvin <hpa@linux.intel.com>
... now that we have it.
Acked-by: Andy Lutomirski <luto@amacapital.net>
Cc: Richard Weinberger <richard@nod.at>
Signed-off-by: Borislav Petkov <bp@suse.de>
Move clear_page() up so that we can get 2-byte forward JMPs when
patching:
apply_alternatives: feat: 3*32+16, old: (ffffffff8130adb0, len: 5), repl: (ffffffff81d0b859, len: 5)
ffffffff8130adb0: alt_insn: 90 90 90 90 90
recompute_jump: new_displ: 0x0000003e
ffffffff81d0b859: rpl_insn: eb 3e 66 66 90
even though the compiler generated 5-byte JMPs which we padded with 5
NOPs.
Also, make the REP_GOOD version be the default as the majority of
machines set REP_GOOD. This way we get to save ourselves the JMP:
old insn VA: 0xffffffff813038b0, CPU feat: X86_FEATURE_REP_GOOD, size: 5, padlen: 0
clear_page:
ffffffff813038b0 <clear_page>:
ffffffff813038b0: e9 0b 00 00 00 jmpq ffffffff813038c0
repl insn: 0xffffffff81cf0e92, size: 0
old insn VA: 0xffffffff813038b0, CPU feat: X86_FEATURE_ERMS, size: 5, padlen: 0
clear_page:
ffffffff813038b0 <clear_page>:
ffffffff813038b0: e9 0b 00 00 00 jmpq ffffffff813038c0
repl insn: 0xffffffff81cf0e92, size: 5
ffffffff81cf0e92: e9 69 2a 61 ff jmpq ffffffff81303900
ffffffff813038b0 <clear_page>:
ffffffff813038b0: e9 69 2a 61 ff jmpq ffffffff8091631e
Signed-off-by: Borislav Petkov <bp@suse.de>
... and drop unfolded version. No need for ASM_NOP3 anymore either as
the alternatives do the proper padding at build time and insert proper
NOPs at boot time.
There should be no apparent operational change from this patch.
Signed-off-by: Borislav Petkov <bp@suse.de>
... instead of the semi-version with the spelled out sections.
What is more, make the REP_GOOD version be the default copy_page()
version as the majority of the relevant x86 CPUs do set
X86_FEATURE_REP_GOOD. Thus, copy_page gets compiled to:
ffffffff8130af80 <copy_page>:
ffffffff8130af80: e9 0b 00 00 00 jmpq ffffffff8130af90 <copy_page_regs>
ffffffff8130af85: b9 00 02 00 00 mov $0x200,%ecx
ffffffff8130af8a: f3 48 a5 rep movsq %ds:(%rsi),%es:(%rdi)
ffffffff8130af8d: c3 retq
ffffffff8130af8e: 66 90 xchg %ax,%ax
ffffffff8130af90 <copy_page_regs>:
...
and after the alternatives have run, the JMP to the old, unrolled
version gets NOPed out:
ffffffff8130af80 <copy_page>:
ffffffff8130af80: 66 66 90 xchg %ax,%ax
ffffffff8130af83: 66 90 xchg %ax,%ax
ffffffff8130af85: b9 00 02 00 00 mov $0x200,%ecx
ffffffff8130af8a: f3 48 a5 rep movsq %ds:(%rsi),%es:(%rdi)
ffffffff8130af8d: c3 retq
On modern uarches, those NOPs are cheaper than the unconditional JMP
previously.
Signed-off-by: Borislav Petkov <bp@suse.de>
Alternatives allow now for an empty old instruction. In this case we go
and pad the space with NOPs at assembly time. However, there are the
optimal, longer NOPs which should be used. Do that at patching time by
adding alt_instr.padlen-sized NOPs at the old instruction address.
Cc: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Borislav Petkov <bp@suse.de>
Up until now we had to pay attention to relative JMPs in alternatives
about how their relative offset gets computed so that the jump target
is still correct. Or, as it is the case for near CALLs (opcode e8), we
still have to go and readjust the offset at patching time.
What is more, the static_cpu_has_safe() facility had to forcefully
generate 5-byte JMPs since we couldn't rely on the compiler to generate
properly sized ones so we had to force the longest ones. Worse than
that, sometimes it would generate a replacement JMP which is longer than
the original one, thus overwriting the beginning of the next instruction
at patching time.
So, in order to alleviate all that and make using JMPs more
straight-forward we go and pad the original instruction in an
alternative block with NOPs at build time, should the replacement(s) be
longer. This way, alternatives users shouldn't pay special attention
so that original and replacement instruction sizes are fine but the
assembler would simply add padding where needed and not do anything
otherwise.
As a second aspect, we go and recompute JMPs at patching time so that we
can try to make 5-byte JMPs into two-byte ones if possible. If not, we
still have to recompute the offsets as the replacement JMP gets put far
away in the .altinstr_replacement section leading to a wrong offset if
copied verbatim.
For example, on a locally generated kernel image
old insn VA: 0xffffffff810014bd, CPU feat: X86_FEATURE_ALWAYS, size: 2
__switch_to:
ffffffff810014bd: eb 21 jmp ffffffff810014e0
repl insn: size: 5
ffffffff81d0b23c: e9 b1 62 2f ff jmpq ffffffff810014f2
gets corrected to a 2-byte JMP:
apply_alternatives: feat: 3*32+21, old: (ffffffff810014bd, len: 2), repl: (ffffffff81d0b23c, len: 5)
alt_insn: e9 b1 62 2f ff
recompute_jumps: next_rip: ffffffff81d0b241, tgt_rip: ffffffff810014f2, new_displ: 0x00000033, ret len: 2
converted to: eb 33 90 90 90
and a 5-byte JMP:
old insn VA: 0xffffffff81001516, CPU feat: X86_FEATURE_ALWAYS, size: 2
__switch_to:
ffffffff81001516: eb 30 jmp ffffffff81001548
repl insn: size: 5
ffffffff81d0b241: e9 10 63 2f ff jmpq ffffffff81001556
gets shortened into a two-byte one:
apply_alternatives: feat: 3*32+21, old: (ffffffff81001516, len: 2), repl: (ffffffff81d0b241, len: 5)
alt_insn: e9 10 63 2f ff
recompute_jumps: next_rip: ffffffff81d0b246, tgt_rip: ffffffff81001556, new_displ: 0x0000003e, ret len: 2
converted to: eb 3e 90 90 90
... and so on.
This leads to a net win of around
40ish replacements * 3 bytes savings =~ 120 bytes of I$
on an AMD guest which means some savings of precious instruction cache
bandwidth. The padding to the shorter 2-byte JMPs are single-byte NOPs
which on smart microarchitectures means discarding NOPs at decode time
and thus freeing up execution bandwidth.
Signed-off-by: Borislav Petkov <bp@suse.de>
Up until now we have always paid attention to make sure the length of
the new instruction replacing the old one is at least less or equal to
the length of the old instruction. If the new instruction is longer, at
the time it replaces the old instruction it will overwrite the beginning
of the next instruction in the kernel image and cause your pants to
catch fire.
So instead of having to pay attention, teach the alternatives framework
to pad shorter old instructions with NOPs at buildtime - but only in the
case when
len(old instruction(s)) < len(new instruction(s))
and add nothing in the >= case. (In that case we do add_nops() when
patching).
This way the alternatives user shouldn't have to care about instruction
sizes and simply use the macros.
Add asm ALTERNATIVE* flavor macros too, while at it.
Also, we need to save the pad length in a separate struct alt_instr
member for NOP optimization and the way to do that reliably is to carry
the pad length instead of trying to detect whether we're looking at
single-byte NOPs or at pathological instruction offsets like e9 90 90 90
90, for example, which is a valid instruction.
Thanks to Michael Matz for the great help with toolchain questions.
Signed-off-by: Borislav Petkov <bp@suse.de>
Make it pass __func__ implicitly. Also, dump info about each replacing
we're doing. Fixup comments and style while at it.
Signed-off-by: Borislav Petkov <bp@suse.de>
Per-controller spinlock needs to be properly initialized during device probe.
[jkosina@suse.cz: massage changelog]
[jkosina@suse.cz: drop hunk that has already been applied by previous
patch]
Signed-off-by: Frank Praznik <frank.praznik@oh.rr.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>