Pull locking fixes from Thomas Gleixner:
"Three patches to fix memory ordering issues on ALPHA and a comment to
clarify the usage scope of a mutex internal function"
* 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
locking/xchg/alpha: Fix xchg() and cmpxchg() memory ordering bugs
locking/xchg/alpha: Clean up barrier usage by using smp_mb() in place of __ASM__MB
locking/xchg/alpha: Add unconditional memory barrier to cmpxchg()
locking/mutex: Add comment to __mutex_owner() to deter usage
Pull cleanup patchlet from Thomas Gleixner:
"A single commit removing a bunch of bogus double semicolons all over
the tree"
* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
treewide/trivial: Remove ';;$' typo noise
Hightlights include:
- Fix a broken cast in nfs4_callback_recallany()
- Fix an Oops during NFSv4 migration events
- make struct nlmclnt_fl_close_lock_ops static
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJakuNIAAoJEGcL54qWCgDykYYQAITHRrWP7tQ6aSpZxW5+Un5z
6K3RRbfxFjHWVyaePCBzRMOtPTA/puqO0ggx9H+2D4+u2GeXhFl7FMdIuLueGKrc
rh0wzB6+KiHvqK8NT3g4c2VzZbGJ8IWB6jlNaA3ZyHRJcO+Oi3rQhYBNZpVqP6ny
M1C3yXQTUtA13aOLjeThoAKIJyknwdZcsiMTptJslvSsQ9PL0w6m6jZKrVHu6Rc+
Hg12FFptaKien/gj2IUYJb6Z2Mz3arJu1Y7cm1P/zH/NBs37ynMUsrb9AvPbzvRm
PvPRT4ugNOlTgDTaIT2JHwP2bhlp2JF+Tdzq7WYE3ek2CEPUD49jv07MlgTHxy/w
+tcp/322ZCxvKLjHTeEWqGn4T0TZ1TdPrd4dIJsjox9Ffy72Z3rOjvLKt5UrRV0i
8IhiE3/ruHFpB75Yfi7ABEIH8aEwwmchQTf5bth0ZKoZdaEPHmy3xnJkxa+wlD1V
Hp6KoqMNlDXEFx2Ih/SD6j50MFKszq6+cjUk2D8iclLnelXhu9iddFQ+PFNfsxVZ
WSo4AWZoPbtbLnn9Ez9dkdsJILKv86LbEvYxLX6/LnxLzzX70E34tbRQa+drVeR3
Na6czRpld85juKWgiFkzNx+zD4TaBAxUMbQH2Gbwngiz5RoT6SnkkdkAK3EW7nhR
pIJgZGiq/NG3NLiCxIgs
=l2jv
-----END PGP SIGNATURE-----
Merge tag 'nfs-for-4.16-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client bugfixes from Trond Myklebust:
- fix a broken cast in nfs4_callback_recallany()
- fix an Oops during NFSv4 migration events
- make struct nlmclnt_fl_close_lock_ops static
* tag 'nfs-for-4.16-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
NFS: make struct nlmclnt_fl_close_lock_ops static
nfs: system crashes after NFS4ERR_MOVED recovery
NFSv4: Fix broken cast in nfs4_callback_recallany()
There is a potential deadlock if mount/umount happens when
raid5_finish_reshape() tries to grow the size of emulated disk.
How the deadlock happens?
1) The raid5 resync thread finished reshape (expanding array).
2) The mount or umount thread holds VFS sb->s_umount lock and tries to
write through critical data into raid5 emulated block device. So it
waits for raid5 kernel thread handling stripes in order to finish it
I/Os.
3) In the routine of raid5 kernel thread, md_check_recovery() will be
called first in order to reap the raid5 resync thread. That is,
raid5_finish_reshape() will be called. In this function, it will try
to update conf and call VFS revalidate_disk() to grow the raid5
emulated block device. It will try to acquire VFS sb->s_umount lock.
The raid5 kernel thread cannot continue, so no one can handle mount/
umount I/Os (stripes). Once the write-through I/Os cannot be finished,
mount/umount will not release sb->s_umount lock. The deadlock happens.
The raid5 kernel thread is an emulated block device. It is responible to
handle I/Os (stripes) from upper layers. The emulated block device
should not request any I/Os on itself. That is, it should not call VFS
layer functions. (If it did, it will try to acquire VFS locks to
guarantee the I/Os sequence.) So we have the resync thread to send
resync I/O requests and to wait for the results.
For solving this potential deadlock, we can put the size growth of the
emulated block device as the final step of reshape thread.
2017/12/29:
Thanks to Guoqing Jiang <gqjiang@suse.com>,
we confirmed that there is the same deadlock issue in raid10. It's
reproducible and can be fixed by this patch. For raid10.c, we can remove
the similar code to prevent deadlock as well since they has been called
before.
Reported-by: Alex Wu <alexwu@synology.com>
Reviewed-by: Alex Wu <alexwu@synology.com>
Reviewed-by: Chung-Chiang Cheng <cccheng@synology.com>
Signed-off-by: BingJing Chang <bingjingc@synology.com>
Signed-off-by: Shaohua Li <sh.li@alibaba-inc.com>
Some laptops such as the XPS 9360 support the intel-vbtn INT33D6
interface but don't initialize the bit that intel-vbtn uses to
represent switching tablet mode.
By running this only on real 2-in-1's it shouldn't cause false
positives.
Fixes: 30323fb6d5 ("Support tablet mode switch")
Reported-by: Jeremy Cline <jeremy@jcline.org>
Signed-off-by: Mario Limonciello <mario.limonciello@dell.com>
Tested-by: Jeremy Cline <jeremy@jcline.org>
Tested-by: Darren Hart (VMware) <dvhart@infradead.org>
Signed-off-by: Darren Hart (VMware) <dvhart@infradead.org>
- Add an empty linux/compiler_types.h (now being included by kconfig.h)
- Add __GFP_ZERO
- Add kzalloc
- Test __GFP_DIRECT_RECLAIM instead of __GFP_NOWARN
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Add handling for a missing instruction in our 32-bit BPF JIT so that it can be
used for seccomp filtering.
Add a missing NULL pointer check before a function call in new EEH code.
Fix an error path in the new ocxl driver to correctly return EFAULT.
The support for the new ibm,drc-info device tree property turns out to need
several fixes, so for now we just stop advertising to firmware that we support
it until the bugs can be ironed out.
One fix for the new drmem code which was incorrectly modifying the device tree
in place.
Finally two fixes for the RFI flush support, so that firmware can advertise to
us that it should be disabled entirely so as not to affect performance.
Thanks to:
Bharata B Rao, Frederic Barrat, Juan J. Alvarez, Mark Lord, Michael Bringmann.
-----BEGIN PGP SIGNATURE-----
iQIwBAABCAAaBQJakNObExxtcGVAZWxsZXJtYW4uaWQuYXUACgkQUevqPMjhpYAy
5A//RHFnwaXrOmcQ7h6fJim9lh+CsbhyHkf+hEGqJ4gp/ZOxjgWLjxgqB8O09hLb
CZUvbqc3phip431E+qCw1KVVoPzKXVVqEQD1mckvpVwVL7vVxOvcNfjzRH3oMsjC
jLQAfEvZMO6Lf14tb83vC5YBYOrWjojyOqJ/DbFzedw9hLLt1JTI12KfcgRZo5xr
3WhyevTHa4JJSgLhY9r34Q2hsf7CS2yTfhHa2+iHFPg2fINYb+Ld8V5JNj3JiFtx
fd6JiPFjQSyK+xXwK55eQGqMXZdEJU0iSwnYWBLUnFGaK7MQpEIJdv7FiRjD7Hcp
b3wpBuCSsN2mQIYpsoBY3GUm0thtAIbLDTRq1nAWdULbnOi1QBQy694Vgn4f8A79
T10Y1QuuwL2nInPy9qb/c6sukGz/czu9txFyqO/PwtRs99A6i4hRAtfKLUeW0cbG
QxI3GMfxst31SThqjVQifyhy7omXEjU8Oc6o1VMp8Vm1eH1QviwzcfmKA5UQuq3k
Y+ninT4wjo8MceAZ6F3w767DQmTK9e9lcgLU1ZZd/jntcnDzTr6AF54TFNmmicRb
o0arHMneMDo1IHEUfeski+0j0uZGuA/reQ6A/mDGSxum09voPwtWEH+dlAr9oKSQ
lgETCP6GN7rWluEWTYidgGGhV77QsfLNZyJYbGJXQ8cValo=
=mTqU
-----END PGP SIGNATURE-----
Merge tag 'powerpc-4.16-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
- Add handling for a missing instruction in our 32-bit BPF JIT so that
it can be used for seccomp filtering.
- Add a missing NULL pointer check before a function call in new EEH
code.
- Fix an error path in the new ocxl driver to correctly return EFAULT.
- The support for the new ibm,drc-info device tree property turns out
to need several fixes, so for now we just stop advertising to
firmware that we support it until the bugs can be ironed out.
- One fix for the new drmem code which was incorrectly modifying the
device tree in place.
- Finally two fixes for the RFI flush support, so that firmware can
advertise to us that it should be disabled entirely so as not to
affect performance.
Thanks to: Bharata B Rao, Frederic Barrat, Juan J. Alvarez, Mark Lord,
Michael Bringmann.
* tag 'powerpc-4.16-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/powernv: Support firmware disable of RFI flush
powerpc/pseries: Support firmware disable of RFI flush
powerpc/mm/drmem: Fix unexpected flag value in ibm,dynamic-memory-v2
powerpc/bpf/jit: Fix 32-bit JIT for seccomp_data access
powerpc/pseries: Revert support for ibm,drc-info devtree property
powerpc/pseries: Fix duplicate firmware feature for DRC_INFO
ocxl: Fix potential bad errno on irq allocation
powerpc/eeh: Fix crashes in eeh_report_resume()
When requeuing request, the domain token should have been freed
before re-inserting the request to io scheduler. Otherwise, the
assigned domain token will be leaked, and IO hang can be caused.
Cc: Paolo Valente <paolo.valente@linaro.org>
Cc: Omar Sandoval <osandov@fb.com>
Cc: stable@vger.kernel.org
Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
__blk_mq_requeue_request() covers two cases:
- one is that the requeued request is added to hctx->dispatch, such as
blk_mq_dispatch_rq_list()
- another case is that the request is requeued to io scheduler, such as
blk_mq_requeue_request().
We should call io sched's .requeue_request callback only for the 2nd
case.
Cc: Paolo Valente <paolo.valente@linaro.org>
Cc: Omar Sandoval <osandov@fb.com>
Fixes: bd166ef183 ("blk-mq-sched: add framework for MQ capable IO schedulers")
Cc: stable@vger.kernel.org
Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>
Acked-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The capture interface doesn't work and the playback interface only
supports 48 kHz sampling rate even though it advertises more rates.
Signed-off-by: Erik Veijola <erik.veijola@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
On some boards setting power_save to a non 0 value leads to clicking /
popping sounds when ever we enter/leave powersaving mode. Ideally we would
figure out how to avoid these sounds, but that is not always feasible.
This commit adds a blacklist for devices where powersaving is known to
cause problems and disables it on these devices.
Note I tried to put this blacklist in userspace first:
https://github.com/systemd/systemd/pull/8128
But the systemd maintainers rightfully pointed out that it would be
impossible to then later remove entries once we actually find a way to
make power-saving work on listed boards without issues. Having this list
in the kernel will allow removal of the blacklist entry in the same commit
which fixes the clicks / plops.
The blacklist only applies to the default power_save module-option value,
if a user explicitly sets the module-option then the blacklist is not
used.
[ added an ifdef CONFIG_PM for the build error -- tiwai]
BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1525104
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=198611
Cc: stable@vger.kernel.org
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
This patch fixes the wrongly included dtsi file which
was breaking mainline support for Engicam i.CoreM6 DualLite/Solo RQS.
As per the board name, the correct file should be imx6dl.dtsi instead
of imx6q.dtsi
Reported-by: Michael Trimarchi <michael@amarulasolutions.com>
Suggested-by: Jagan Teki <jagan@amarulasolutions.com>
Signed-off-by: Shyam Saini <shyam@amarulasolutions.com>
Reviewed-by: Fabio Estevam <fabio.estevam@nxp.com>
Fixes: 7a9caba55a ("ARM: dts: imx6dl: Add Engicam i.CoreM6 DualLite/Solo RQS initial support")
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
The SEV LAUNCH_SECRET command fails with error code 'invalid param'
because we missed filling the guest and header system physical address
while issuing the command.
Fixes: 9f5b5b950a (KVM: SVM: Add support for SEV LAUNCH_SECRET command)
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: linux-kernel@vger.kernel.org
Cc: Joerg Roedel <joro@8bytes.org>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
RSM instruction is used by the SMM handler to return from SMM mode.
Currently, rsm causes a #UD - which results in instruction fetch, decode,
and emulate. By installing the RSM intercept we can avoid the instruction
fetch since we know that #VMEXIT was due to rsm.
The patch is required for the SEV guest, because in case of SEV guest
memory is encrypted with guest-specific key and hypervisor will not
able to fetch the instruction bytes from the guest memory.
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Using the access_ok() to validate the input before issuing the SEV
command does not buy us anything in this case. If userland is
giving us a garbage pointer then copy_to_user() will catch it when we try
to return the measurement.
Suggested-by: Al Viro <viro@ZenIV.linux.org.uk>
Fixes: 0d0736f763 (KVM: SVM: Add support for KVM_SEV_LAUNCH_MEASURE ...)
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: linux-kernel@vger.kernel.org
Cc: Joerg Roedel <joro@8bytes.org>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Avoid traversing all the cpus for pv tlb flush when steal time
is disabled since pv tlb flush depends on the field in steal time
for shared data.
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim KrÄmář <rkrcmar@redhat.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The early_param() is only called during kernel initialization, So Linux
marks the functions of it with __init macro to save memory.
But it forgot to mark the parse_no_kvmapf/stealacc/kvmclock_vsyscall,
So, Make them __init as well.
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: rkrcmar@redhat.com
Cc: kvm@vger.kernel.org
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: x86@kernel.org
Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Guests on new hypersiors might set KVM_ASYNC_PF_DELIVERY_AS_PF_VMEXIT
bit when enabling async_PF, but this bit is reserved on old hypervisors,
which results in a failure upon migration.
To avoid breaking different cases, we are checking for CPUID feature bit
before enabling the feature and nothing else.
Fixes: 52a5c155cf ("KVM: async_pf: Let guest support delivery of async_pf from guest mode")
Cc: <stable@vger.kernel.org>
Reviewed-by: Wanpeng Li <wanpengli@tencent.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Fix the following sparse warning by moving the prototype
of kvm_arch_mmu_notifier_invalidate_range() to linux/kvm_host.h .
CHECK arch/s390/kvm/../../../virt/kvm/kvm_main.c
arch/s390/kvm/../../../virt/kvm/kvm_main.c:138:13: warning: symbol 'kvm_arch_mmu_notifier_invalidate_range' was not declared. Should it be static?
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Move the kvm_arch_irq_routing_update() prototype outside of
ifdef CONFIG_HAVE_KVM_EVENTFD guards to fix the following sparse warning:
arch/s390/kvm/../../../virt/kvm/irqchip.c:171:28: warning: symbol 'kvm_arch_irq_routing_update' was not declared. Should it be static?
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The 'Total' line looks a bit weird when we have a single event only. This
can happen e.g. due to filters. Therefore suppress when there's only a
single event in the output.
Signed-off-by: Stefan Raspl <raspl@linux.vnet.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
We keep the current logic that sorts all events (parent and child), but
re-shuffle the events afterwards, grouping the children after the
respective parent. Note that the percentage column for child events
gives the percentage of the parent's total.
Since we rework the logic anyway, we modify the total average
calculation to use the raw numbers instead of the (rounded) averages.
Note that this can result in differing numbers (between total average
and the sum of the individual averages) due to rounding errors.
Signed-off-by: Stefan Raspl <raspl@linux.vnet.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Drilldown (i.e. toggle display of child trace events) was implemented by
overriding the fields filter. This resulted in inconsistencies: E.g. when
drilldown was not active, adding a filter that also matches child trace
events would not only filter fields according to the filter, but also add
in the child trace events matching the filter. E.g. on x86, setting
'kvm_userspace_exit' as the fields filter after startup would result in
display of kvm_userspace_exit(DCR), although that wasn't previously
present - not exactly what one would expect from a filter.
This patch addresses the issue by keeping drilldown and fields filter
separate. While at it, we also fix a PEP8 issue by adding a blank line
at one place (since we're in the area...).
We implement this by adding a framework that also allows to define a
taxonomy among the debugfs events to identify child trace events. I.e.
drilldown using 'x' can now also work with debugfs. A respective parent-
child relationship is only known for S390 at the moment, but could be
added adjusting other platforms' ARCH.dbg_is_child() methods
accordingly.
Signed-off-by: Stefan Raspl <raspl@linux.vnet.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
We can do with a single dialog that takes both, pids and guest names.
Note that we keep both interactive commands, 'p' and 'g' for now, to
avoid confusion among users used to a specific key.
While at it, we improve on some minor glitches regarding curses usage,
e.g. cursor still visible when not supposed to be.
Signed-off-by: Stefan Raspl <raspl@linux.vnet.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Helps quite a bit reading the code when it's obvious when a method is
intended for internal use only.
Signed-off-by: Stefan Raspl <raspl@linux.vnet.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Te checks for debugfs assumed that debugfs is always mounted at
/sys/kernel/debug - which is likely, but not guaranteed. This is addressed
by checking /proc/mounts for the actual location.
Furthermore, when debugfs was mounted, but the kvm module not loaded, a
misleading error pointing towards debugfs not present was given.
To reproduce,
(a) run kvm_stat with debugfs mounted at a place different from
/sys/kernel/debug
(b) run kvm_stat with debugfs mounted but kvm module not loaded
Signed-off-by: Stefan Raspl <raspl@linux.vnet.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Entering an invalid regular expression did not produce any indication of an
error so far.
To reproduce, press 'f' and enter 'foo(' (with an unescaped bracket).
Signed-off-by: Stefan Raspl <raspl@linux.vnet.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
When we apply a filter that will only leave child trace events, we
receive a ZeroDivisionError when calculating the percentages.
In that case, provide percentages based on child events only.
To reproduce, run 'kvm_stat -f .*[\(].*'.
Signed-off-by: Stefan Raspl <raspl@linux.vnet.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Use '==' for equality checks and 'is' when comparing identities.
An example where '==' and 'is' behave differently:
>>> a = 4242
>>> a == 4242
True
>>> a is 4242
False
Signed-off-by: Marc Hartmayer <mhartmay@linux.vnet.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
If it's clear that the values of a dictionary will be used then use
the '.items()' method.
Signed-off-by: Marc Hartmayer <mhartmay@linux.vnet.ibm.com>
Tested-by: Stefan Raspl <raspl@linux.vnet.ibm.com>
[Include fix for logging mode by Stefan Raspl]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Use a namedtuple for storing the values as it allows to access the
fields of a tuple via names. This makes the overall code much easier
to read and to understand. Access by index is still possible as
before.
Signed-off-by: Marc Hartmayer <mhartmay@linux.vnet.ibm.com>
Tested-by: Stefan Raspl <raspl@linux.vnet.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The 'sortkey' function references a value in its enclosing
scope (closure). This is not common practice for a sort key function
so let's replace it. Additionally, the function 'sorted' has already a
parameter for reversing the result therefore the inversion of the
values is unneeded. The check for stats[x][1] is also superfluous as
it's ensured that this value is initialized with 0.
Signed-off-by: Marc Hartmayer <mhartmay@linux.vnet.ibm.com>
Tested-by: Stefan Raspl <raspl@linux.vnet.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reported by syzkaller:
WARNING: CPU: 6 PID: 2434 at arch/x86/kvm/vmx.c:6660 handle_ept_misconfig+0x54/0x1e0 [kvm_intel]
CPU: 6 PID: 2434 Comm: repro_test Not tainted 4.15.0+ #4
RIP: 0010:handle_ept_misconfig+0x54/0x1e0 [kvm_intel]
Call Trace:
vmx_handle_exit+0xbd/0xe20 [kvm_intel]
kvm_arch_vcpu_ioctl_run+0xdaf/0x1d50 [kvm]
kvm_vcpu_ioctl+0x3e9/0x720 [kvm]
do_vfs_ioctl+0xa4/0x6a0
SyS_ioctl+0x79/0x90
entry_SYSCALL_64_fastpath+0x25/0x9c
The testcase creates a first thread to issue KVM_SMI ioctl, and then creates
a second thread to mmap and operate on the same vCPU. This triggers a race
condition when running the testcase with multiple threads. Sometimes one thread
exits with a triple fault while another thread mmaps and operates on the same
vCPU. Because CS=0x3000/IP=0x8000 is not mapped, accessing the SMI handler
results in an EPT misconfig. This patch fixes it by returning RET_PF_EMULATE
in kvm_handle_bad_page(), which will go on to cause an emulation failure and an
exit with KVM_EXIT_INTERNAL_ERROR.
Reported-by: syzbot+c1d9517cab094dae65e446c0c5b4de6c40f4dc58@syzkaller.appspotmail.com
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Although L2 is in halt state, it will be in the active state after
VM entry if the VM entry is vectoring according to SDM 26.6.2 Activity
State. Halting the vcpu here means the event won't be injected to L2
and this decision isn't reported to L1. Thus L0 drops an event that
should be injected to L2.
Cc: Liran Alon <liran.alon@oracle.com>
Reviewed-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
On x86, special KVM memslots such as the TSS region have anonymous
memory mappings created on behalf of userspace, and these mappings are
removed when the VM is destroyed.
It is however possible for removing these mappings via vm_munmap() to
fail. This can most easily happen if the thread receives SIGKILL while
it's waiting to acquire ->mmap_sem. This triggers the 'WARN_ON(r < 0)'
in __x86_set_memory_region(). syzkaller was able to hit this, using
'exit()' to send the SIGKILL. Note that while the vm_munmap() failure
results in the mapping not being removed immediately, it is not leaked
forever but rather will be freed when the process exits.
It's not really possible to handle this failure properly, so almost
every other caller of vm_munmap() doesn't check the return value. It's
a limitation of having the kernel manage these mappings rather than
userspace.
So just remove the WARN_ON() so that users can't spam the kernel log
with this warning.
Fixes: f0d648bdf0 ("KVM: x86: map/unmap private slots in __x86_set_memory_region")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
L1 might want to use SECONDARY_EXEC_DESC, so we must not clear the VMCS
bit if UMIP is not being emulated.
We must still set the bit when emulating UMIP as the feature can be
passed to L2 where L0 will do the emulation and because L2 can change
CR4 without a VM exit, we should clear the bit if UMIP is disabled.
Fixes: 0367f205a3 ("KVM: vmx: add support for emulating UMIP")
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
The initial reset of the local APIC is performed before the VMCS has been
created, but it tries to do a vmwrite:
vmwrite error: reg 810 value 4a00 (err 18944)
CPU: 54 PID: 38652 Comm: qemu-kvm Tainted: G W I 4.16.0-0.rc2.git0.1.fc28.x86_64 #1
Hardware name: Intel Corporation S2600CW/S2600CW, BIOS SE5C610.86B.01.01.0003.090520141303 09/05/2014
Call Trace:
vmx_set_rvi [kvm_intel]
vmx_hwapic_irr_update [kvm_intel]
kvm_lapic_reset [kvm]
kvm_create_lapic [kvm]
kvm_arch_vcpu_init [kvm]
kvm_vcpu_init [kvm]
vmx_create_vcpu [kvm_intel]
kvm_vm_ioctl [kvm]
Move it later, after the VMCS has been created.
Fixes: 4191db26b7 ("KVM: x86: Update APICv on APIC reset")
Cc: stable@vger.kernel.org
Cc: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Pull networking fixes from David Miller:
1) Fix TTL offset calculation in mac80211 mesh code, from Peter Oh.
2) Fix races with procfs in ipt_CLUSTERIP, from Cong Wang.
3) Memory leak fix in lpm_trie BPF map code, from Yonghong Song.
4) Need to use GFP_ATOMIC in BPF cpumap allocations, from Jason Wang.
5) Fix potential deadlocks in netfilter getsockopt() code paths, from
Paolo Abeni.
6) Netfilter stackpointer size checks really are needed to validate
user input, from Florian Westphal.
7) Missing timer init in x_tables, from Paolo Abeni.
8) Don't use WQ_MEM_RECLAIM in mac80211 hwsim, from Johannes Berg.
9) When an ibmvnic device is brought down then back up again, it can be
sent queue entries from a previous session, handle this properly
instead of crashing. From Thomas Falcon.
10) Fix TCP checksum on LRO buffers in mlx5e, from Gal Pressman.
11) When we are dumping filters in cls_api, the output SKB is empty, and
the filter we are dumping is too large for the space in the SKB, we
should return -EMSGSIZE like other netlink dump operations do.
Otherwise userland has no signal that is needs to increase the size
of its read buffer. From Roman Kapl.
12) Several XDP fixes for virtio_net, from Jesper Dangaard Brouer.
13) Module refcount leak in netlink when a dump start fails, from Jason
Donenfeld.
14) Handle sub-optimal GSO sizes better in TCP BBR congestion control,
from Eric Dumazet.
15) Releasing bpf per-cpu arraymaps can take a long time, add a
condtional scheduling point. From Eric Dumazet.
16) Implement retpolines for tail calls in x64 and arm64 bpf JITs. From
Daniel Borkmann.
17) Fix page leak in gianfar driver, from Andy Spencer.
18) Missed clearing of estimator scratch buffer, from Eric Dumazet.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (76 commits)
net_sched: gen_estimator: fix broken estimators based on percpu stats
gianfar: simplify FCS handling and fix memory leak
ipv6 sit: work around bogus gcc-8 -Wrestrict warning
macvlan: fix use-after-free in macvlan_common_newlink()
bpf, arm64: fix out of bounds access in tail call
bpf, x64: implement retpoline for tail call
rxrpc: Fix send in rxrpc_send_data_packet()
net: aquantia: Fix error handling in aq_pci_probe()
bpf: fix rcu lockdep warning for lpm_trie map_free callback
bpf: add schedule points in percpu arrays management
regulatory: add NUL to request alpha2
ibmvnic: Fix early release of login buffer
net/smc9194: Remove bogus CONFIG_MAC reference
net: ipv4: Set addr_type in hash_keys for forwarded case
tcp_bbr: better deal with suboptimal GSO
smsc75xx: fix smsc75xx_set_features()
netlink: put module reference if dump start fails
selftests/bpf/test_maps: exit child process without error in ENOMEM case
selftests/bpf: update gitignore with test_libbpf_open
selftests/bpf: tcpbpf_kern: use in6_* macros from glibc
..
Pull security subsystem fixes from James Morris:
- keys fixes via David Howells:
"A collection of fixes for Linux keyrings, mostly thanks to Eric
Biggers:
- Fix some PKCS#7 verification issues.
- Fix handling of unsupported crypto in X.509.
- Fix too-large allocation in big_key"
- Seccomp updates via Kees Cook:
"These are fixes for the get_metadata interface that landed during
-rc1. While the new selftest is strictly not a bug fix, I think
it's in the same spirit of avoiding bugs"
- an IMA build fix from Randy Dunlap
* 'fixes-v4.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
integrity/security: fix digsig.c build error with header file
KEYS: Use individual pages in big_key for crypto buffers
X.509: fix NULL dereference when restricting key with unsupported_sig
X.509: fix BUG_ON() when hash algorithm is unsupported
PKCS#7: fix direct verification of SignerInfo signature
PKCS#7: fix certificate blacklisting
PKCS#7: fix certificate chain verification
seccomp: add a selftest for get_metadata
ptrace, seccomp: tweak get_metadata behavior slightly
seccomp, ptrace: switch get_metadata types to arch independent
A single MIPS fix for mismatching struct compat_flock, resulting in bus
errors starting Firefox on Debian 8 since 4.13.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEd80NauSabkiESfLYbAtpk944dnoFAlqP5u0ACgkQbAtpk944
dnqPeA//dLmtRS9Ogjkwrpfmb9AkTiB2iECKYxIUFMAeFHgBTT9skYaI8smaaZlj
VHrIP5mnL/IJaSI9exZHdYDTIdDJ9iRYPaIq29Z63rHWOQKTQmcuU5SzDZuiJdq9
lGxF/fu1cHVJzZ5SsDIDol94P2GIoZ3Gw9u4kutk//H05tzhVgnIR7ETRIKY6HsO
gij4Ubg9YqEyxGvv+opt5SCSVEOxtt/gFB4/JvR74L6mxPVUYXcCMA4MB5RqckpO
aU7zH5YAHrtcNmto4qDsJIeGBrXYVo5H/MOq9j+1Nt3RJXOB3/934ZpMsLYOCBml
JYqe+k8OlTAsWGBqHXIYhdMlTXLFlaDHW8WHViY7wZ1Eh97DIeZ7KRVJuZR9t0kW
2bdifdMhfVfekPjSjYFQJB6AaEtYMTMNSm7ZaTk4zGXShIEN6N0byfFXt2gc0olg
oZu+NwhevlQhifS3qjgACddbPru2bf0QI6hByCXRbjccKJlB7B/ClKo3l7AmCyb9
Bfe/WW24gbSDp1VhHNNaB9kO2PaXXzDqnJN2AjrwgG4g0HixCcsoPVk/dKgp/AY2
+ewQfCrzdYW4Y/Nkfwg0gfuy/eQx2vZ3DdoMS7MeHVgdJw956bdnD5lgDgeldLpy
KSuQlxCPJm2zV3gV/QFJxAHOHrIIk5V9+WDh1ZdInfYjxcgDFW8=
=h/p9
-----END PGP SIGNATURE-----
Merge tag 'mips_fixes_4.16_3' of git://git.kernel.org/pub/scm/linux/kernel/git/jhogan/mips
Pull MIPS fix from James Hogan:
"A single MIPS fix for mismatching struct compat_flock, resulting in
bus errors starting Firefox on Debian 8 since 4.13"
* tag 'mips_fixes_4.16_3' of git://git.kernel.org/pub/scm/linux/kernel/git/jhogan/mips:
MIPS: Drop spurious __unused in struct compat_flock
Pull printk fixlet from Petr Mladek:
"People expect to see the real pointer value for %px.
Let's substitute '(null)' only for the other %p? format modifiers that
need to deference the pointer"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk:
vsprintf: avoid misleading "(null)" for %px
Pull i2c fixes from Wolfram Sang:
"Two bugfixes, one v4.16 regression fix, and two documentation fixes"
* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: designware: Consider SCL GPIO optional
i2c: busses: i2c-sirf: Fix spelling: "formular" -> "formula".
i2c: bcm2835: Set up the rising/falling edge delays
i2c: i801: Add missing documentation entries for Braswell and Kaby Lake
i2c: designware: must wait for enable
The 'lend' parameter of truncate_inode_pages_range is required to be
inclusive, so follow the rule.
This patch fixes one memory corruption triggered by discard.
Cc: <stable@vger.kernel.org>
Cc: Dmitry Monakhov <dmonakhov@openvz.org>
Fixes: 351499a172 ("block: Invalidate cache on discard v2")
Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
These are mostly fixes for problems with merge window code. In
addition we have one doc update (alua) and two dead code removals
(aiclib and octogon) a spurious assignment removal (csiostor) and a
performance improvement for storvsc involving better interrupt
spreading and increasing the command per lun handling.
Signed-off-by: James E.J. Bottomley <jejb@linux.vnet.ibm.com>
-----BEGIN PGP SIGNATURE-----
iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCWo+H2yYcamFtZXMuYm90
dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishe2eAQDyWfoK
Mfjbrl6cdPop+JIoED0VtBzAQyeXceJt8GYDQwEApXTIZon2HTdJqGawfUhaapBA
JnO6iOiC13/nZjl7C28=
=K3Pk
-----END PGP SIGNATURE-----
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"These are mostly fixes for problems with merge window code.
In addition we have one doc update (alua) and two dead code removals
(aiclib and octogon) a spurious assignment removal (csiostor) and a
performance improvement for storvsc involving better interrupt
spreading and increasing the command per lun handling"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: qla4xxx: skip error recovery in case of register disconnect.
scsi: aacraid: fix shutdown crash when init fails
scsi: qedi: Cleanup local str variable
scsi: qedi: Fix truncation of CHAP name and secret
scsi: qla2xxx: Fix incorrect handle for abort IOCB
scsi: qla2xxx: Fix double free bug after firmware timeout
scsi: storvsc: Increase cmd_per_lun for higher speed devices
scsi: qla2xxx: Fix a locking imbalance in qlt_24xx_handle_els()
scsi: scsi_dh: Document alua_rtpg_queue() arguments
scsi: Remove Makefile entry for oktagon files
scsi: aic7xxx: remove aiclib.c
scsi: qla2xxx: Avoid triggering undefined behavior in qla2x00_mbx_completion()
scsi: mptfusion: Add bounds check in mptctl_hp_targetinfo()
scsi: sym53c8xx_2: iterator underflow in sym_getsync()
scsi: bnx2fc: Fix check in SCSI completion handler for timed out request
scsi: csiostor: remove redundant assignment to pointer 'ln'
scsi: ufs: Enable quirk to ignore sending WRITE_SAME command
scsi: ibmvfc: fix misdefined reserved field in ibmvfc_fcp_rsp_info
scsi: qla2xxx: Fix memory corruption during hba reset test
scsi: mpt3sas: fix an out of bound write