linux_old1

Commit Graph

Author	SHA1	Message	Date
David Hildenbrand	5ffe466cd3	KVM: s390: inject PER i-fetch events on applicable icpts In case we have to emuluate an instruction or part of it (instruction, partial instruction, operation exception), we have to inject a PER instruction-fetching event for that instruction, if hardware told us to do so. In case we retry an instruction, we must not inject the PER event. Please note that we don't filter the events properly yet, so guest debugging will be visible for the guest. Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>	2016-07-05 12:02:56 +02:00
Wanpeng Li	196f20ca52	KVM: vmx: fix missed cancellation of TSC deadline timer INFO: rcu_sched detected stalls on CPUs/tasks: 1-...: (11800 GPs behind) idle=45d/140000000000000/0 softirq=0/0 fqs=21663 (detected by 0, t=65016 jiffies, g=11500, c=11499, q=719) Task dump for CPU 1: qemu-system-x86 R running task 0 3529 3525 0x00080808 ffff8802021791a0 ffff880212895040 0000000000000001 00007f1c2c00db40 ffff8801dd20fcd3 ffffc90002b98000 ffff8801dd20fc88 ffff8801dd20fcf8 0000000000000286 ffff8801dd2ac538 ffff8801dd20fcc0 ffffffffc06949c9 Call Trace: ? kvm_write_guest_cached+0xb9/0x160 [kvm] ? __delay+0xf/0x20 ? wait_lapic_expire+0x14a/0x200 [kvm] ? kvm_arch_vcpu_ioctl_run+0xcbe/0x1b00 [kvm] ? kvm_arch_vcpu_ioctl_run+0xe34/0x1b00 [kvm] ? kvm_vcpu_ioctl+0x2d3/0x7c0 [kvm] ? __fget+0x5/0x210 ? do_vfs_ioctl+0x96/0x6a0 ? __fget_light+0x2a/0x90 ? SyS_ioctl+0x79/0x90 ? do_syscall_64+0x7c/0x1e0 ? entry_SYSCALL64_slow_path+0x25/0x25 This can be reproduced readily by running a full dynticks guest(since hrtimer in guest is heavily used) w/ lapic_timer_advance disabled. If fail to program hardware preemption timer, we will fallback to hrtimer based method, however, a previous programmed preemption timer miss to cancel in this scenario which results in one hardware preemption timer and one hrtimer emulated tsc deadline timer run simultaneously. So sometimes the target guest deadline tsc is earlier than guest tsc, which leads to the computation in vmx_set_hv_timer can underflow and cause delta_tsc to be set a huge value, then host soft lockup as above. This patch fix it by cancelling the previous programmed preemption timer if there is once we failed to program the new preemption timer and fallback to hrtimer based method. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Yunhong Jiang <yunhong.jiang@intel.com> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2016-07-01 11:03:42 +02:00
Wanpeng Li	bd97ad0e7e	KVM: x86: introduce cancel_hv_tscdeadline Introduce cancel_hv_tscdeadline() to encapsulate preemption timer cancel stuff. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Yunhong Jiang <yunhong.jiang@intel.com> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2016-07-01 11:03:41 +02:00
Paolo Bonzini	9175d2e97b	KVM: vmx: fix underflow in TSC deadline calculation If the TSC deadline timer is programmed really close to the deadline or even in the past, the computation in vmx_set_hv_timer can underflow and cause delta_tsc to be set to a huge value. This generally results in vmx_set_hv_timer returning -ERANGE, but we can fix it by limiting delta_tsc to be positive or zero. Reported-by: Wanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2016-07-01 11:03:39 +02:00
Paolo Bonzini	f2485b3e0c	KVM: x86: use guest_exit_irqoff This gains a few clock cycles per vmexit. On Intel there is no need anymore to enable the interrupts in vmx_handle_external_intr, since we are using the "acknowledge interrupt on exit" feature. AMD needs to do that, and must be careful to avoid the interrupt shadow. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2016-07-01 11:03:38 +02:00
Paolo Bonzini	91fa0f8e9e	KVM: x86: always use "acknowledge interrupt on exit" This is necessary to simplify handle_external_intr in the next patch. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2016-07-01 11:03:36 +02:00
Paolo Bonzini	6edaa5307f	KVM: remove kvm_guest_enter/exit wrappers Use the functions from context_tracking.h directly. Cc: Andy Lutomirski <luto@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Rik van Riel <riel@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2016-07-01 11:03:21 +02:00
Paolo Bonzini	ebaac17362	context_tracking: move rcu_virt_note_context_switch out of kvm_host.h Make kvm_guest_{enter,exit} and __kvm_guest_{enter,exit} trivial wrappers around the code in context_tracking.h. Name the context_tracking.h functions consistently with what those for kernel<->user switch. Cc: Andy Lutomirski <luto@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Rik van Riel <riel@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2016-06-28 14:15:25 +02:00
James Hogan	fb6cec1492	MIPS: KVM: Combine entry trace events into class Combine the kvm_enter, kvm_reenter and kvm_out trace events into a single kvm_transition event class to reduce duplication and bloat. Suggested-by: Steven Rostedt <rostedt@goodmis.org> Fixes: `93258604ab` ("MIPS: KVM: Add guest mode switch trace events") Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim KrÄmÃ¡Å™ <rkrcmar@redhat.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: linux-mips@linux-mips.org Cc: kvm@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2016-06-23 19:17:30 +02:00
Arnd Bergmann	87aeb54f1b	kvm: x86: use getboottime64 KVM reads the current boottime value as a struct timespec in order to calculate the guest wallclock time, resulting in an overflow in 2038 on 32-bit systems. The data then gets passed as an unsigned 32-bit number to the guest, and that in turn overflows in 2106. We cannot do much about the second overflow, which affects both 32-bit and 64-bit hosts, but we can ensure that they both behave the same way and don't overflow until 2106, by using getboottime64() to read a timespec64 value. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2016-06-23 19:17:30 +02:00
Ashok Raj	c45dcc71b7	KVM: VMX: enable guest access to LMCE related MSRs On Intel platforms, this patch adds LMCE to KVM MCE supported capabilities and handles guest access to LMCE related MSRs. Signed-off-by: Ashok Raj <ashok.raj@intel.com> [Haozhong: macro KVM_MCE_CAP_SUPPORTED => variable kvm_mce_cap_supported Only enable LMCE on Intel platform Check MSR_IA32_FEATURE_CONTROL when handling guest access to MSR_IA32_MCG_EXT_CTL] Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2016-06-23 19:17:29 +02:00
Haozhong Zhang	37e4c997da	KVM: VMX: validate individual bits of guest MSR_IA32_FEATURE_CONTROL KVM currently does not check the value written to guest MSR_IA32_FEATURE_CONTROL, though bits corresponding to disabled features may be set. This patch makes KVM to validate individual bits written to guest MSR_IA32_FEATURE_CONTROL according to enabled features. Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2016-06-23 19:17:29 +02:00
Haozhong Zhang	3b84080b95	KVM: VMX: move msr_ia32_feature_control to vcpu_vmx msr_ia32_feature_control will be used for LMCE and not depend only on nested anymore, so move it from struct nested_vmx to struct vcpu_vmx. Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2016-06-23 19:17:28 +02:00
Paolo Bonzini	8ff7b95647	KVM: s390: vSIE (nested virtualization) feature for 4.8 (kvm/next) With an updated QEMU this allows to create nested KVM guests (KVM under KVM) on s390. s390 memory management changes from Martin Schwidefsky or acked by Martin. One common code memory management change (pageref) acked by Andrew Morton. The feature has to be enabled with the nested medule parameter. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.14 (GNU/Linux) iQIcBAABAgAGBQJXaS0XAAoJEBF7vIC1phx8rqMP/3EUiQbD/seIbkc2xqdApbZX xEDu6215f84N3KsLlZC/z/PhoxZ9z+GrHu+p2DqgnoMyDMj3C32s0OVecdxrE2+H pdaEY2ju8rjnY2P4hHzAfdWhdZtwsgdMIX8qDesQjcz4Z1QdOBXymEUpV7d35E/O WYqZw6EY6aOB338XKPaD03ITC+us/dW75ctPvqRzh5EKjjIMeOATonyGx0oHi14s qXRK1tfq8OegJGzioW5hOIlUkawycNu5+uIRjyoxzLuEPjn68Bomfi6W7Mak0BIc v8kEylFgMvSHKy0mWXCZsxj3hD6kNUzlJ+gRYFCiVWvdw6ywFzQiHP2nwLApg7Op 2a5S4wzQ3FIhA8u0k7JQBUncJggRZByitm4DerezHCZzKI9QimYrcKp2HWBQtI+0 12VC9/A0qyy2mKvTlgaDS4SQnPV2P6ztrLgORgK6F9TPq4w8mMYxG9Nz6neea6Po cotjEtqZivKNno1MOUzAg3jsrTMy4O4ktS0e8JSog2pDRHIrMiqCOcq1B0cRGK6P 8UmcpkK6yiEgCCbtqEU3QhcIMY6srTC6xArCjuieTY9v6EtBWAioRXuHRkVhs8xR HkEdlWlYTXeGpqcR2cCrUzEzUKdVmyTmnIVtoy/FfRfKEY02fHhiEIS3orqXoc4y +/yPsoGVxt8lMFs3/E8Y =ubb/ -----END PGP SIGNATURE----- Merge tag 'kvm-s390-next-4.8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD KVM: s390: vSIE (nested virtualization) feature for 4.8 (kvm/next) With an updated QEMU this allows to create nested KVM guests (KVM under KVM) on s390. s390 memory management changes from Martin Schwidefsky or acked by Martin. One common code memory management change (pageref) acked by Andrew Morton. The feature has to be enabled with the nested medule parameter.	2016-06-21 15:21:51 +02:00
David Hildenbrand	a411edf132	KVM: s390: vsie: add module parameter "nested" Let's be careful first and allow nested virtualization only if enabled by the system administrator. In addition, user space still has to explicitly enable it via SCLP features for it to work. Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>	2016-06-21 09:43:47 +02:00
David Hildenbrand	5d3876a8bf	KVM: s390: vsie: add indication for future features We have certain SIE features that we cannot support for now. Let's add these features, so user space can directly prepare to enable them, so we don't have to update yet another component. In addition, add a comment block, telling why it is for now not possible to forward/enable these features. Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>	2016-06-21 09:43:47 +02:00
David Hildenbrand	91473b487d	KVM: s390: vsie: correctly set and handle guest TOD Guest 2 sets up the epoch of guest 3 from his point of view. Therefore, we have to add the guest 2 epoch to the guest 3 epoch. We also have to take care of guest 2 epoch changes on STP syncs. This will work just fine by also updating the guest 3 epoch when a vsie_block has been set for a VCPU. Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>	2016-06-21 09:43:46 +02:00
David Hildenbrand	b917ae573f	KVM: s390: vsie: speed up VCPU external calls Whenever a SIGP external call is injected via the SIGP external call interpretation facility, the VCPU is not kicked. When a VCPU is currently in the VSIE, the external call might not be processed immediately. Therefore we have to provoke partial execution exceptions, which leads to a kick of the VCPU and therefore also kick out of VSIE. This is done by simulating the WAIT state. This bit has no other side effects. Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>	2016-06-21 09:43:46 +02:00
David Hildenbrand	94a15de8fb	KVM: s390: don't use CPUSTAT_WAIT to detect if a VCPU is idle As we want to make use of CPUSTAT_WAIT also when a VCPU is not idle but to force interception of external calls, let's check in the bitmap instead. Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>	2016-06-21 09:43:45 +02:00
David Hildenbrand	adbf16985c	KVM: s390: vsie: speed up VCPU irq delivery when handling vsie Whenever we want to wake up a VCPU (e.g. when injecting an IRQ), we have to kick it out of vsie, so the request will be handled faster. Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>	2016-06-21 09:43:44 +02:00
David Hildenbrand	1b7029bec1	KVM: s390: vsie: try to refault after a reported fault to g2 We can avoid one unneeded SIE entry after we reported a fault to g2. Theoretically, g2 resolves the fault and we can create the shadow mapping directly, instead of failing again when entering the SIE. Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>	2016-06-21 09:43:44 +02:00
David Hildenbrand	7fd7f39daa	KVM: s390: vsie: support IBS interpretation We can easily enable ibs for guest 2, so he can use it for guest 3. Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>	2016-06-21 09:43:43 +02:00
David Hildenbrand	13ee3f678b	KVM: s390: vsie: support conditional-external-interception We can easily enable cei for guest 2, so he can use it for guest 3. Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>	2016-06-21 09:43:42 +02:00
David Hildenbrand	5630a8e82b	KVM: s390: vsie: support intervention-bypass We can easily enable intervention bypass for guest 2, so it can use it for guest 3. Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>	2016-06-21 09:43:42 +02:00
David Hildenbrand	a1b7b9b286	KVM: s390: vsie: support guest-storage-limit-suppression We can easily forward guest-storage-limit-suppression if available. One thing to care about is keeping the prefix properly mapped when gsls in toggled on/off or the mso changes in between. Therefore we better remap the prefix on any mso changes just like we already do with the prefix. Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>	2016-06-21 09:43:41 +02:00
David Hildenbrand	77d18f6d47	KVM: s390: vsie: support guest-PER-enhancement We can easily forward the guest-PER-enhancement facility to guest 2 if available. Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>	2016-06-21 09:43:40 +02:00
David Hildenbrand	0615a326e0	KVM: s390: vsie: support shared IPTE-interlock facility As we forward the whole SCA provided by guest 2, we can directly forward SIIF if available. Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>	2016-06-21 09:43:40 +02:00
David Hildenbrand	19c439b564	KVM: s390: vsie: support 64-bit-SCAO Let's provide the 64-bit-SCAO facility to guest 2, so he can set up a SCA for guest 3 that has a 64 bit address. Please note that we already require the 64 bit SCAO for our vsie implementation, in order to forward the SCA directly (by pinning the page). Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>	2016-06-21 09:43:39 +02:00
David Hildenbrand	588438cba0	KVM: s390: vsie: support run-time-instrumentation As soon as guest 2 is allowed to use run-time-instrumentation (indicated via via STFLE), it can also enable it for guest 3. Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>	2016-06-21 09:43:39 +02:00
David Hildenbrand	c9bc1eabe5	KVM: s390: vsie: support vectory facility (SIMD) As soon as guest 2 is allowed to use the vector facility (indicated via STFLE), it can also enable it for guest 3. We have to take care of the sattellite block that might be used when not relying on lazy vector copying (not the case for KVM). Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>	2016-06-21 09:43:38 +02:00
David Hildenbrand	166ecb3d3c	KVM: s390: vsie: support transactional execution As soon as guest 2 is allowed to use transactional execution (indicated via STFLE), he can also enable it for guest 3. Active transactional execution requires also the second prefix page to be mapped. If that page cannot be mapped, a validity icpt has to be presented to the guest. We have to take care of tx being toggled on/off, otherwise we might get wrong prefix validity icpt. Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>	2016-06-21 09:43:37 +02:00
David Hildenbrand	bbeaa58b32	KVM: s390: vsie: support aes dea wrapping keys As soon as message-security-assist extension 3 is enabled for guest 2, we have to allow key wrapping for guest 3. Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>	2016-06-21 09:43:37 +02:00
David Hildenbrand	66b630d5b7	KVM: s390: vsie: support STFLE interpretation Issuing STFLE is extremely rare. Instead of copying 2k on every VSIE call, let's do this lazily, when a guest 3 tries to execute STFLE. We can setup the block and retry. Unfortunately, we can't directly forward that facility list, as we only have a 31 bit address for the facility list designation. So let's use a DMA allocation for our vsie_page instead for now. Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>	2016-06-21 09:43:36 +02:00
David Hildenbrand	4ceafa9027	KVM: s390: vsie: support host-protection-interruption Introduced with ESOP, therefore available for the guest if it is allowed to use ESOP. Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>	2016-06-21 09:43:35 +02:00
David Hildenbrand	535ef81c6e	KVM: s390: vsie: support edat1 / edat2 If guest 2 is allowed to use edat 1 / edat 2, it can also set it up for guest 3, so let's properly check and forward the edat cpuflags. Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>	2016-06-21 09:43:35 +02:00
David Hildenbrand	3573602b20	KVM: s390: vsie: support setting the ibc As soon as we forward an ibc to guest 2 (indicated via kvm->arch.model.ibc), he can also use it for guest 3. Let's properly round the ibc up/down, so we avoid any potential validity icpts from the underlying SIE, if it doesn't simply round the values. Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>	2016-06-21 09:43:34 +02:00
David Hildenbrand	06d68a6c85	KVM: s390: vsie: optimize gmap prefix mapping In order to not always map the prefix, we have to take care of certain aspects that implicitly unmap the prefix: - Changes to the prefix address - Changes to MSO, because the HVA of the prefix is changed - Changes of the gmap shadow (e.g. unshadowed, asce or edat changes) By properly handling these cases, we can stop remapping the prefix when there is no reason to do so. This also allows us now to not acquire any gmap shadow locks when rerunning the vsie and still having a valid gmap shadow. Please note, to detect changing gmap shadows, we have to keep the reference of the gmap shadow. The address of a gmap shadow does otherwise not reliably indicate if the gmap shadow has changed (the memory chunk could get reused). Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>	2016-06-21 09:43:34 +02:00
David Hildenbrand	a3508fbe9d	KVM: s390: vsie: initial support for nested virtualization This patch adds basic support for nested virtualization on s390x, called VSIE (virtual SIE) and allows it to be used by the guest if the necessary facilities are supported by the hardware and enabled for the guest. In order to make this work, we have to shadow the sie control block provided by guest 2. In order to gain some performance, we have to reuse the same shadow blocks as good as possible. For now, we allow as many shadow blocks as we have VCPUs (that way, every VCPU can run the VSIE concurrently). We have to watch out for the prefix getting unmapped out of our shadow gmap and properly get the VCPU out of VSIE in that case, to fault the prefix pages back in. We use the PROG_REQUEST bit for that purpose. This patch is based on an initial prototype by Tobias Elpelt. Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>	2016-06-21 09:43:33 +02:00
David Hildenbrand	df9b2b4a4a	mm/page_ref: introduce page_ref_inc_return Let's introduce that helper. Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Acked-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>	2016-06-21 09:43:04 +02:00
David Hildenbrand	3d84683bd7	s390: introduce page_to_virt() and pfn_to_virt() Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>	2016-06-20 09:55:32 +02:00
David Hildenbrand	37d9df98b7	KVM: s390: backup the currently enabled gmap when scheduled out Nested virtualization will have to enable own gmaps. Current code would enable the wrong gmap whenever scheduled out and back in, therefore resulting in the wrong gmap being enabled. This patch reenables the last enabled gmap, therefore avoiding having to touch vcpu->arch.gmap when enabling a different gmap. Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>	2016-06-20 09:55:24 +02:00
David Hildenbrand	65d0b0d4bc	KVM: s390: fast path for shadow gmaps in gmap notifier The default kvm gmap notifier doesn't have to handle shadow gmaps. So let's just directly exit in case we get notified about one. Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>	2016-06-20 09:55:21 +02:00
David Hildenbrand	01f719176f	s390/mm: don't fault everything in read-write in gmap_pte_op_fixup() Let's not fault in everything in read-write but limit it to read-only where possible. When restricting access rights, we already have the required protection level in our hands. When reading from guest 2 storage (gmap_read_table), it is obviously PROT_READ. When shadowing a pte, the required protection level is given via the guest 2 provided pte. Based on an initial patch by Martin Schwidefsky. Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>	2016-06-20 09:55:20 +02:00
David Hildenbrand	5b6c963bce	s390/mm: allow to check if a gmap shadow is valid It will be very helpful to have a mechanism to check without any locks if a given gmap shadow is still valid and matches the given properties. Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>	2016-06-20 09:55:16 +02:00
David Hildenbrand	4a49443924	s390/mm: remember the int code for the last gmap fault For nested virtualization, we want to know if we are handling a protection exception, because these can directly be forwarded to the guest without additional checks. Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>	2016-06-20 09:55:08 +02:00
David Hildenbrand	717c05554a	s390/mm: limit number of real-space gmap shadows We have no known user of real-space designation and only support it to be architecture compliant. Gmap shadows with real-space designation are never unshadowed automatically, as there is nothing to protect for the top level table. So let's simply limit the number of such shadows to one by removing existing ones on creation of another one. Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>	2016-06-20 09:55:07 +02:00
David Hildenbrand	3218f7094b	s390/mm: support real-space for gmap shadows We can easily support real-space designation just like EDAT1 and EDAT2. So guest2 can provide for guest3 an asce with the real-space control being set. We simply have to allocate the biggest page table possible and fake all levels. There is no protection to consider. If we exceed guest memory, vsie code will inject an addressing exception (via program intercept). In the future, we could limit the fake table level to the gmap page table. As the top level page table can never go away, such gmap shadows will never get unshadowed, we'll have to come up with another way to limit the number of kept gmap shadows. Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>	2016-06-20 09:55:02 +02:00
David Hildenbrand	1c65781b56	s390/mm: push rte protection down to shadow pte Just like we already do with ste protection, let's take rte protection into account. This way, the host pte doesn't have to be mapped writable. Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>	2016-06-20 09:55:00 +02:00
David Hildenbrand	18b8980988	s390/mm: support EDAT2 for gmap shadows If the guest is enabled for EDAT2, we can easily create shadows for guest2 -> guest3 provided tables that make use of EDAT2. If guest2 references a 2GB page, this memory looks consecutive for guest2, but it does not have to be so for us. Therefore we have to create fake segment and page tables. This works just like EDAT1 support, so page tables are removed when the parent table (r3t table entry) is changed. We don't hve to care about: - ACCF-Validity Control in RTTE - Access-Control Bits in RTTE - Fetch-Protection Bit in RTTE - Common-Region Bit in RTTE Just like for EDAT1, all bits might be dropped and there is no guaranteed that they are active. Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>	2016-06-20 09:54:56 +02:00
David Hildenbrand	fd8d4e3ab6	s390/mm: support EDAT1 for gmap shadows If the guest is enabled for EDAT1, we can easily create shadows for guest2 -> guest3 provided tables that make use of EDAT1. If guest2 references a 1MB page, this memory looks consecutive for guest2, but it might not be so for us. Therefore we have to create fake page tables. We can easily add that to our existing infrastructure. The invalidation mechanism will make sure that fake page tables are removed when the parent table (sgt table entry) is changed. As EDAT1 also introduced protection on all page table levels, we have to also shadow these correctly. We don't have to care about: - ACCF-Validity Control in STE - Access-Control Bits in STE - Fetch-Protection Bit in STE - Common-Segment Bit in STE As all bits might be dropped and there is no guaranteed that they are active ("unpredictable whether the CPU uses these bits", "may be used"). Without using EDAT1 in the shadow ourselfes (STE-format control == 0), simply shadowing these bits would not be enough. They would be ignored. Please note that we are using the "fake" flag to make this look consistent with further changes (EDAT2, real-space designation support) and don't let the shadow functions handle fc=1 stes. In the future, with huge pages in the host, gmap_shadow_pgt() could simply try to map a huge host page if "fake" is set to one and indicate via return value that no lower fake tables / shadow ptes are required. Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>	2016-06-20 09:54:51 +02:00

1 2 3 4 5 ...

601988 Commits All Branches Search

601988 Commits

All Branches