linux

Commit Graph

Author	SHA1	Message	Date
Sean Christopherson	3df5c37e55	KVM: nVMX: try to set EFER bits correctly when initializing controls VM_ENTRY_IA32E_MODE and VM_{ENTRY,EXIT}_LOAD_IA32_EFER will be explicitly set/cleared as needed by vmx_set_efer(), but attempt to get the bits set correctly when intializing the control fields. Setting the value correctly can avoid multiple VMWrites. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-10-17 00:29:52 +02:00
Sean Christopherson	02343cf207	KVM: vmx: do not unconditionally clear EFER switching Do not unconditionally call clear_atomic_switch_msr() when updating EFER. This adds up to four unnecessary VMWrites in the case where guest_efer != host_efer, e.g. if the load_on_{entry,exit} bits were already set. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-10-17 00:29:51 +02:00
Sean Christopherson	b7031fd40f	KVM: nVMX: reset cache/shadows when switching loaded VMCS Reset the vm_{entry,exit}_controls_shadow variables as well as the segment cache after loading a new VMCS in vmx_switch_vmcs(). The shadows/cache track VMCS data, i.e. they're stale every time we switch to a new VMCS regardless of reason. This fixes a bug where stale control shadows would be consumed after a nested VMExit due to a failed consistency check. Suggested-by: Jim Mattson <jmattson@google.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-10-17 00:29:50 +02:00
Sean Christopherson	1abf23fb42	KVM: nVMX: use vm_exit_controls_init() to write exit controls for vmcs02 Write VM_EXIT_CONTROLS using vm_exit_controls_init() when configuring vmcs02, otherwise vm_exit_controls_shadow will be stale. EFER in particular can be corrupted if VM_EXIT_LOAD_IA32_EFER is not updated due to an incorrect shadow optimization, which can crash L0 due to EFER not being loaded on exit. This does not occur with the current code base simply because update_transition_efer() unconditionally clears VM_EXIT_LOAD_IA32_EFER before conditionally setting it, and because a nested guest always starts with VM_EXIT_LOAD_IA32_EFER clear, i.e. we'll only ever unnecessarily clear the bit. That is, until someone optimizes update_transition_efer()... Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-10-17 00:29:50 +02:00
Sean Christopherson	5b8ba41daf	KVM: nVMX: move vmcs12 EPTP consistency check to check_vmentry_prereqs() An invalid EPTP causes a VMFail(VMXERR_ENTRY_INVALID_CONTROL_FIELD), not a VMExit. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-10-17 00:29:49 +02:00
Sean Christopherson	64a919f7b5	KVM: nVMX: move host EFER consistency checks to VMFail path Invalid host state related to loading EFER on VMExit causes a VMFail(VMXERR_ENTRY_INVALID_HOST_STATE_FIELD), not a VMExit. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-10-17 00:29:49 +02:00
Wei Yang	31fc4f95dd	KVM: leverage change to adjust slots->used_slots in update_memslots() update_memslots() is only called by __kvm_set_memory_region(), in which "change" is calculated and indicates how to adjust slots->used_slots * increase by one if it is KVM_MR_CREATE * decrease by one if it is KVM_MR_DELETE * not change for others This patch adjusts slots->used_slots in update_memslots() based on "change" value instead of re-calculate those states again. Signed-off-by: Wei Yang <richard.weiyang@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-10-17 00:29:48 +02:00
Jim Mattson	3c6e099fa1	KVM: nVMX: Always reflect #NM VM-exits to L1 When bit 3 (corresponding to CR0.TS) of the VMCS12 cr0_guest_host_mask field is clear, the VMCS12 guest_cr0 field does not necessarily hold the current value of the L2 CR0.TS bit, so the code that checked for L2's CR0.TS bit being set was incorrect. Moreover, I'm not sure that the CR0.TS check was adequate. (What if L2's CR0.EM was set, for instance?) Fortunately, lazy FPU has gone away, so L0 has lost all interest in intercepting #NM exceptions. See commit `bd7e5b0899` ("KVM: x86: remove code for lazy FPU handling"). Therefore, there is no longer any question of which hypervisor gets first dibs. The #NM VM-exit should always be reflected to L1. (Note that the corresponding bit must be set in the VMCS12 exception_bitmap field for there to be an #NM VM-exit at all.) Fixes: `ccf9844e5d` ("kvm, vmx: Really fix lazy FPU on nested guest") Reported-by: Abhiroop Dabral <adabral@paloaltonetworks.com> Signed-off-by: Jim Mattson <jmattson@google.com> Reviewed-by: Peter Shier <pshier@google.com> Tested-by: Abhiroop Dabral <adabral@paloaltonetworks.com> Reviewed-by: Liran Alon <liran.alon@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-10-17 00:29:47 +02:00
Vitaly Kuznetsov	214ff83d44	KVM: x86: hyperv: implement PV IPI send hypercalls Using hypercall for sending IPIs is faster because this allows to specify any number of vCPUs (even > 64 with sparse CPU set), the whole procedure will take only one VMEXIT. Current Hyper-V TLFS (v5.0b) claims that HvCallSendSyntheticClusterIpi hypercall can't be 'fast' (passing parameters through registers) but apparently this is not true, Windows always uses it as 'fast' so we need to support that. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-10-17 00:29:47 +02:00
Vitaly Kuznetsov	2cefc5feb8	KVM: x86: hyperv: optimize kvm_hv_flush_tlb() for vp_index == vcpu_idx case VP inedx almost always matches VCPU and when it does it's faster to walk the sparse set instead of all vcpus. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-10-17 00:29:46 +02:00
Vitaly Kuznetsov	0b0a31badb	KVM: x86: hyperv: valid_bank_mask should be 'u64' This probably doesn't matter much (KVM_MAX_VCPUS is much lower nowadays) but valid_bank_mask is really u64 and not unsigned long. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Roman Kagan <rkagan@virtuozzo.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-10-17 00:29:46 +02:00
Vitaly Kuznetsov	87ee613d07	KVM: x86: hyperv: keep track of mismatched VP indexes In most common cases VP index of a vcpu matches its vcpu index. Userspace is, however, free to set any mapping it wishes and we need to account for that when we need to find a vCPU with a particular VP index. To keep search algorithms optimal in both cases introduce 'num_mismatched_vp_indexes' counter showing how many vCPUs with mismatching VP index we have. In case the counter is zero we can assume vp_index == vcpu_idx. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Roman Kagan <rkagan@virtuozzo.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-10-17 00:29:45 +02:00
Vitaly Kuznetsov	1779a39f78	KVM: x86: hyperv: consistently use 'hv_vcpu' for 'struct kvm_vcpu_hv' variables Rename 'hv' to 'hv_vcpu' in kvm_hv_set_msr/kvm_hv_get_msr(); 'hv' is 'reserved' for 'struct kvm_hv' variables across the file. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Roman Kagan <rkagan@virtuozzo.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-10-17 00:29:45 +02:00
Vitaly Kuznetsov	a812297c4f	KVM: x86: hyperv: optimize 'all cpus' case in kvm_hv_flush_tlb() We can use 'NULL' to represent 'all cpus' case in kvm_make_vcpus_request_mask() and avoid building vCPU mask with all vCPUs. Suggested-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Roman Kagan <rkagan@virtuozzo.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-10-17 00:29:44 +02:00
Vitaly Kuznetsov	9170200ec0	KVM: x86: hyperv: enforce vp_index < KVM_MAX_VCPUS Hyper-V TLFS (5.0b) states: > Virtual processors are identified by using an index (VP index). The > maximum number of virtual processors per partition supported by the > current implementation of the hypervisor can be obtained through CPUID > leaf 0x40000005. A virtual processor index must be less than the > maximum number of virtual processors per partition. Forbid userspace to set VP_INDEX above KVM_MAX_VCPUS. get_vcpu_by_vpidx() can now be optimized to bail early when supplied vpidx is >= KVM_MAX_VCPUS. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Roman Kagan <rkagan@virtuozzo.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-10-17 00:29:43 +02:00
Paolo Bonzini	0624fca951	kvm/x86: return meaningful value from KVM_SIGNAL_MSI If kvm_apic_map_get_dest_lapic() finds a disabled LAPIC, it will return with bitmap==0 and (*r == -1) will be returned to userspace. QEMU may then record "KVM: injection failed, MSI lost (Operation not permitted)" in its log, which is quite puzzling. Reported-by: Peng Hao <penghao122@sina.com.cn> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-10-17 00:29:43 +02:00
Wei Yang	4fef0f4913	KVM: x86: move definition PT_MAX_HUGEPAGE_LEVEL and KVM_NR_PAGE_SIZES together Currently, there are two definitions related to huge page, but a little bit far from each other and seems loosely connected: * KVM_NR_PAGE_SIZES defines the number of different size a page could map * PT_MAX_HUGEPAGE_LEVEL means the maximum level of huge page The number of different size a page could map equals the maximum level of huge page, which is implied by current definition. While current implementation may not be kind to readers and further developers: * KVM_NR_PAGE_SIZES looks like a stand alone definition at first sight * in case we need to support more level, two places need to change This patch tries to make these two definition more close, so that reader and developer would feel more comfortable to manipulate. Signed-off-by: Wei Yang <richard.weiyang@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-10-17 00:29:42 +02:00
Tianyu Lan	aaa45da24e	KVM/VMX: Remve unused function is_external_interrupt(). is_external_interrupt() is not used now and so remove it. Signed-off-by: Lan Tianyu <Tianyu.Lan@microsoft.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-10-17 00:29:42 +02:00
Wei Yang	daefb7949e	KVM: x86: return 0 in case kvm_mmu_memory_cache has min number of objects The code tries to pre-allocate min number of objects, so it is ok to return 0 when the kvm_mmu_memory_cache meets the requirement. Signed-off-by: Wei Yang <richard.weiyang@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-10-17 00:29:41 +02:00
Krish Sadhukhan	55c1dcd80b	nVMX x86: Make nested_vmx_check_pml_controls() concise Suggested-by: Jim Mattson <jmattson@google.com> Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Reviewed-by: Mark Kanda <mark.kanda@oracle.com> Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-10-17 00:29:41 +02:00
Wei Yang	3ff519f29d	KVM: x86: adjust kvm_mmu_page member to save 8 bytes On a 64bits machine, struct is naturally aligned with 8 bytes. Since kvm_mmu_page member unsync and role are less then 4 bytes, we can rearrange the sequence to compace the struct. As the comment shows, role and gfn are used to key the shadow page. In order to keep the comment valid, this patch moves the unsync up and exchange the position of role and gfn. From /proc/slabinfo, it shows the size of kvm_mmu_page is 8 bytes less and with one more object per slap after applying this patch. # name <active_objs> <num_objs> <objsize> <objperslab> kvm_mmu_page_header 0 0 168 24 kvm_mmu_page_header 0 0 160 25 Signed-off-by: Wei Yang <richard.weiyang@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-10-17 00:29:40 +02:00
Sean Christopherson	bd18bffca3	KVM: nVMX: restore host state in nested_vmx_vmexit for VMFail A VMEnter that VMFails (as opposed to VMExits) does not touch host state beyond registers that are explicitly noted in the VMFail path, e.g. EFLAGS. Host state does not need to be loaded because VMFail is only signaled for consistency checks that occur before the CPU starts to load guest state, i.e. there is no need to restore any state as nothing has been modified. But in the case where a VMFail is detected by hardware and not by KVM (due to deferring consistency checks to hardware), KVM has already loaded some amount of guest state. Luckily, "loaded" only means loaded to KVM's software model, i.e. vmcs01 has not been modified. So, unwind our software model to the pre-VMEntry host state. Not restoring host state in this VMFail path leads to a variety of failures because we end up with stale data in vcpu->arch, e.g. CR0, CR4, EFER, etc... will all be out of sync relative to vmcs01. Any significant delta in the stale data is all but guaranteed to crash L1, e.g. emulation of SMEP, SMAP, UMIP, WP, etc... will be wrong. An alternative to this "soft" reload would be to load host state from vmcs12 as if we triggered a VMExit (as opposed to VMFail), but that is wildly inconsistent with respect to the VMX architecture, e.g. an L1 VMM with separate VMExit and VMFail paths would explode. Note that this approach does not mean KVM is 100% accurate with respect to VMX hardware behavior, even at an architectural level (the exact order of consistency checks is microarchitecture specific). But 100% emulation accuracy isn't the goal (with this patch), rather the goal is to be consistent in the information delivered to L1, e.g. a VMExit should not fall-through VMENTER, and a VMFail should not jump to HOST_RIP. This technically reverts commit "5af4157388ad (KVM: nVMX: Fix mmu context after VMLAUNCH/VMRESUME failure)", but retains the core aspects of that patch, just in an open coded form due to the need to pull state from vmcs01 instead of vmcs12. Restoring host state resolves a variety of issues introduced by commit "4f350c6dbcb9 (kvm: nVMX: Handle deferred early VMLAUNCH/VMRESUME failure properly)", which remedied the incorrect behavior of treating VMFail like VMExit but in doing so neglected to restore arch state that had been modified prior to attempting nested VMEnter. A sample failure that occurs due to stale vcpu.arch state is a fault of some form while emulating an LGDT (due to emulated UMIP) from L1 after a failed VMEntry to L3, in this case when running the KVM unit test test_tpr_threshold_values in L1. L0 also hits a WARN in this case due to a stale arch.cr4.UMIP. L1: BUG: unable to handle kernel paging request at ffffc90000663b9e PGD 276512067 P4D 276512067 PUD 276513067 PMD 274efa067 PTE 8000000271de2163 Oops: 0009 [#1] SMP CPU: 5 PID: 12495 Comm: qemu-system-x86 Tainted: G W 4.18.0-rc2+ #2 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015 RIP: 0010:native_load_gdt+0x0/0x10 ... Call Trace: load_fixmap_gdt+0x22/0x30 __vmx_load_host_state+0x10e/0x1c0 [kvm_intel] vmx_switch_vmcs+0x2d/0x50 [kvm_intel] nested_vmx_vmexit+0x222/0x9c0 [kvm_intel] vmx_handle_exit+0x246/0x15a0 [kvm_intel] kvm_arch_vcpu_ioctl_run+0x850/0x1830 [kvm] kvm_vcpu_ioctl+0x3a1/0x5c0 [kvm] do_vfs_ioctl+0x9f/0x600 ksys_ioctl+0x66/0x70 __x64_sys_ioctl+0x16/0x20 do_syscall_64+0x4f/0x100 entry_SYSCALL_64_after_hwframe+0x44/0xa9 L0: WARNING: CPU: 2 PID: 3529 at arch/x86/kvm/vmx.c:6618 handle_desc+0x28/0x30 [kvm_intel] ... CPU: 2 PID: 3529 Comm: qemu-system-x86 Not tainted 4.17.2-coffee+ #76 Hardware name: Intel Corporation Kabylake Client platform/KBL S RIP: 0010:handle_desc+0x28/0x30 [kvm_intel] ... Call Trace: kvm_arch_vcpu_ioctl_run+0x863/0x1840 [kvm] kvm_vcpu_ioctl+0x3a1/0x5c0 [kvm] do_vfs_ioctl+0x9f/0x5e0 ksys_ioctl+0x66/0x70 __x64_sys_ioctl+0x16/0x20 do_syscall_64+0x49/0xf0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fixes: `5af4157388` (KVM: nVMX: Fix mmu context after VMLAUNCH/VMRESUME failure) Fixes: `4f350c6dbc` (kvm: nVMX: Handle deferred early VMLAUNCH/VMRESUME failure properly) Cc: Jim Mattson <jmattson@google.com> Cc: Krish Sadhukhan <krish.sadhukhan@oracle.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim KrÄmÃ¡Å™ <rkrcmar@redhat.com> Cc: Wanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-10-17 00:29:39 +02:00
Jim Mattson	cfb634fe30	KVM: nVMX: Clear reserved bits of #DB exit qualification According to volume 3 of the SDM, bits 63:15 and 12:4 of the exit qualification field for debug exceptions are reserved (cleared to 0). However, the SDM is incorrect about bit 16 (corresponding to DR6.RTM). This bit should be set if a debug exception (#DB) or a breakpoint exception (#BP) occurred inside an RTM region while advanced debugging of RTM transactional regions was enabled. Note that this is the opposite of DR6.RTM, which "indicates (when clear) that a debug exception (#DB) or breakpoint exception (#BP) occurred inside an RTM region while advanced debugging of RTM transactional regions was enabled." There is still an issue with stale DR6 bits potentially being misreported for the current debug exception. DR6 should not have been modified before vectoring the #DB exception, and the "new DR6 bits" should be available somewhere, but it was and they aren't. Fixes: `b96fb43977` ("KVM: nVMX: fixes to nested virt interrupt injection") Signed-off-by: Jim Mattson <jmattson@google.com> Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-10-17 00:29:39 +02:00
Andrew Jones	5b8ee8792f	kvm: selftests: support high GPAs in dirty_log_test Signed-off-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-10-17 00:29:38 +02:00
Andrew Jones	e28934e661	kvm: selftests: stop lying to aarch64 tests about PA-bits Let's add the 40 PA-bit versions of the VM modes, that AArch64 should have been using, so we can extend the dirty log test without breaking things. Signed-off-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-10-17 00:29:38 +02:00
Andrew Jones	e1b376f140	kvm: selftests: dirty_log_test: also test 64K pages on aarch64 Signed-off-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-10-17 00:29:37 +02:00
Andrew Jones	fff8dcd7b4	kvm: selftests: port dirty_log_test to aarch64 While we're messing with the code for the port and to support guest page sizes that are less than the host page size, we also make some code formatting cleanups and apply sync_global_to_guest(). Signed-off-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-10-17 00:29:36 +02:00
Andrew Jones	81d1cca0c0	kvm: selftests: introduce new VM mode for 64K pages Rename VM_MODE_FLAT48PG to be more descriptive of its config and add a new config that has the same parameters, except with 64K pages. Signed-off-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-10-17 00:29:35 +02:00
Andrew Jones	0bec140fb6	kvm: selftests: add vcpu support for aarch64 This code adds VM and VCPU setup code for the VM_MODE_FLAT48PG mode. The VM_MODE_FLAT48PG isn't yet fully supportable, as it defines the guest physical address limit as 52-bits, and KVM currently only supports guests with up to 40-bit physical addresses (see KVM_PHYS_SHIFT). VM_MODE_FLAT48PG will work fine, though, as long as no >= 40-bit physical addresses are used. Signed-off-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-10-17 00:26:56 +02:00
Andrew Jones	7a6629ef74	kvm: selftests: add virt mem support for aarch64 Signed-off-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-10-17 00:26:51 +02:00
Andrew Jones	d5106539cf	kvm: selftests: add vm_phy_pages_alloc Signed-off-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-10-17 00:26:47 +02:00
Andrew Jones	eabe7881d2	kvm: selftests: tidy up kvm_util Tidy up kvm-util code: code/comment formatting, remove unused code, and move x86 specific code out. We also move vcpu_dump() out of common code, because not all arches (AArch64) have KVM_GET_REGS. Signed-off-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-10-17 00:26:44 +02:00
Andrew Jones	eea192bfd9	kvm: selftests: add cscope make target Signed-off-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-10-17 00:26:30 +02:00
Andrew Jones	cc68765d41	kvm: selftests: move arch-specific files to arch-specific locations Signed-off-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-10-17 00:26:16 +02:00
Andrew Jones	14c47b7530	kvm: selftests: introduce ucall Rework the guest exit to userspace code to generalize the concept into what it is, a "hypercall to userspace", and provide two implementations of it: the PortIO version currently used, but only useable by x86, and an MMIO version that other architectures (except s390) can use. Signed-off-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-10-17 00:26:14 +02:00
Andrew Jones	6c930268bc	kvm: selftests: vcpu_setup: set cr4.osfxsr Guest code may want to call functions that have variable arguments. To do so, we either need to compile with -mno-sse or enable SSE in the VCPUs. As it should be pretty safe to turn on the feature, and -mno-sse would make linking test code with standard libraries difficult, we choose the feature enabling. Signed-off-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-10-17 00:26:02 +02:00
Wanpeng Li	3b8a5df6c4	KVM: LAPIC: Tune lapic_timer_advance_ns automatically In cloud environment, lapic_timer_advance_ns is needed to be tuned for every CPU generations, and every host kernel versions(the kvm-unit-tests/tscdeadline_latency.flat is 5700 cycles for upstream kernel and 9600 cycles for our 3.10 product kernel, both preemption_timer=N, Skylake server). This patch adds the capability to automatically tune lapic_timer_advance_ns step by step, the initial value is 1000ns as 'commit `d0659d946b` ("KVM: x86: add option to advance tscdeadline hrtimer expiration")' recommended, it will be reduced when it is too early, and increased when it is too late. The guest_tsc and tsc_deadline are hard to equal, so we assume we are done when the delta is within a small scope e.g. 100 cycles. This patch reduces latency (kvm-unit-tests/tscdeadline_latency, busy waits, preemption_timer enabled) from ~2600 cyles to ~1200 cyles on our Skylake server. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Liran Alon <liran.alon@oracle.com> Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-10-17 00:25:54 +02:00
Liran Alon	efebf0aaec	KVM: nVMX: Do not flush TLB on L1<->L2 transitions if L1 uses VPID and EPT If L1 uses VPID, it expects TLB to not be flushed on L1<->L2 transitions. However, code currently flushes TLB nonetheless if we didn't allocate a vpid02 for L2. As in this case, vmcs02->vpid == vmcs01->vpid == vmx->vpid. But, if L1 uses EPT, TLB entires populated by L2 are tagged with EPTP02 while TLB entries populated by L1 are tagged with EPTP01. Therefore, we can also avoid TLB flush if L1 uses VPID and EPT. Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com> Reviewed-by: Darren Kenny <darren.kenny@oracle.com> Reviewed-by: Nikita Leshenko <nikita.leshchenko@oracle.com> Signed-off-by: Liran Alon <liran.alon@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-10-13 12:00:55 +02:00
Liran Alon	327c072187	KVM: nVMX: Flush linear and combined mappings on VPID02 related flushes All VPID12s used on a given L1 vCPU is translated to a single VPID02 (vmx->nested.vpid02 or vmx->vpid). Therefore, on L1->L2 VMEntry, we need to invalidate linear and combined mappings tagged by VPID02 in case L1 uses VPID and vmcs12->vpid was changed since last L1->L2 VMEntry. However, current code invalidates the wrong mappings as it calls __vmx_flush_tlb() with invalidate_gpa parameter set to true which will result in invalidating combined and guest-physical mappings tagged with active EPTP which is EPTP01. Similarly, INVVPID emulation have the exact same issue. Fix both issues by just setting invalidate_gpa parameter to false which will result in invalidating linear and combined mappings tagged with given VPID02 as required. Reviewed-by: Nikita Leshenko <nikita.leshchenko@oracle.com> Reviewed-by: Mark Kanda <mark.kanda@oracle.com> Signed-off-by: Liran Alon <liran.alon@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-10-13 12:00:54 +02:00
Liran Alon	3d5bdae8b1	KVM: nVMX: Use correct VPID02 when emulating L1 INVVPID In case L0 didn't allocate vmx->nested.vpid02 for L2, vmcs02->vpid is set to vmx->vpid. Consider this case when emulating L1 INVVPID in L0. Reviewed-by: Nikita Leshenko <nikita.leshchenko@oracle.com> Reviewed-by: Mark Kanda <mark.kanda@oracle.com> Signed-off-by: Liran Alon <liran.alon@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-10-13 12:00:54 +02:00
Liran Alon	1438921c6d	KVM: nVMX: Flush TLB entries tagged by dest EPTP on L1<->L2 transitions If L1 and L2 share VPID (because L1 don't use VPID or we haven't allocated a vpid02), we need to flush TLB on L1<->L2 transitions. Before this patch, this TLB flushing was done by vmx_flush_tlb(). If L0 use EPT, this will translate into INVEPT(active_eptp); However, if L1 use EPT, in L1->L2 VMEntry, active EPTP is EPTP01 but TLB entries populated by L2 are tagged with EPTP02. Therefore we should delay vmx_flush_tlb() until active_eptp is EPTP02. To achieve this, instead of directly calling vmx_flush_tlb() we request it to be called by KVM_REQ_TLB_FLUSH which is evaluated after KVM_REQ_LOAD_CR3 which sets the active_eptp to EPTP02 as required. Similarly, on L2->L1 VMExit, active EPTP is EPTP02 but TLB entries populated by L1 are tagged with EPTP01 and therefore we should delay vmx_flush_tlb() until active_eptp is EPTP01. Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com> Reviewed-by: Darren Kenny <darren.kenny@oracle.com> Reviewed-by: Nikita Leshenko <nikita.leshchenko@oracle.com> Signed-off-by: Liran Alon <liran.alon@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-10-13 12:00:53 +02:00
Sean Christopherson	3de6347bf9	KVM: vmx: rename KVM_GUEST_CR0_MASK tp KVM_VM_CR0_ALWAYS_OFF The KVM_GUEST_CR0_MASK macro tracks CR0 bits that are forced to zero by the VMX architecture, i.e. CR0.{NW,CD} must always be zero in the hardware CR0 post-VMXON. Rename the macro to clarify its purpose, be consistent with KVM_VM_CR0_ALWAYS_ON and avoid confusion with the CR0_GUEST_HOST_MASK field. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-10-13 12:00:53 +02:00
Paolo Bonzini	3d0d0d9b1d	KVM: s390/vfio-ap: Fixes and enhancements for vfio-ap - add tracing - fix a locking bug - make local functions and data static -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQIcBAABAgAGBQJbwKzMAAoJEBF7vIC1phx8DR0QALWLdmVtMioQeeoas9LYurI0 VuFjM5QsH9hVkjDZIP7Y0titQz1L4WWqIwZVffHmQGL8saRr+fd/7gBwAReKgZVU OrZDtUpqS1PBsrKQx36MjWrZ5n4R5tzvW38xiPevEIBLq+rtQ1GbiCs3rtRwKlur uABquv9uYr8GHrmYa9bUvUpbVvHQvWz/h8T4cjgwAuN0NER6PzqMUzcZBt2Q9s29 26ZQ+r7CZ2qklJvoB8UOsrWdsZhM58BaY+CJzrAsxD3OAnPJGILpTFW2dXIoVfkh LMuuuzl8Tl0ntwJKjifhut7/f9VX9ipTvmA53e2moq52UEA2mJoOzu1Ku/KAivLe 4efycTIvYRV1UKE1JLWlS/5z6fNg9eG2CykSqRlrznEPiGNTPMY5JemtvrPINkcZ QrdbI6ou+grFXlfaG+KcS2iFOgrMqL1UWABiq1jJVW2RAK1ZeUBFHKVeJwKVKSeW p9xbh7jl7yIvQ8bfsO8P3LVFWK0EmxJt6oA7ln4X7O1Pbx1QBeH5ZMBsdqLJZFsT AQIT/p51JjIF6H4V8/jBCRYuR91IcD9CQlRR96y8zfHiaEYlvS1YFHuZdLZCJ7Ef LFLVIHXGfQLHoSN2r/8us7OmmVDJctKwPGLQuh2RcMFJPySm/iBBX9m57S7VWjAb 87wLYmntUMcFHIWFgixT =hZdt -----END PGP SIGNATURE----- Merge tag 'kvm-s390-next-4.20-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD KVM: s390/vfio-ap: Fixes and enhancements for vfio-ap - add tracing - fix a locking bug - make local functions and data static	2018-10-13 12:00:26 +02:00
Paolo Bonzini	7dd2157cb6	PPC KVM update for 4.20. The major new feature here is nested HV KVM support. This allows the HV KVM module to load inside a radix guest on POWER9 and run radix guests underneath it. These nested guests can run in supervisor mode and don't require any additional instructions to be emulated, unlike with PR KVM, and so performance is much better than with PR KVM, and is very close to the performance of a non-nested guest. A nested hypervisor (a guest with nested guests) can be migrated to another host and will bring all its nested guests along with it. A nested guest can also itself run guests, and so on down to any desired depth of nesting. Apart from that there are a series of updates for IOMMU handling from Alexey Kardashevskiy, a "one VM per core" mode for HV KVM for security-paranoid applications, and a small fix for PR KVM. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQEcBAABCAAGBQJbvY8iAAoJEJ2a6ncsY3GfMVAH/0t1nKifIr3KTKm+zR4peOU/ DMOTp/BomPCAlIJfp/C+j56GJ9H5me+i/ykTG6ZvHFhIInaM6OvavU7xQ7LBRvl1 ltjSr21IRO5BUWhnuDCYQXPFasJ0nX+LrKIRr7vNoUioUJHk6W1e4Mfep7CNs6fD Q1p4Yg0BFCxN+ijhIOlQ49/ymbbaPFArOPh0mhipOEeY51NvCXYiUHiygu8RWnV9 sHuVFW3zQac0geUaWMAscVJ4dcvEw7GLXeY4FwWQtBPMq01LjJrECZcqy/c3Ut/2 06k7MpjZqPRneLp7aGznZCUjfQUMuP80ZRPQ8QLC/q1TODl+wPvB+hby1Q2tKvg= =zAPd -----END PGP SIGNATURE----- Merge tag 'kvm-ppc-next-4.20-1' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc into HEAD PPC KVM update for 4.20. The major new feature here is nested HV KVM support. This allows the HV KVM module to load inside a radix guest on POWER9 and run radix guests underneath it. These nested guests can run in supervisor mode and don't require any additional instructions to be emulated, unlike with PR KVM, and so performance is much better than with PR KVM, and is very close to the performance of a non-nested guest. A nested hypervisor (a guest with nested guests) can be migrated to another host and will bring all its nested guests along with it. A nested guest can also itself run guests, and so on down to any desired depth of nesting. Apart from that there are a series of updates for IOMMU handling from Alexey Kardashevskiy, a "one VM per core" mode for HV KVM for security-paranoid applications, and a small fix for PR KVM.	2018-10-10 18:38:32 +02:00
Paul Mackerras	901f8c3f6f	KVM: PPC: Book3S HV: Add NO_HASH flag to GET_SMMU_INFO ioctl result This adds a KVM_PPC_NO_HASH flag to the flags field of the kvm_ppc_smmu_info struct, and arranges for it to be set when running as a nested hypervisor, as an unambiguous indication to userspace that HPT guests are not supported. Reporting the KVM_CAP_PPC_MMU_HASH_V3 capability as false could be taken as indicating only that the new HPT features in ISA V3.0 are not supported, leaving it ambiguous whether pre-V3.0 HPT features are supported. Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2018-10-09 16:14:54 +11:00
Paul Mackerras	aa069a9969	KVM: PPC: Book3S HV: Add a VM capability to enable nested virtualization With this, userspace can enable a KVM-HV guest to run nested guests under it. The administrator can control whether any nested guests can be run; setting the "nested" module parameter to false prevents any guests becoming nested hypervisors (that is, any attempt to enable the nested capability on a guest will fail). Guests which are already nested hypervisors will continue to be so. Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2018-10-09 16:14:47 +11:00
Paul Mackerras	9d67121a4f	Merge remote-tracking branch 'remotes/powerpc/topic/ppc-kvm' into kvm-ppc-next This merges in the "ppc-kvm" topic branch of the powerpc tree to get a series of commits that touch both general arch/powerpc code and KVM code. These commits will be merged both via the KVM tree and the powerpc tree. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2018-10-09 16:13:20 +11:00
Paul Mackerras	83a055104e	KVM: PPC: Book3S HV: Add nested shadow page tables to debugfs This adds a list of valid shadow PTEs for each nested guest to the 'radix' file for the guest in debugfs. This can be useful for debugging. Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-10-09 16:04:27 +11:00
Paul Mackerras	de760db4d9	KVM: PPC: Book3S HV: Allow HV module to load without hypervisor mode With this, the KVM-HV module can be loaded in a guest running under KVM-HV, and if the hypervisor supports nested virtualization, this guest can now act as a nested hypervisor and run nested guests. This also adds some checks to inform userspace that HPT guests are not supported by nested hypervisors (by returning false for the KVM_CAP_PPC_MMU_HASH_V3 capability), and to prevent userspace from configuring a guest to use HPT mode. Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-10-09 16:04:27 +11:00
Suraj Jitindar Singh	10b5022db7	KVM: PPC: Book3S HV: Handle differing endianness for H_ENTER_NESTED The hcall H_ENTER_NESTED takes two parameters: the address in L1 guest memory of a hv_regs struct and the address of a pt_regs struct. The hcall requests the L0 hypervisor to use the register values in these structs to run a L2 guest and to return the exit state of the L2 guest in these structs. These are in the endianness of the L1 guest, rather than being always big-endian as is usually the case for PAPR hypercalls. This is convenient because it means that the L1 guest can pass the address of the regs field in its kvm_vcpu_arch struct. This also improves performance slightly by avoiding the need for two copies of the pt_regs struct. When reading/writing these structures, this patch handles the case where the endianness of the L1 guest differs from that of the L0 hypervisor, by byteswapping the structures after reading and before writing them back. Since all the fields of the pt_regs are of the same type, i.e., unsigned long, we treat it as an array of unsigned longs. The fields of struct hv_guest_state are not all the same, so its fields are byteswapped individually. Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-10-09 16:04:27 +11:00

1 2 3 4 5 ...

782874 Commits All Branches Search

782874 Commits

All Branches