linux

Commit Graph

Author	SHA1	Message	Date
Wanpeng Li	81dc01f749	kvm: vmx: add nested virtualization support for xsaves Add nested virtualization support for xsaves. Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com> Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-12-05 13:57:44 +01:00
Wanpeng Li	203000993d	kvm: vmx: add MSR logic for XSAVES Add logic to get/set the XSS model-specific register. Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com> Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-12-05 13:57:39 +01:00
Wanpeng Li	f53cd63c2d	kvm: x86: handle XSAVES vmcs and vmexit Initialize the XSS exit bitmap. It is zero so there should be no XSAVES or XRSTORS exits. Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com> Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-12-05 13:57:33 +01:00
Paolo Bonzini	404e0a19e1	KVM: cpuid: mask more bits in leaf 0xd and subleaves - EAX=0Dh, ECX=1: output registers EBX/ECX/EDX are reserved. - EAX=0Dh, ECX>1: output register ECX bit 0 is clear for all the CPUID leaves we support, because variable "supported" comes from XCR0 and not XSS. Bits above 0 are reserved, so ECX is overall zero. Output register EDX is reserved. Source: Intel Architecture Instruction Set Extensions Programming Reference, ref. number 319433-022 Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Tested-by: Wanpeng Li <wanpeng.li@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-12-05 13:57:17 +01:00
Paolo Bonzini	412a3c411e	KVM: cpuid: set CPUID(EAX=0xd,ECX=1).EBX correctly This is the size of the XSAVES area. This starts providing guest support for XSAVES (with no support yet for supervisor states, i.e. XSS == 0 always in guests for now). Wanpeng Li suggested testing XSAVEC as well as XSAVES, since in practice no real processor exists that only has one of them, and there is no other way for userspace programs to compute the area of the XSAVEC save area. CPUID(EAX=0xd,ECX=1).EBX provides an upper bound. Suggested-by: Radim Krčmář <rkrcmar@redhat.com> Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Tested-by: Wanpeng Li <wanpeng.li@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-12-05 13:57:17 +01:00
Wanpeng Li	55412b2eda	kvm: x86: Add kvm_x86_ops hook that enables XSAVES for guest Expose the XSAVES feature to the guest if the kvm_x86_ops say it is available. Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-12-05 13:57:16 +01:00
Paolo Bonzini	5c404cabd1	KVM: x86: use F() macro throughout cpuid.c For code that deals with cpuid, this makes things a bit more readable. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-12-05 13:57:15 +01:00
Paolo Bonzini	df1daba7d1	KVM: x86: support XSAVES usage in the host Userspace is expecting non-compacted format for KVM_GET_XSAVE, but struct xsave_struct might be using the compacted format. Convert in order to preserve userspace ABI. Likewise, userspace is passing non-compacted format for KVM_SET_XSAVE but the kernel will pass it to XRSTORS, and we need to convert back. Fixes: `f31a9f7c71` Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: stable@vger.kernel.org Cc: H. Peter Anvin <hpa@linux.intel.com> Tested-by: Nadav Amit <namit@cs.technion.ac.il> Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-12-05 13:57:05 +01:00
Paolo Bonzini	ba7b39203a	x86: export get_xsave_addr get_xsave_addr is the API to access XSAVE states, and KVM would like to use it. Export it. Cc: stable@vger.kernel.org Cc: x86@kernel.org Cc: H. Peter Anvin <hpa@linux.intel.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-12-05 13:55:44 +01:00
Radim Krčmář	45c3094a64	KVM: x86: allow 256 logical x2APICs again While fixing an x2apic bug, `17d68b7` KVM: x86: fix guest-initiated crash with x2apic (CVE-2013-6376) we've made only one cluster available. This means that the amount of logically addressible x2APICs was reduced to 16 and VCPUs kept overwriting themselves in that region, so even the first cluster wasn't set up correctly. This patch extends x2APIC support back to the logical_map's limit, and keeps the CVE fixed as messages for non-present APICs are dropped. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-12-04 15:29:08 +01:00
Radim Krčmář	25995e5b4a	KVM: x86: check bounds of APIC maps They can't be violated now, but play it safe for the future. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-12-04 15:29:08 +01:00
Radim Krčmář	fa834e9197	KVM: x86: fix APIC physical destination wrapping x2apic allows destinations > 0xff and we don't want them delivered to lower APICs. They are correctly handled by doing nothing. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-12-04 15:29:07 +01:00
Radim Krčmář	085563fb04	KVM: x86: deliver phys lowest-prio Physical mode can't address more than one APIC, but lowest-prio is allowed, so we just reuse our paths. SDM 10.6.2.1 Physical Destination: Also, for any non-broadcast IPI or I/O subsystem initiated interrupt with lowest priority delivery mode, software must ensure that APICs defined in the interrupt address are present and enabled to receive interrupts. We could warn on top of that. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-12-04 15:29:06 +01:00
Radim Krčmář	698f9755d9	KVM: x86: don't retry hopeless APIC delivery False from kvm_irq_delivery_to_apic_fast() means that we don't handle it in the fast path, but we still return false in cases that were perfectly handled, fix that. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-12-04 15:29:06 +01:00
Radim Krčmář	decdc28382	KVM: x86: use MSR_ICR instead of a number 0x830 MSR is 0x300 xAPIC MMIO, which is MSR_ICR. Signed-off-by: Radim KrÄmÃ¡Å™ <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-12-04 15:29:05 +01:00
Nadav Amit	c69d3d9bc1	KVM: x86: Fix reserved x2apic registers x2APIC has no registers for DFR and ICR2 (see Intel SDM 10.12.1.2 "x2APIC Register Address Space"). KVM needs to cause #GP on such accesses. Fix it (DFR and ICR2 on read, ICR2 on write, DFR already handled on writes). Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-12-04 15:29:05 +01:00
Nadav Amit	39f062ff51	KVM: x86: Generate #UD when memory operand is required Certain x86 instructions that use modrm operands only allow memory operand (i.e., mod012), and cause a #UD exception otherwise. KVM ignores this fact. Currently, the instructions that are such and are emulated by KVM are MOVBE, MOVNTPS, MOVNTPD and MOVNTI. MOVBE is the most blunt example, since it may be emulated by the host regardless of MMIO. The fix introduces a new group for handling such instructions, marking mod3 as illegal instruction. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-12-04 15:29:04 +01:00
Paolo Bonzini	2b4a273b42	kvm: x86: avoid warning about potential shift wrapping bug cs.base is declared as a __u64 variable and vector is a u32 so this causes a static checker warning. The user indeed can set "sipi_vector" to any u32 value in kvm_vcpu_ioctl_x86_set_vcpu_events(), but the value should really have 8-bit precision only. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-11-24 16:53:50 +01:00
Paolo Bonzini	c9eab58f64	KVM: x86: move device assignment out of kvm_host.h Create a new header, and hide the device assignment functions there. Move struct kvm_assigned_dev_kernel to assigned-dev.c by modifying arch/x86/kvm/iommu.c to take a PCI device struct. Based on a patch by Radim Krcmar <rkrcmark@redhat.com>. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-11-24 16:53:50 +01:00
Paolo Bonzini	b65d6e17fe	kvm: x86: mask out XSAVES This feature is not supported inside KVM guests yet, because we do not emulate MSR_IA32_XSS. Mask it out. Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-11-23 18:33:37 +01:00
Radim Krčmář	c274e03af7	kvm: x86: move assigned-dev.c and iommu.c to arch/x86/ Now that ia64 is gone, we can hide deprecated device assignment in x86. Notable changes: - kvm_vm_ioctl_assigned_device() was moved to x86/kvm_arch_vm_ioctl() The easy parts were removed from generic kvm code, remaining - kvm_iommu_(un)map_pages() would require new code to be moved - struct kvm_assigned_dev_kernel depends on struct kvm_irq_ack_notifier Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-11-23 18:33:36 +01:00
Radim Krcmar	3bf58e9ae8	kvm: remove CONFIG_X86 #ifdefs from files formerly shared with ia64 Signed-off-by: Radim Krcmar <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-11-21 18:07:26 +01:00
Paolo Bonzini	6ef768fac9	kvm: x86: move ioapic.c and irq_comm.c back to arch/x86/ ia64 does not need them anymore. Ack notifiers become x86-specific too. Suggested-by: Gleb Natapov <gleb@kernel.org> Reviewed-by: Radim Krcmar <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-11-21 18:02:37 +01:00
Nicholas Krause	86619e7ba3	KVM: x86: Remove FIXMEs in emulate.c Remove FIXME comments about needing fault addresses to be returned. These are propaagated from walk_addr_generic to gva_to_gpa and from there to ops->read_std and ops->write_std. Signed-off-by: Nicholas Krause <xerofoify@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-11-19 18:54:43 +01:00
Paolo Bonzini	997b04128d	KVM: emulator: remove duplicated limit check The check on the higher limit of the segment, and the check on the maximum accessible size, is the same for both expand-up and expand-down segments. Only the computation of "lim" varies. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-11-19 18:40:24 +01:00
Paolo Bonzini	01485a2230	KVM: emulator: remove code duplication in register_address{,_increment} register_address has been a duplicate of address_mask ever since the ancestor of __linearize was born in `90de84f50b` (KVM: x86 emulator: preserve an operand's segment identity, 2010-11-17). However, we can put it to a better use by including the call to reg_read in register_address. Similarly, the call to reg_rmw can be moved to register_address_increment. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-11-19 18:27:27 +01:00
Nadav Amit	31ff64881b	KVM: x86: Move __linearize masking of la into switch In __linearize there is check of the condition whether to check if masking of the linear address is needed. It occurs immediately after switch that evaluates the same condition. Merge them. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-11-19 18:20:15 +01:00
Nadav Amit	abc7d8a4c9	KVM: x86: Non-canonical access using SS should cause #SS When SS is used using a non-canonical address, an #SS exception is generated on real hardware. KVM emulator causes a #GP instead. Fix it to behave as real x86 CPU. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-11-19 18:19:57 +01:00
Nadav Amit	d50eaa1803	KVM: x86: Perform limit checks when assigning EIP If branch (e.g., jmp, ret) causes limit violations, since the target IP > limit, the #GP exception occurs before the branch. In other words, the RIP pushed on the stack should be that of the branch and not that of the target. To do so, we can call __linearize, with new EIP, which also saves us the code which performs the canonical address checks. On the case of assigning an EIP >= 2^32 (when switching cs.l), we also safe, as __linearize will check the new EIP does not exceed the limit and would trigger #GP(0) otherwise. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-11-19 18:19:22 +01:00
Nadav Amit	a7315d2f3c	KVM: x86: Emulator performs privilege checks on __linearize When segment is accessed, real hardware does not perform any privilege level checks. In contrast, KVM emulator does. This causes some discrepencies from real hardware. For instance, reading from readable code segment may fail due to incorrect segment checks. In addition, it introduces unnecassary overhead. To reference Intel SDM 5.5 ("Privilege Levels"): "Privilege levels are checked when the segment selector of a segment descriptor is loaded into a segment register." The SDM never mentions privilege level checks during memory access, except for loading far pointers in section 5.10 ("Pointer Validation"). Those are actually segment selector loads and are emulated in the similarily (i.e., regardless to __linearize checks). This behavior was also checked using sysexit. A data-segment whose DPL=0 was loaded, and after sysexit (CPL=3) it is still accessible. Therefore, all the privilege level checks in __linearize are removed. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-11-19 18:17:58 +01:00
Nadav Amit	1c1c35ae4b	KVM: x86: Stack size is overridden by __linearize When performing segmented-read/write in the emulator for stack operations, it ignores the stack size, and uses the ad_bytes as indication for the pointer size. As a result, a wrong address may be accessed. To fix this behavior, we can remove the masking of address in __linearize and perform it beforehand. It is already done for the operands (so currently it is inefficiently done twice). It is missing in two cases: 1. When using rip_relative 2. On fetch_bit_operand that changes the address. This patch masks the address on these two occassions, and removes the masking from __linearize. Note that it does not mask EIP during fetch. In protected/legacy mode code fetch when RIP >= 2^32 should result in #GP and not wrap-around. Since we make limit checks within __linearize, this is the expected behavior. Partial revert of commit `518547b32a` (KVM: x86: Emulator does not calculate address correctly, 2014-09-30). Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-11-19 18:17:10 +01:00
Nadav Amit	7d882ffa81	KVM: x86: Revert NoBigReal patch in the emulator Commit 10e38fc7cab6 ("KVM: x86: Emulator flag for instruction that only support 16-bit addresses in real mode") introduced NoBigReal for instructions such as MONITOR. Apparetnly, the Intel SDM description that led to this patch is misleading. Since no instruction is using NoBigReal, it is safe to remove it, we fully understand what the SDM means. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-11-19 18:13:27 +01:00
Tiejun Chen	842bb26a40	kvm: x86: vmx: remove MMIO_MAX_GEN MMIO_MAX_GEN is the same as MMIO_GEN_MASK. Use only one. Signed-off-by: Tiejun Chen <tiejun.chen@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-11-18 11:12:18 +01:00
Tiejun Chen	81ed33e4aa	kvm: x86: vmx: cleanup handle_ept_violation Instead, just use PFERR_{FETCH, PRESENT, WRITE}_MASK inside handle_ept_violation() for slightly better code. Signed-off-by: Tiejun Chen <tiejun.chen@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-11-18 11:07:53 +01:00
Nadav Amit	f210f7572b	KVM: x86: Fix lost interrupt on irr_pending race apic_find_highest_irr assumes irr_pending is set if any vector in APIC_IRR is set. If this assumption is broken and apicv is disabled, the injection of interrupts may be deferred until another interrupt is delivered to the guest. Ultimately, if no other interrupt should be injected to that vCPU, the pending interrupt may be lost. commit `56cc2406d6` ("KVM: nVMX: fix "acknowledge interrupt on exit" when APICv is in use") changed the behavior of apic_clear_irr so irr_pending is cleared after setting APIC_IRR vector. After this commit, if apic_set_irr and apic_clear_irr run simultaneously, a race may occur, resulting in APIC_IRR vector set, and irr_pending cleared. In the following example, assume a single vector is set in IRR prior to calling apic_clear_irr: apic_set_irr apic_clear_irr ------------ -------------- apic->irr_pending = true; apic_clear_vector(...); vec = apic_search_irr(apic); // => vec == -1 apic_set_vector(...); apic->irr_pending = (vec != -1); // => apic->irr_pending == false Nonetheless, it appears the race might even occur prior to this commit: apic_set_irr apic_clear_irr ------------ -------------- apic->irr_pending = true; apic->irr_pending = false; apic_clear_vector(...); if (apic_search_irr(apic) != -1) apic->irr_pending = true; // => apic->irr_pending == false apic_set_vector(...); Fixing this issue by: 1. Restoring the previous behavior of apic_clear_irr: clear irr_pending, call apic_clear_vector, and then if APIC_IRR is non-zero, set irr_pending. 2. On apic_set_irr: first call apic_set_vector, then set irr_pending. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-11-17 12:16:20 +01:00
Paolo Bonzini	a3e339e1ce	KVM: compute correct map even if all APICs are software disabled Logical destination mode can be used to send NMI IPIs even when all APICs are software disabled, so if all APICs are software disabled we should still look at the DFRs. So the DFRs should all be the same, even if some or all APICs are software disabled. However, the SDM does not say this, so tweak the logic as follows: - if one APIC is enabled and has LDR != 0, use that one to build the map. This picks the right DFR in case an OS is only setting it for the software-enabled APICs, or in case an OS is using logical addressing on some APICs while leaving the rest in reset state (using LDR was suggested by Radim). - if all APICs are disabled, pick a random one to build the map. We use the last one with LDR != 0 for simplicity. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-11-17 12:16:19 +01:00
Nadav Amit	173beedc16	KVM: x86: Software disabled APIC should still deliver NMIs Currently, the APIC logical map does not consider VCPUs whose local-apic is software-disabled. However, NMIs, INIT, etc. should still be delivered to such VCPUs. Therefore, the APIC mode should first be determined, and then the map, considering all VCPUs should be constructed. To address this issue, first find the APIC mode, and only then construct the logical map. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-11-17 12:16:19 +01:00
Igor Mammedov	1d4e7e3c0b	kvm: x86: increase user memory slots to 509 With the 3 private slots, this gives us 512 slots total. Motivation for this is in addition to assigned devices support more memory hotplug slots, where 1 slot is used by a hotplugged memory stick. It will allow to support upto 256 hotplug memory slots and leave 253 slots for assigned devices and other devices that use them. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-11-14 10:02:40 +01:00
Chris J Arges	d913b90435	kvm: svm: move WARN_ON in svm_adjust_tsc_offset When running the tsc_adjust kvm-unit-test on an AMD processor with the IA32_TSC_ADJUST feature enabled, the WARN_ON in svm_adjust_tsc_offset can be triggered. This WARN_ON checks for a negative adjustment in case __scale_tsc is called; however it may trigger unnecessary warnings. This patch moves the WARN_ON to trigger only if __scale_tsc will actually be called from svm_adjust_tsc_offset. In addition make adj in kvm_set_msr_common s64 since this can have signed values. Signed-off-by: Chris J Arges <chris.j.arges@canonical.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-11-13 11:56:11 +01:00
Andy Lutomirski	54b98bff8e	x86, kvm, vmx: Don't set LOAD_IA32_EFER when host and guest match There's nothing to switch if the host and guest values are the same. I am unable to find evidence that this makes any difference whatsoever. Signed-off-by: Andy Lutomirski <luto@amacapital.net> [I could see a difference on Nehalem. From 5 runs: userspace exit, guest!=host 12200 11772 12130 12164 12327 userspace exit, guest=host 11983 11780 11920 11919 12040 lightweight exit, guest!=host 3214 3220 3238 3218 3337 lightweight exit, guest=host 3178 3193 3193 3187 3220 This passes the t-test with 99% confidence for userspace exit, 98.5% confidence for lightweight exit. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-11-12 16:27:21 +01:00
Andy Lutomirski	f6577a5fa1	x86, kvm, vmx: Always use LOAD_IA32_EFER if available At least on Sandy Bridge, letting the CPU switch IA32_EFER is much faster than switching it manually. I benchmarked this using the vmexit kvm-unit-test (single run, but GOAL multiplied by 5 to do more iterations): Test Before After Change cpuid 2000 1932 -3.40% vmcall 1914 1817 -5.07% mov_from_cr8 13 13 0.00% mov_to_cr8 19 19 0.00% inl_from_pmtimer 19164 10619 -44.59% inl_from_qemu 15662 10302 -34.22% inl_from_kernel 3916 3802 -2.91% outl_to_kernel 2230 2194 -1.61% mov_dr 172 176 2.33% ipi (skipped) (skipped) ipi+halt (skipped) (skipped) ple-round-robin 13 13 0.00% wr_tsc_adjust_msr 1920 1845 -3.91% rd_tsc_adjust_msr 1892 1814 -4.12% mmio-no-eventfd:pci-mem 16394 11165 -31.90% mmio-wildcard-eventfd:pci-mem 4607 4645 0.82% mmio-datamatch-eventfd:pci-mem 4601 4610 0.20% portio-no-eventfd:pci-io 11507 7942 -30.98% portio-wildcard-eventfd:pci-io 2239 2225 -0.63% portio-datamatch-eventfd:pci-io 2250 2234 -0.71% I haven't explicitly computed the significance of these numbers, but this isn't subtle. Signed-off-by: Andy Lutomirski <luto@amacapital.net> [The results were reproducible on all of Nehalem, Sandy Bridge and Ivy Bridge. The slowness of manual switching is because writing to EFER with WRMSR triggers a TLB flush, even if the only bit you're touching is SCE (so the page table format is not affected). Doing the write as part of vmentry/vmexit, instead, does not flush the TLB, probably because all processors that have EPT also have VPID. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-11-12 12:35:02 +01:00
Paolo Bonzini	ac146235d4	KVM: x86: fix warning on 32-bit compilation PCIDs are only supported in 64-bit mode. No need to clear bit 63 of CR3 unless the host is 64-bit. Reported by Fengguang Wu's autobuilder. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-11-10 13:53:25 +01:00
David Matlack	ce1a5e60a6	kvm: x86: add trace event for pvclock updates The new trace event records: * the id of vcpu being updated * the pvclock_vcpu_time_info struct being written to guest memory This is useful for debugging pvclock bugs, such as the bug fixed by "[PATCH] kvm: x86: Fix kvm clock versioning.". Signed-off-by: David Matlack <dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-11-08 08:20:55 +01:00
Owen Hofmann	09a0c3f110	kvm: x86: Fix kvm clock versioning. kvm updates the version number for the guest paravirt clock structure by incrementing the version of its private copy. It does not read the guest version, so will write version = 2 in the first update for every new VM, including after restoring a saved state. If guest state is saved during reading the clock, it could read and accept struct fields and guest TSC from two different updates. This changes the code to increment the guest version and write it back. Signed-off-by: Owen Hofmann <osh@google.com> Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-11-08 08:20:54 +01:00
Nadav Amit	ed9aad215f	KVM: x86: MOVNTI emulation min opsize is not respected Commit `3b32004a66` ("KVM: x86: movnti minimum op size of 32-bit is not kept") did not fully fix the minimum operand size of MONTI emulation. Still, MOVNTI may be mistakenly performed using 16-bit opsize. This patch add No16 flag to mark an instruction does not support 16-bits operand size. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-11-08 08:20:54 +01:00
Marcelo Tosatti	7f187922dd	KVM: x86: update masterclock values on TSC writes When the guest writes to the TSC, the masterclock TSC copy must be updated as well along with the TSC_OFFSET update, otherwise a negative tsc_timestamp is calculated at kvm_guest_time_update. Once "if (!vcpus_matched && ka->use_master_clock)" is simplified to "if (ka->use_master_clock)", the corresponding "if (!ka->use_master_clock)" becomes redundant, so remove the do_request boolean and collapse everything into a single condition. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-11-08 08:20:53 +01:00
Nadav Amit	b2c9d43e6c	KVM: x86: Return UNHANDLABLE on unsupported SYSENTER Now that KVM injects #UD on "unhandlable" error, it makes better sense to return such error on sysenter instead of directly injecting #UD to the guest. This allows to track more easily the unhandlable cases the emulator does not support. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-11-08 08:20:52 +01:00
Nadav Amit	db324fe6f2	KVM: x86: Warn on APIC base relocation APIC base relocation is unsupported by KVM. If anyone uses it, the least should be to report a warning in the hypervisor. Note that KVM-unit-tests uses this feature for some reason, so running the tests triggers the warning. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-11-08 08:20:51 +01:00
Nadav Amit	d14cb5df59	KVM: x86: Emulator mis-decodes VEX instructions on real-mode Commit `7fe864dc94` (KVM: x86: Mark VEX-prefix instructions emulation as unimplemented, 2014-06-02) marked VEX instructions as such in protected mode. VEX-prefix instructions are not supported relevant on real-mode and VM86, but should cause #UD instead of being decoded as LES/LDS. Fix this behaviour to be consistent with real hardware. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> [Check for mod == 3, rather than 2 or 3. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-11-08 08:20:10 +01:00
Nadav Amit	2c2ca2d12f	KVM: x86: Remove redundant and incorrect cpl check on task-switch Task-switch emulation checks the privilege level prior to performing the task-switch. This check is incorrect in the case of task-gates, in which the tss.dpl is ignored, and can cause superfluous exceptions. Moreover this check is unnecassary, since the CPU checks the privilege levels prior to exiting. Intel SDM 25.4.2 says "If CALL or JMP accesses a TSS descriptor directly outside IA-32e mode, privilege levels are checked on the TSS descriptor" prior to exiting. AMD 15.14.1 says "The intercept is checked before the task switch takes place but after the incoming TSS and task gate (if one was involved) have been checked for correctness." This patch removes the CPL checks for CALL and JMP. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-11-07 15:44:10 +01:00

1 2 3 4 5 ...

20209 Commits