linux_old1

Commit Graph

Author	SHA1	Message	Date
Nadav Har'El	27fc51b21c	KVM: nVMX: Fix nested VMX TSC emulation This patch fixes two corner cases in nested (L2) handling of TSC-related issues: 1. Somewhat suprisingly, according to the Intel spec, if L1 allows WRMSR to the TSC MSR without an exit, then this should set L1's TSC value itself - not offset by vmcs12.TSC_OFFSET (like was wrongly done in the previous code). 2. Allow L1 to disable the TSC_OFFSETING control, and then correctly ignore the vmcs12.TSC_OFFSET. Signed-off-by: Nadav Har'El <nyh@il.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-09-25 19:18:02 +03:00
Nadav Har'El	d5c1785d2f	KVM: L1 TSC handling KVM assumed in several places that reading the TSC MSR returns the value for L1. This is incorrect, because when L2 is running, the correct TSC read exit emulation is to return L2's value. We therefore add a new x86_ops function, read_l1_tsc, to use in places that specifically need to read the L1 TSC, NOT the TSC of the current level of guest. Note that one change, of one line in kvm_arch_vcpu_load, is made redundant by a different patch sent by Zachary Amsden (and not yet applied): kvm_arch_vcpu_load() should not read the guest TSC, and if it didn't, of course we didn't have to change the call of kvm_get_msr() to read_l1_tsc(). [avi: moved callback to kvm_x86_ops tsc block] Signed-off-by: Nadav Har'El <nyh@il.ibm.com> Acked-by: Zachary Amsdem <zamsden@gmail.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-09-25 19:18:02 +03:00
Julia Lawall	cf3ace79c0	KVM: VMX: trivial: use BUG_ON Use BUG_ON(x) rather than if(x) BUG(); The semantic patch that fixes this problem is as follows: (http://coccinelle.lip6.fr/) // <smpl> @@ identifier x; @@ -if (x) BUG(); +BUG_ON(x); @@ identifier x; @@ -if (!x) BUG(); +BUG_ON(!x); // </smpl> Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-09-25 19:18:01 +03:00
Stefan Hajnoczi	0d460ffc09	KVM: Use __print_symbolic() for vmexit tracepoints The vmexit tracepoints format the exit_reason to make it human-readable. Since the exit_reason depends on the instruction set (vmx or svm), formatting is handled with ftrace_print_symbols_seq() by referring to the appropriate exit reason table. However, the ftrace_print_symbols_seq() function is not meant to be used directly in tracepoints since it does not export the formatting table which userspace tools like trace-cmd and perf use to format traces. In practice perf dies when formatting vmexit-related events and trace-cmd falls back to printing the numeric value (with extra formatting code in the kvm plugin to paper over this limitation). Other userspace consumers of vmexit-related tracepoints would be in similar trouble. To avoid significant changes to the kvm_exit tracepoint, this patch moves the vmx and svm exit reason tables into arch/x86/kvm/trace.h and selects the right table with __print_symbolic() depending on the instruction set. Note that __print_symbolic() is designed for exporting the formatting table to userspace and allows trace-cmd and perf to work. Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-09-25 19:17:59 +03:00
Xiao Guangrong	ce88decffd	KVM: MMU: mmio page fault support The idea is from Avi: \| We could cache the result of a miss in an spte by using a reserved bit, and \| checking the page fault error code (or seeing if we get an ept violation or \| ept misconfiguration), so if we get repeated mmio on a page, we don't need to \| search the slot list/tree. \| (https://lkml.org/lkml/2011/2/22/221) When the page fault is caused by mmio, we cache the info in the shadow page table, and also set the reserved bits in the shadow page table, so if the mmio is caused again, we can quickly identify it and emulate it directly Searching mmio gfn in memslots is heavy since we need to walk all memeslots, it can be reduced by this feature, and also avoid walking guest page table for soft mmu. [jan: fix operator precedence issue] Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-07-24 11:50:40 +03:00
Xiao Guangrong	c37079586f	KVM: MMU: remove bypass_guest_pf The idea is from Avi: \| Maybe it's time to kill off bypass_guest_pf=1. It's not as effective as \| it used to be, since unsync pages always use shadow_trap_nonpresent_pte, \| and since we convert between the two nonpresent_ptes during sync and unsync. Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-07-24 11:50:33 +03:00
Nadav Har'El	509c75ea19	KVM: nVMX: Fix bug preventing more than two levels of nesting The nested VMX feature is supposed to fully emulate VMX for the guest. This (theoretically) not only allows it to run its own guests, but also also to further emulate VMX for its own guests, and allow arbitrarily deep nesting. This patch fixes a bug (discovered by Kevin Tian) in handling a VMLAUNCH by L2, which prevented deeper nesting. Deeper nesting now works (I only actually tested L3), but is currently absurdly slow, to the point of being unusable. Signed-off-by: Nadav Har'El <nyh@il.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-07-12 13:16:11 +03:00
Jan Kiszka	2e4ce7f574	KVM: VMX: Silence warning on 32-bit hosts a is unused now on CONFIG_X86_32. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-07-12 13:16:08 +03:00
Nadav Har'El	2844d84905	KVM: nVMX: Miscellenous small corrections Small corrections of KVM (spelling, etc.) not directly related to nested VMX. Signed-off-by: Nadav Har'El <nyh@il.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-07-12 11:45:19 +03:00
Nadav Har'El	7b8050f570	KVM: nVMX: Add VMX to list of supported cpuid features If the "nested" module option is enabled, add the "VMX" CPU feature to the list of CPU features KVM advertises with the KVM_GET_SUPPORTED_CPUID ioctl. Qemu uses this ioctl, and intersects KVM's list with its own list of desired cpu features (depending on the -cpu option given to qemu) to determine the final list of features presented to the guest. Signed-off-by: Nadav Har'El <nyh@il.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-07-12 11:45:19 +03:00
Nadav Har'El	7991825b85	KVM: nVMX: Additional TSC-offset handling In the unlikely case that L1 does not capture MSR_IA32_TSC, L0 needs to emulate this MSR write by L2 by modifying vmcs02.tsc_offset. We also need to set vmcs12.tsc_offset, for this change to survive the next nested entry (see prepare_vmcs02()). Additionally, we also need to modify vmx_adjust_tsc_offset: The semantics of this function is that the TSC of all guests on this vcpu, L1 and possibly several L2s, need to be adjusted. To do this, we need to adjust vmcs01's tsc_offset (this offset will also apply to each L2s we enter). We can't set vmcs01 now, so we have to remember this adjustment and apply it when we later exit to L1. Signed-off-by: Nadav Har'El <nyh@il.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-07-12 11:45:19 +03:00
Nadav Har'El	36cf24e01e	KVM: nVMX: Further fixes for lazy FPU loading KVM's "Lazy FPU loading" means that sometimes L0 needs to set CR0.TS, even if a guest didn't set it. Moreover, L0 must also trap CR0.TS changes and NM exceptions, even if we have a guest hypervisor (L1) who didn't want these traps. And of course, conversely: If L1 wanted to trap these events, we must let it, even if L0 is not interested in them. This patch fixes some existing KVM code (in update_exception_bitmap(), vmx_fpu_activate(), vmx_fpu_deactivate()) to do the correct merging of L0's and L1's needs. Note that handle_cr() was already fixed in the above patch, and that new code in introduced in previous patches already handles CR0 correctly (see prepare_vmcs02(), prepare_vmcs12(), and nested_vmx_vmexit()). Signed-off-by: Nadav Har'El <nyh@il.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-07-12 11:45:18 +03:00
Nadav Har'El	eeadf9e755	KVM: nVMX: Handling of CR0 and CR4 modifying instructions When L2 tries to modify CR0 or CR4 (with mov or clts), and modifies a bit which L1 asked to shadow (via CR[04]_GUEST_HOST_MASK), we already do the right thing: we let L1 handle the trap (see nested_vmx_exit_handled_cr() in a previous patch). When L2 modifies bits that L1 doesn't care about, we let it think (via CR[04]_READ_SHADOW) that it did these modifications, while only changing (in GUEST_CR[04]) the bits that L0 doesn't shadow. This is needed for corect handling of CR0.TS for lazy FPU loading: L0 may want to leave TS on, while pretending to allow the guest to change it. Signed-off-by: Nadav Har'El <nyh@il.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-07-12 11:45:18 +03:00
Nadav Har'El	66c78ae40c	KVM: nVMX: Correct handling of idt vectoring info This patch adds correct handling of IDT_VECTORING_INFO_FIELD for the nested case. When a guest exits while delivering an interrupt or exception, we get this information in IDT_VECTORING_INFO_FIELD in the VMCS. When L2 exits to L1, there's nothing we need to do, because L1 will see this field in vmcs12, and handle it itself. However, when L2 exits and L0 handles the exit itself and plans to return to L2, L0 must inject this event to L2. In the normal non-nested case, the idt_vectoring_info case is discovered after the exit, and the decision to inject (though not the injection itself) is made at that point. However, in the nested case a decision of whether to return to L2 or L1 also happens during the injection phase (see the previous patches), so in the nested case we can only decide what to do about the idt_vectoring_info right after the injection, i.e., in the beginning of vmx_vcpu_run, which is the first time we know for sure if we're staying in L2. Therefore, when we exit L2 (is_guest_mode(vcpu)), we disable the regular vmx_complete_interrupts() code which queues the idt_vectoring_info for injection on next entry - because such injection would not be appropriate if we will decide to exit to L1. Rather, we just save the idt_vectoring_info and related fields in vmcs12 (which is a convenient place to save these fields). On the next entry in vmx_vcpu_run (after the injection phase, potentially exiting to L1 to inject an event requested by user space), if we find ourselves in L1 we don't need to do anything with those values we saved (as explained above). But if we find that we're in L2, or rather still at L2 (it's not nested_run_pending, meaning that this is the first round of L2 running after L1 having just launched it), we need to inject the event saved in those fields - by writing the appropriate VMCS fields. Signed-off-by: Nadav Har'El <nyh@il.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-07-12 11:45:18 +03:00
Nadav Har'El	0b6ac343fc	KVM: nVMX: Correct handling of exception injection Similar to the previous patch, but concerning injection of exceptions rather than external interrupts. Signed-off-by: Nadav Har'El <nyh@il.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-07-12 11:45:17 +03:00
Nadav Har'El	b6f1250edb	KVM: nVMX: Correct handling of interrupt injection The code in this patch correctly emulates external-interrupt injection while a nested guest L2 is running. Because of this code's relative un-obviousness, I include here a longer-than- usual justification for what it does - much longer than the code itself ;-) To understand how to correctly emulate interrupt injection while L2 is running, let's look first at what we need to emulate: How would things look like if the extra L0 hypervisor layer is removed, and instead of L0 injecting an interrupt, we had hardware delivering an interrupt? Now we have L1 running on bare metal with a guest L2, and the hardware generates an interrupt. Assuming that L1 set PIN_BASED_EXT_INTR_MASK to 1, and VM_EXIT_ACK_INTR_ON_EXIT to 0 (we'll revisit these assumptions below), what happens now is this: The processor exits from L2 to L1, with an external- interrupt exit reason but without an interrupt vector. L1 runs, with interrupts disabled, and it doesn't yet know what the interrupt was. Soon after, it enables interrupts and only at that moment, it gets the interrupt from the processor. when L1 is KVM, Linux handles this interrupt. Now we need exactly the same thing to happen when that L1->L2 system runs on top of L0, instead of real hardware. This is how we do this: When L0 wants to inject an interrupt, it needs to exit from L2 to L1, with external-interrupt exit reason (with an invalid interrupt vector), and run L1. Just like in the bare metal case, it likely can't deliver the interrupt to L1 now because L1 is running with interrupts disabled, in which case it turns on the interrupt window when running L1 after the exit. L1 will soon enable interrupts, and at that point L0 will gain control again and inject the interrupt to L1. Finally, there is an extra complication in the code: when nested_run_pending, we cannot return to L1 now, and must launch L2. We need to remember the interrupt we wanted to inject (and not clear it now), and do it on the next exit. The above explanation shows that the relative strangeness of the nested interrupt injection code in this patch, and the extra interrupt-window exit incurred, are in fact necessary for accurate emulation, and are not just an unoptimized implementation. Let's revisit now the two assumptions made above: If L1 turns off PIN_BASED_EXT_INTR_MASK (no hypervisor that I know does, by the way), things are simple: L0 may inject the interrupt directly to the L2 guest - using the normal code path that injects to any guest. We support this case in the code below. If L1 turns on VM_EXIT_ACK_INTR_ON_EXIT, things look very different from the description above: L1 expects to see an exit from L2 with the interrupt vector already filled in the exit information, and does not expect to be interrupted again with this interrupt. The current code does not (yet) support this case, so we do not allow the VM_EXIT_ACK_INTR_ON_EXIT exit-control to be turned on by L1. Signed-off-by: Nadav Har'El <nyh@il.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-07-12 11:45:17 +03:00
Nadav Har'El	644d711aa0	KVM: nVMX: Deciding if L0 or L1 should handle an L2 exit This patch contains the logic of whether an L2 exit should be handled by L0 and then L2 should be resumed, or whether L1 should be run to handle this exit (using the nested_vmx_vmexit() function of the previous patch). The basic idea is to let L1 handle the exit only if it actually asked to trap this sort of event. For example, when L2 exits on a change to CR0, we check L1's CR0_GUEST_HOST_MASK to see if L1 expressed interest in any bit which changed; If it did, we exit to L1. But if it didn't it means that it is we (L0) that wished to trap this event, so we handle it ourselves. The next two patches add additional logic of what to do when an interrupt or exception is injected: Does L0 need to do it, should we exit to L1 to do it, or should we resume L2 and keep the exception to be injected later. We keep a new flag, "nested_run_pending", which can override the decision of which should run next, L1 or L2. nested_run_pending=1 means that we must run L2 next, not L1. This is necessary in particular when L1 did a VMLAUNCH of L2 and therefore expects L2 to be run (and perhaps be injected with an event it specified, etc.). Nested_run_pending is especially intended to avoid switching to L1 in the injection decision-point described above. Signed-off-by: Nadav Har'El <nyh@il.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-07-12 11:45:16 +03:00
Nadav Har'El	7c1779384a	KVM: nVMX: vmcs12 checks on nested entry This patch adds a bunch of tests of the validity of the vmcs12 fields, according to what the VMX spec and our implementation allows. If fields we cannot (or don't want to) honor are discovered, an entry failure is emulated. According to the spec, there are two types of entry failures: If the problem was in vmcs12's host state or control fields, the VMLAUNCH instruction simply fails. But a problem is found in the guest state, the behavior is more similar to that of an exit. Signed-off-by: Nadav Har'El <nyh@il.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-07-12 11:45:16 +03:00
Nadav Har'El	4704d0befb	KVM: nVMX: Exiting from L2 to L1 This patch implements nested_vmx_vmexit(), called when the nested L2 guest exits and we want to run its L1 parent and let it handle this exit. Note that this will not necessarily be called on every L2 exit. L0 may decide to handle a particular exit on its own, without L1's involvement; In that case, L0 will handle the exit, and resume running L2, without running L1 and without calling nested_vmx_vmexit(). The logic for deciding whether to handle a particular exit in L1 or in L0, i.e., whether to call nested_vmx_vmexit(), will appear in a separate patch below. Signed-off-by: Nadav Har'El <nyh@il.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-07-12 11:45:16 +03:00
Nadav Har'El	99e65e805d	KVM: nVMX: No need for handle_vmx_insn function any more Before nested VMX support, the exit handler for a guest executing a VMX instruction (vmclear, vmlaunch, vmptrld, vmptrst, vmread, vmread, vmresume, vmwrite, vmon, vmoff), was handle_vmx_insn(). This handler simply threw a #UD exception. Now that all these exit reasons are properly handled (and emulate the respective VMX instruction), nothing calls this dummy handler and it can be removed. Signed-off-by: Nadav Har'El <nyh@il.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-07-12 11:45:15 +03:00
Nadav Har'El	cd232ad02f	KVM: nVMX: Implement VMLAUNCH and VMRESUME Implement the VMLAUNCH and VMRESUME instructions, allowing a guest hypervisor to run its own guests. This patch does not include some of the necessary validity checks on vmcs12 fields before the entry. These will appear in a separate patch below. Signed-off-by: Nadav Har'El <nyh@il.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-07-12 11:45:15 +03:00
Nadav Har'El	fe3ef05c75	KVM: nVMX: Prepare vmcs02 from vmcs01 and vmcs12 This patch contains code to prepare the VMCS which can be used to actually run the L2 guest, vmcs02. prepare_vmcs02 appropriately merges the information in vmcs12 (the vmcs that L1 built for L2) and in vmcs01 (our desires for our own guests). Signed-off-by: Nadav Har'El <nyh@il.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-07-12 11:45:14 +03:00
Nadav Har'El	bf8179a011	KVM: nVMX: Move control field setup to functions Move some of the control field setup to common functions. These functions will also be needed for running L2 guests - L0's desires (expressed in these functions) will be appropriately merged with L1's desires. Signed-off-by: Nadav Har'El <nyh@il.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-07-12 11:45:14 +03:00
Nadav Har'El	a3a8ff8ebf	KVM: nVMX: Move host-state field setup to a function Move the setting of constant host-state fields (fields that do not change throughout the life of the guest) from vmx_vcpu_setup to a new common function vmx_set_constant_host_state(). This function will also be used to set the host state when running L2 guests. Signed-off-by: Nadav Har'El <nyh@il.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-07-12 11:45:14 +03:00
Nadav Har'El	49f705c532	KVM: nVMX: Implement VMREAD and VMWRITE Implement the VMREAD and VMWRITE instructions. With these instructions, L1 can read and write to the VMCS it is holding. The values are read or written to the fields of the vmcs12 structure introduced in a previous patch. Signed-off-by: Nadav Har'El <nyh@il.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-07-12 11:45:14 +03:00
Nadav Har'El	6a4d755060	KVM: nVMX: Implement VMPTRST This patch implements the VMPTRST instruction. Signed-off-by: Nadav Har'El <nyh@il.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-07-12 11:45:13 +03:00
Nadav Har'El	63846663ea	KVM: nVMX: Implement VMPTRLD This patch implements the VMPTRLD instruction. Signed-off-by: Nadav Har'El <nyh@il.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-07-12 11:45:12 +03:00
Nadav Har'El	27d6c86521	KVM: nVMX: Implement VMCLEAR This patch implements the VMCLEAR instruction. Signed-off-by: Nadav Har'El <nyh@il.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-07-12 11:45:12 +03:00
Nadav Har'El	0140caea3b	KVM: nVMX: Success/failure of VMX instructions. VMX instructions specify success or failure by setting certain RFLAGS bits. This patch contains common functions to do this, and they will be used in the following patches which emulate the various VMX instructions. Signed-off-by: Nadav Har'El <nyh@il.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-07-12 11:45:12 +03:00
Nadav Har'El	22bd035868	KVM: nVMX: Add VMCS fields to the vmcs12 In this patch we add to vmcs12 (the VMCS that L1 keeps for L2) all the standard VMCS fields. Later patches will enable L1 to read and write these fields using VMREAD/ VMWRITE, and they will be used during a VMLAUNCH/VMRESUME in preparing vmcs02, a hardware VMCS for running L2. Signed-off-by: Nadav Har'El <nyh@il.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-07-12 11:45:11 +03:00
Nadav Har'El	ff2f6fe961	KVM: nVMX: Introduce vmcs02: VMCS used to run L2 We saw in a previous patch that L1 controls its L2 guest with a vcms12. L0 needs to create a real VMCS for running L2. We call that "vmcs02". A later patch will contain the code, prepare_vmcs02(), for filling the vmcs02 fields. This patch only contains code for allocating vmcs02. In this version, prepare_vmcs02() sets all of vmcs02's fields each time we enter from L1 to L2, so keeping just one vmcs02 for the vcpu is enough: It can be reused even when L1 runs multiple L2 guests. However, in future versions we'll probably want to add an optimization where vmcs02 fields that rarely change will not be set each time. For that, we may want to keep around several vmcs02s of L2 guests that have recently run, so that potentially we could run these L2s again more quickly because less vmwrites to vmcs02 will be needed. This patch adds to each vcpu a vmcs02 pool, vmx->nested.vmcs02_pool, which remembers the vmcs02s last used to run up to VMCS02_POOL_SIZE L2s. As explained above, in the current version we choose VMCS02_POOL_SIZE=1, I.e., one vmcs02 is allocated (and loaded onto the processor), and it is reused to enter any L2 guest. In the future, when prepare_vmcs02() is optimized not to set all fields every time, VMCS02_POOL_SIZE should be increased. Signed-off-by: Nadav Har'El <nyh@il.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-07-12 11:45:11 +03:00
Nadav Har'El	064aea7747	KVM: nVMX: Decoding memory operands of VMX instructions This patch includes a utility function for decoding pointer operands of VMX instructions issued by L1 (a guest hypervisor) Signed-off-by: Nadav Har'El <nyh@il.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-07-12 11:45:11 +03:00
Nadav Har'El	b87a51ae28	KVM: nVMX: Implement reading and writing of VMX MSRs When the guest can use VMX instructions (when the "nested" module option is on), it should also be able to read and write VMX MSRs, e.g., to query about VMX capabilities. This patch adds this support. Signed-off-by: Nadav Har'El <nyh@il.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-07-12 11:45:11 +03:00
Nadav Har'El	a9d30f33dd	KVM: nVMX: Introduce vmcs12: a VMCS structure for L1 An implementation of VMX needs to define a VMCS structure. This structure is kept in guest memory, but is opaque to the guest (who can only read or write it with VMX instructions). This patch starts to define the VMCS structure which our nested VMX implementation will present to L1. We call it "vmcs12", as it is the VMCS that L1 keeps for its L2 guest. We will add more content to this structure in later patches. This patch also adds the notion (as required by the VMX spec) of L1's "current VMCS", and finally includes utility functions for mapping the guest-allocated VMCSs in host memory. Signed-off-by: Nadav Har'El <nyh@il.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-07-12 11:45:10 +03:00
Nadav Har'El	5e1746d620	KVM: nVMX: Allow setting the VMXE bit in CR4 This patch allows the guest to enable the VMXE bit in CR4, which is a prerequisite to running VMXON. Whether to allow setting the VMXE bit now depends on the architecture (svm or vmx), so its checking has moved to kvm_x86_ops->set_cr4(). This function now returns an int: If kvm_x86_ops->set_cr4() returns 1, __kvm_set_cr4() will also return 1, and this will cause kvm_set_cr4() will throw a #GP. Turning on the VMXE bit is allowed only when the nested VMX feature is enabled, and turning it off is forbidden after a vmxon. Signed-off-by: Nadav Har'El <nyh@il.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-07-12 11:45:10 +03:00
Nadav Har'El	ec378aeef9	KVM: nVMX: Implement VMXON and VMXOFF This patch allows a guest to use the VMXON and VMXOFF instructions, and emulates them accordingly. Basically this amounts to checking some prerequisites, and then remembering whether the guest has enabled or disabled VMX operation. Signed-off-by: Nadav Har'El <nyh@il.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-07-12 11:45:09 +03:00
Nadav Har'El	801d342432	KVM: nVMX: Add "nested" module option to kvm_intel This patch adds to kvm_intel a module option "nested". This option controls whether the guest can use VMX instructions, i.e., whether we allow nested virtualization. A similar, but separate, option already exists for the SVM module. This option currently defaults to 0, meaning that nested VMX must be explicitly enabled by giving nested=1. When nested VMX matures, the default should probably be changed to enable nested VMX by default - just like nested SVM is currently enabled by default. Signed-off-by: Nadav Har'El <nyh@il.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-07-12 11:45:09 +03:00
Nadav Har'El	d462b81923	KVM: VMX: Keep list of loaded VMCSs, instead of vcpus In VMX, before we bring down a CPU we must VMCLEAR all VMCSs loaded on it because (at least in theory) the processor might not have written all of its content back to memory. Since a patch from June 26, 2008, this is done using a per-cpu "vcpus_on_cpu" linked list of vcpus loaded on each CPU. The problem is that with nested VMX, we no longer have the concept of a vcpu being loaded on a cpu: A vcpu has multiple VMCSs (one for L1, a pool for L2s), and each of those may be have been last loaded on a different cpu. So instead of linking the vcpus, we link the VMCSs, using a new structure loaded_vmcs. This structure contains the VMCS, and the information pertaining to its loading on a specific cpu (namely, the cpu number, and whether it was already launched on this cpu once). In nested we will also use the same structure to hold L2 VMCSs, and vmx->loaded_vmcs is a pointer to the currently active VMCS. Signed-off-by: Nadav Har'El <nyh@il.ibm.com> Acked-by: Acked-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-07-12 11:45:08 +03:00
Avi Kivity	96304217a7	KVM: VMX: always_inline VMREADs vmcs_readl() and friends are really short, but gcc thinks they are long because of the out-of-line exception handlers. Mark them always_inline to clear the misunderstanding. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-07-12 11:45:01 +03:00
Avi Kivity	5e520e6278	KVM: VMX: Move VMREAD cleanup to exception handler We clean up a failed VMREAD by clearing the output register. Do it in the exception handler instead of unconditionally. This is worthwhile since there are more than a hundred call sites. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-07-12 11:45:00 +03:00
Marcelo Tosatti	5233dd51ec	KVM: VMX: do not overwrite uptodate vcpu->arch.cr3 on KVM_SET_SREGS Only decache guest CR3 value if vcpu->arch.cr3 is stale. Fixes loadvm with live guest. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Tested-by: Markus Schade <markus.schade@gmail.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-06-19 19:23:13 +03:00
Avi Kivity	2fb92db1ec	KVM: VMX: Cache vmcs segment fields Since the emulator now checks segment limits and access rights, it generates a lot more accesses to the vmcs segment fields. Undo some of the performance hit by cacheing those fields in a read-only cache (the entire cache is invalidated on any write, or on guest exit). Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-22 08:47:45 -04:00
Avi Kivity	0a434bb2bf	KVM: VMX: Avoid reading %rip unnecessarily when handling exceptions Avoids a VMREAD. Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-22 08:40:01 -04:00
Serge E. Hallyn	71f9833bb1	KVM: fix push of wrong eip when doing softint When doing a soft int, we need to bump eip before pushing it to the stack. Otherwise we'll do the int a second time. [apw@canonical.com: merged eip update as per Jan's recommendation.] Signed-off-by: Serge E. Hallyn <serge.hallyn@ubuntu.com> Signed-off-by: Andy Whitcroft <apw@canonical.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-11 07:57:09 -04:00
Jan Kiszka	be6d05cfdf	KVM: VMX: Ensure that vmx_create_vcpu always returns proper error In case certain allocations fail, vmx_create_vcpu may return 0 as error instead of a negative value encoded via ERR_PTR. This causes a NULL pointer dereferencing later on in kvm_vm_ioctl_vcpu_create. Reported-by: Sasha Levin <levinsasha928@gmail.com> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-05-11 07:57:08 -04:00
Joerg Roedel	857e40999e	KVM: X86: Delegate tsc-offset calculation to architecture code With TSC scaling in SVM the tsc-offset needs to be calculated differently. This patch propagates this calculation into the architecture specific modules so that this complexity can be handled there. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-11 07:57:05 -04:00
Joerg Roedel	4051b18801	KVM: X86: Implement call-back to propagate virtual_tsc_khz This patch implements a call-back into the architecture code to allow the propagation of changes to the virtual tsc_khz of the vcpu. On SVM it updates the tsc_ratio variable, on VMX it does nothing. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-11 07:57:05 -04:00
Joerg Roedel	8a76d7f25f	KVM: x86: Add x86 callback for intercept check This patch adds a callback into kvm_x86_ops so that svm and vmx code can do intercept checks on emulated instructions. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-11 07:57:01 -04:00
Avi Kivity	654f06fc65	KVM: VMX: simplify NMI mask management Use vmx_set_nmi_mask() instead of open-coding management of the hardware bit and the software hint (nmi_known_unmasked). There's a slight change of behaviour when running without hardware virtual NMI support - we now clear the NMI mask if NMI delivery faulted in that case as well. This improves emulation accuracy. Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-11 07:56:57 -04:00
Avi Kivity	8878647585	KVM: VMX: Use cached VM_EXIT_INTR_INFO in handle_exception vmx_complete_atomic_exit() cached it for us, so we can use it here. Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-11 07:56:56 -04:00
Avi Kivity	c5ca8e572c	KVM: VMX: Don't VMREAD VM_EXIT_INTR_INFO unconditionally Only read it if we're going to use it later. Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-11 07:56:56 -04:00
Avi Kivity	00eba012d5	KVM: VMX: Refactor vmx_complete_atomic_exit() Move the exit reason checks to the front of the function, for early exit in the common case. Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-11 07:56:56 -04:00
Avi Kivity	f9902069c4	KVM: VMX: Qualify check for host NMI Check for the exit reason first; this allows us, later, to avoid a VMREAD for VM_EXIT_INTR_INFO_FIELD. Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-11 07:56:56 -04:00
Avi Kivity	9d58b93192	KVM: VMX: Avoid vmx_recover_nmi_blocking() when unneeded When we haven't injected an interrupt, we don't need to recover the nmi blocking state (since the guest can't set it by itself). This allows us to avoid a VMREAD later on. Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-11 07:56:56 -04:00
Avi Kivity	69c7302890	KVM: VMX: Cache cpl We may read the cpl quite often in the same vmexit (instruction privilege check, memory access checks for instruction and operands), so we gain a bit if we cache the value. Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-11 07:56:54 -04:00
Avi Kivity	f4c63e5d5a	KVM: VMX: Optimize vmx_get_cpl() In long mode, vm86 mode is disallowed, so we need not check for it. Reading rflags.vm may require a VMREAD, so it is expensive. Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-11 07:56:54 -04:00
Avi Kivity	6de12732c4	KVM: VMX: Optimize vmx_get_rflags() If called several times within the same exit, return cached results. Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-11 07:56:54 -04:00
Avi Kivity	f6e7847589	KVM: Use kvm_get_rflags() and kvm_set_rflags() instead of the raw versions Some rflags bits are owned by the host, not guest, so we need to use kvm_get_rflags() to strip those bits away or kvm_set_rflags() to add them back. Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-11 07:56:54 -04:00
Gleb Natapov	776e58ea3d	KVM: unbreak userspace that does not sets tss address Commit 6440e5967bc broke old userspaces that do not set tss address before entering vcpu. Unbreak it by setting tss address to a safe value on the first vcpu entry. New userspaces should set tss address, so print warning in case it doesn't. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-03-17 13:08:35 -03:00
Xiao Guangrong	40dcaa9f69	KVM: fix rcu usage in init_rmode_* functions fix: [ 3494.671786] stack backtrace: [ 3494.671789] Pid: 10527, comm: qemu-system-x86 Not tainted 2.6.38-rc6+ #23 [ 3494.671790] Call Trace: [ 3494.671796] [] ? lockdep_rcu_dereference+0x9d/0xa5 [ 3494.671826] [] ? kvm_memslots+0x6b/0x73 [kvm] [ 3494.671834] [] ? gfn_to_memslot+0x16/0x4f [kvm] [ 3494.671843] [] ? gfn_to_hva+0x16/0x27 [kvm] [ 3494.671851] [] ? kvm_write_guest_page+0x31/0x83 [kvm] [ 3494.671861] [] ? kvm_clear_guest_page+0x1a/0x1c [kvm] [ 3494.671867] [] ? vmx_set_tss_addr+0x83/0x122 [kvm_intel] and: [ 8328.789599] stack backtrace: [ 8328.789601] Pid: 18736, comm: qemu-system-x86 Not tainted 2.6.38-rc6+ #23 [ 8328.789603] Call Trace: [ 8328.789609] [] ? lockdep_rcu_dereference+0x9d/0xa5 [ 8328.789621] [] ? kvm_memslots+0x6b/0x73 [kvm] [ 8328.789628] [] ? gfn_to_memslot+0x16/0x4f [kvm] [ 8328.789635] [] ? gfn_to_hva+0x16/0x27 [kvm] [ 8328.789643] [] ? kvm_write_guest_page+0x31/0x83 [kvm] [ 8328.789699] [] ? kvm_clear_guest_page+0x1a/0x1c [kvm] [ 8328.789713] [] ? vmx_create_vcpu+0x316/0x3c8 [kvm_intel] Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-03-17 13:08:34 -03:00
Takuya Yoshikawa	afc20184b7	KVM: x86: Remove useless regs_page pointer from kvm_lapic Access to this page is mostly done through the regs member which holds the address to this page. The exceptions are in vmx_vcpu_reset() and kvm_free_lapic() and these both can easily be converted to using regs. Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-03-17 13:08:33 -03:00
Gleb Natapov	93ea5388ea	KVM: VMX: Initialize vm86 TSS only once. Currently vm86 task is initialized on each real mode entry and vcpu reset. Initialization is done by zeroing TSS and updating relevant fields. But since all vcpus are using the same TSS there is a race where one vcpu may use TSS while other vcpu is initializing it, so the vcpu that uses TSS will see wrong TSS content and will behave incorrectly. Fix that by initializing TSS only once. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-03-17 13:08:31 -03:00
Gleb Natapov	a8ba6c2622	KVM: VMX: update live TR selector if it changes in real mode When rmode.vm86 is active TR descriptor is updated with vm86 task values, but selector is left intact. vmx_set_segment() makes sure that if TR register is written into while vm86 is active the new values are saved for use after vm86 is deactivated, but since selector is not updated on vm86 activation/deactivation new value is lost. Fix this by writing new selector into vmcs immediately. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-03-17 13:08:31 -03:00
Lai Jiangshan	a3b5ba49a8	KVM: VMX: add the __noclone attribute to vmx_vcpu_run The changelog of `104f226` said "adds the __noclone attribute", but it was missing in its patch. I think it is still needed. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Acked-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-03-17 13:08:31 -03:00
Joseph Cihula	23f3e99132	KVM: VMX: fix detection of BIOS disabling VMX This patch fixes the logic used to detect whether BIOS has disabled VMX, for the case where VMX is enabled only under SMX, but tboot is not active. Signed-off-by: Joseph Cihula <joseph.cihula@intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-03-17 13:08:30 -03:00
Avi Kivity	40712faeb8	KVM: VMX: Avoid atomic operation in vmx_vcpu_run Instead of exchanging the guest and host rcx, have separate storage for each. This allows us to avoid using the xchg instruction, which is is a little slower than normal operations. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-03-17 13:08:26 -03:00
Avi Kivity	1c696d0e1b	KVM: VMX: Simplify saving guest rcx in vmx_vcpu_run Change push top-of-stack pop guest-rcx pop dummy to pop guest-rcx which is the same thing, only simpler. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-03-17 13:08:25 -03:00
Rik van Riel	00c25bce02	KVM: VMX: increase ple_gap default to 128 On some CPUs, a ple_gap of 41 is simply insufficient to ever trigger PLE exits, even with the minimalistic PLE test from kvm-unit-tests. http://git.kernel.org/?p=virt/kvm/kvm-unit-tests.git;a=commitdiff;h=eda71b28fa122203e316483b35f37aaacd42f545 For example, the Xeon X5670 CPU needs a ple_gap of at least 48 in order to get pause loop exits: # modprobe kvm_intel ple_gap=47 # taskset 1 /usr/local/bin/qemu-system-x86_64 \ -device testdev,chardev=log -chardev stdio,id=log \ -kernel x86/vmexit.flat -append ple-round-robin -smp 2 VNC server running on `::1:5900' enabling apic enabling apic ple-round-robin 58298446 # rmmod kvm_intel # modprobe kvm_intel ple_gap=48 # taskset 1 /usr/local/bin/qemu-system-x86_64 \ -device testdev,chardev=log -chardev stdio,id=log \ -kernel x86/vmexit.flat -append ple-round-robin -smp 2 VNC server running on `::1:5900' enabling apic enabling apic ple-round-robin 36616 Increase the ple_gap to 128 to be on the safe side. Signed-off-by: Rik van Riel <riel@redhat.com> Acked-by: Zhai, Edwin <edwin.zhai@intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-03-17 13:08:25 -03:00
Avi Kivity	a917949935	KVM: VMX: Avoid leaking fake realmode state to userspace When emulating real mode, we fake some state: - tr.base points to a fake vm86 tss - segment registers are made to conform to vm86 restrictions change vmx_get_segment() not to expose this fake state to userspace; instead, return the original state. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-03-17 13:08:25 -03:00
Avi Kivity	d0ba64f9b4	KVM: VMX: Save and restore tr selector across mode switches When emulating real mode we play with tr hidden state, but leave tr.selector alone. That works well, except for save/restore, since loading TR writes it to the hidden state in vmx->rmode. Fix by also saving and restoring the tr selector; this makes things more consistent and allows migration to work during the early boot stages of Windows XP. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-03-17 13:08:25 -03:00
Gleb Natapov	444e863d13	KVM: VMX: when entering real mode align segment base to 16 bytes VMX checks that base is equal segment shifted 4 bits left. Otherwise guest entry fails. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-01-12 11:31:20 +02:00
Avi Kivity	aff48baa34	KVM: Fetch guest cr3 from hardware on demand Instead of syncing the guest cr3 every exit, which is expensince on vmx with ept enabled, sync it only on demand. [sheng: fix incorrect cr3 seen by Windows XP] Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-01-12 11:31:16 +02:00
Avi Kivity	9f8fe5043f	KVM: Replace reads of vcpu->arch.cr3 by an accessor This allows us to keep cr3 in the VMCS, later on. Signed-off-by: Avi Kivity <avi@redhat.com>	2011-01-12 11:31:15 +02:00
Avi Kivity	16d8f72f70	KVM: VMX: Correct asm constraint in vmcs_load()/vmcs_clear() 'error' is byte sized, so use a byte register constraint. Acked-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-01-12 11:31:12 +02:00
Avi Kivity	110312c84b	KVM: VMX: Optimize atomic EFER load When NX is enabled on the host but not on the guest, we use the entry/exit msr load facility, which is slow. Optimize it to use entry/exit efer load, which is ~1200 cycles faster. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-01-12 11:31:09 +02:00
Andre Przywara	dc25e89e07	KVM: SVM: copy instruction bytes from VMCB In case of a nested page fault or an intercepted #PF newer SVM implementations provide a copy of the faulting instruction bytes in the VMCB. Use these bytes to feed the instruction emulator and avoid the costly guest instruction fetch in this case. Signed-off-by: Andre Przywara <andre.przywara@amd.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-01-12 11:31:07 +02:00
Andre Przywara	51d8b66199	KVM: cleanup emulate_instruction emulate_instruction had many callers, but only one used all parameters. One parameter was unused, another one is now hidden by a wrapper function (required for a future addition anyway), so most callers use now a shorter parameter list. Signed-off-by: Andre Przywara <andre.przywara@amd.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-01-12 11:31:00 +02:00
Andre Przywara	db8fcefaa7	KVM: move complete_insn_gp() into x86.c move the complete_insn_gp() helper function out of the VMX part into the generic x86 part to make it usable by SVM. Signed-off-by: Andre Przywara <andre.przywara@amd.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-01-12 11:30:59 +02:00
Andre Przywara	eea1cff9ab	KVM: x86: fix CR8 handling The handling of CR8 writes in KVM is currently somewhat cumbersome. This patch makes it look like the other CR register handlers and fixes a possible issue in VMX, where the RIP would be incremented despite an injected #GP. Signed-off-by: Andre Przywara <andre.przywara@amd.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-01-12 11:30:58 +02:00
Anthony Liguori	443381a828	KVM: VMX: add module parameter to avoid trapping HLT instructions (v5) In certain use-cases, we want to allocate guests fixed time slices where idle guest cycles leave the machine idling. There are many approaches to achieve this but the most direct is to simply avoid trapping the HLT instruction which lets the guest directly execute the instruction putting the processor to sleep. Introduce this as a module-level option for kvm-vmx.ko since if you do this for one guest, you probably want to do it for all. Signed-off-by: Anthony Liguori <aliguori@us.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-01-12 11:30:46 +02:00
Avi Kivity	a295673aba	KVM: VMX: Return 0 from a failed VMREAD If we execute VMREAD during reboot we'll just skip over it. Instead of returning garbage, return 0, which has a much smaller chance of confusing the code. Otherwise we risk a flood of debug printk()s which block the reboot process if a serial console or netconsole is enabled. Signed-off-by: Avi Kivity <avi@redhat.com>	2011-01-12 11:30:20 +02:00
Avi Kivity	586f960796	KVM: Add instruction-set-specific exit qualifications to kvm_exit trace The exit reason alone is insufficient to understand exactly why an exit occured; add ISA-specific trace parameters for additional information. Because fetching these parameters is expensive on vmx, and because these parameters are fetched even if tracing is disabled, we fetch the parameters via a callback instead of as traditional trace arguments. Signed-off-by: Avi Kivity <avi@redhat.com>	2011-01-12 11:29:41 +02:00
Avi Kivity	aa17911e3c	KVM: Record instruction set in kvm_exit tracepoint exit_reason's meaning depend on the instruction set; record it so a trace taken on one machine can be interpreted on another. Signed-off-by: Avi Kivity <avi@redhat.com>	2011-01-12 11:29:40 +02:00
Avi Kivity	104f226bfd	KVM: VMX: Fold __vmx_vcpu_run() into vmx_vcpu_run() cea15c2 ("KVM: Move KVM context switch into own function") split vmx_vcpu_run() to prevent multiple copies of the context switch from being generated (causing problems due to a label). This patch folds them back together again and adds the __noclone attribute to prevent the label from being duplicated. Signed-off-by: Avi Kivity <avi@redhat.com>	2011-01-12 11:29:37 +02:00
Shane Wang	f9335afea5	KVM: VMX: Inform user about INTEL_TXT dependency Inform user to either disable TXT in the BIOS or do TXT launch with tboot before enabling KVM since some BIOSes do not set FEATURE_CONTROL_VMXON_ENABLED_OUTSIDE_SMX bit when TXT is enabled. Signed-off-by: Shane Wang <shane.wang@intel.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-01-12 11:29:31 +02:00
Avi Kivity	30bd0c4c6c	KVM: VMX: Disallow NMI while blocked by STI While not mandated by the spec, Linux relies on NMI being blocked by an IF-enabling STI. VMX also refuses to enter a guest in this state, at least on some implementations. Disallow NMI while blocked by STI by checking for the condition, and requesting an interrupt window exit if it occurs. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-01-12 11:29:04 +02:00
Gleb Natapov	ec25d5e66e	KVM: handle exit due to INVD in VMX Currently the exit is unhandled, so guest halts with error if it tries to execute INVD instruction. Call into emulator when INVD instruction is executed by a guest instead. This instruction is not needed by ordinary guests, but firmware (like OpenBIOS) use it and fail. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-01-12 11:28:53 +02:00
Marcelo Tosatti	ff1fcb9ebd	KVM: VMX: remove setting of shadow_base_ptes for EPT The EPT present/writable bits use the same position as normal pagetable bits. Since direct_map passes ACC_ALL to mmu_set_spte, thus always setting the writable bit on sptes, use the generic PT_PRESENT shadow_base_pte. Also pass present/writable error code information from EPT violation to generic pagefault handler. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-01-12 11:23:37 +02:00
Andi Kleen	f56f536956	KVM: Move KVM context switch into own function gcc 4.5 with some special options is able to duplicate the VMX context switch asm in vmx_vcpu_run(). This results in a compile error because the inline asm sequence uses an on local label. The non local label is needed because other code wants to set up the return address. This patch moves the asm code into an own function and marks that explicitely noinline to avoid this problem. Better would be probably to just move it into an .S file. The diff looks worse than the change really is, it's all just code movement and no logic change. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-01-12 11:23:26 +02:00
Joerg Roedel	24d1b15f72	KVM: SVM: Do not report xsave in supported cpuid To support xsave properly for the guest the SVM module need software support for it. As long as this is not present do not report the xsave as supported feature in cpuid. As a side-effect this patch moves the bit() helper function into the x86.h file so that it can be used in svm.c too. KVM-Stable-Tag. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-12-08 17:28:37 +02:00
Avi Kivity	c8770e7ba6	KVM: VMX: Fix host userspace gsbase corruption We now use load_gs_index() to load gs safely; unfortunately this also changes MSR_KERNEL_GS_BASE, which we managed separately. This resulted in confusion and breakage running 32-bit host userspace on a 64-bit kernel. Fix by - saving guest MSR_KERNEL_GS_BASE before we we reload the host's gs - doing the host save/load unconditionally, instead of only when in guest long mode Things can be cleaned up further, but this is the minmal fix for now. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-11-17 19:48:05 -02:00
Avi Kivity	0a77fe4c18	KVM: Correct ordering of ldt reload wrt fs/gs reload If fs or gs refer to the ldt, they must be reloaded after the ldt. Reorder the code to that effect. Userspace code that uses the ldt with kvm is nonexistent, so this doesn't fix a user-visible bug. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-11-17 19:47:59 -02:00
Nicolas Kaiser	9611c18777	KVM: fix typo in copyright notice Fix typo in copyright notice. Signed-off-by: Nicolas Kaiser <nikai@nikai.net> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:53:14 +02:00
Jan Kiszka	07d6f555d5	KVM: VMX: Add AX to list of registers clobbered by guest switch By chance this caused no harm so far. We overwrite AX during switch to/from guest context, so we must declare this. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:53:07 +02:00
Avi Kivity	49e9d557f9	KVM: VMX: Respect interrupt window in big real mode If an interrupt is pending, we need to stop emulation so we can inject it. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:53:02 +02:00
Mohammed Gamal	a92601bb70	KVM: VMX: Emulated real mode interrupt injection Replace the inject-as-software-interrupt hack we currently have with emulated injection. Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com> Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:53:01 +02:00
Avi Kivity	625831a3f4	KVM: VMX: Move fixup_rmode_irq() to avoid forward declaration No code changes. Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:52:54 +02:00
Avi Kivity	b463a6f744	KVM: Non-atomic interrupt injection Change the interrupt injection code to work from preemptible, interrupts enabled context. This works by adding a ->cancel_injection() operation that undoes an injection in case we were not able to actually enter the guest (this condition could never happen with atomic injection). Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:52:54 +02:00
Avi Kivity	83422e17c1	KVM: VMX: Parameterize vmx_complete_interrupts() for both exit and entry Currently vmx_complete_interrupts() can decode event information from vmx exit fields into the generic kvm event queues. Make it able to decode the information from the entry fields as well by parametrizing it. Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:52:52 +02:00
Avi Kivity	537b37e267	KVM: VMX: Move real-mode interrupt injection fixup to vmx_complete_interrupts() This allows reuse of vmx_complete_interrupts() for cancelling injections. Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:52:51 +02:00

1 2 3 4 5 ...

446 Commits