linux

Commit Graph

Author	SHA1	Message	Date
Robert P. J. Day	50a3485c59	KVM: Replace C code with call to ARRAY_SIZE() macro. Signed-off-by: Robert P. J. Day <rpjday@mindspring.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-07-16 12:05:45 +03:00
Avi Kivity	17c3ba9d37	KVM: Lazy guest cr3 switching Switch guest paging context may require us to allocate memory, which might fail. Instead of wiring up error paths everywhere, make context switching lazy and actually do the switch before the next guest entry, where we can return an error if allocation fails. Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-07-16 12:05:45 +03:00
Eddie Dong	2cc51560ae	KVM: VMX: Avoid saving and restoring msr_efer on lightweight vmexit MSR_EFER.LME/LMA bits are automatically save/restored by VMX hardware, KVM only needs to save NX/SCE bits at time of heavy weight VM Exit. But clearing NX bits in host envirnment may cause system hang if the host page table is using EXB bits, thus we leave NX bits as it is. If Host NX=1 and guest NX=0, we can do guest page table EXB bits check before inserting a shadow pte (though no guest is expecting to see this kind of gp fault). If host NX=0, we present guest no Execute-Disable feature to guest, thus no host NX=0, guest NX=1 combination. This patch reduces raw vmexit time by ~27%. Me: fix compile warnings on i386. Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-07-16 12:05:42 +03:00
Eddie Dong	f2be4dd654	KVM: VMX: Cleanup redundant code in MSR set Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-07-16 12:05:42 +03:00
Eddie Dong	a75beee6e4	KVM: VMX: Avoid saving and restoring msrs on lightweight vmexit In a lightweight exit (where we exit and reenter the guest without scheduling or exiting to userspace in between), we don't need various msrs on the host, and avoiding shuffling them around reduces raw exit time by 8%. i386 compile fix by Daniel Hecken <dh@bahntechnik.de>. Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-07-16 12:05:41 +03:00
Nitin A Kamble	b3f37707b0	KVM: VMX: Handle #SS faults from real mode Instructions with address size override prefix opcode 0x67 Cause the #SS fault with 0 error code in VM86 mode. Forward them to the emulator. Signed-Off-By: Nitin A Kamble <nitin.a.kamble@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-07-16 12:05:41 +03:00
Avi Kivity	cd2276a795	KVM: VMX: Use local labels in inline assembly This makes oprofile dumps and disassebly easier to read. Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-07-16 12:05:41 +03:00
Avi Kivity	cd0536d7cb	KVM: Fix vmx I/O bitmap initialization on highmem systems kunmap() expects a struct page, not a virtual address. Fixes an oops loading kvm-intel.ko on i386 with CONFIG_HIGHMEM. Thanks to Michael Ivanov <deruhu@peterstar.ru> for reporting. Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-07-16 12:05:41 +03:00
Avi Kivity	653e3108b7	KVM: Avoid corrupting tr in real mode The real mode tr needs to be set to a specific tss so that I/O instructions can function. Divert the new tr values to the real mode save area from where they will be restored on transition to protected mode. This fixes some crashes on reboot when the bios accesses an I/O instruction. Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-07-16 12:05:41 +03:00
Avi Kivity	eff708bc2b	KVM: VMX: Only reload guest msrs if they are already loaded If we set an msr via an ioctl() instead of by handling a guest exit, we have the host state loaded, so reloading the msrs would clobber host state instead of guest state. This fixes a host oops (and loss of a cpu) on a guest reboot. Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-07-16 12:05:41 +03:00
Avi Kivity	5fd86fcfc0	KVM: Consolidate guest fpu activation and deactivation Easier to keep track of where the fpu is this way. Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-07-16 12:05:39 +03:00
Avi Kivity	abd3f2d622	KVM: Rationalize exception bitmap usage Everyone owns a piece of the exception bitmap, but they happily write to the entire thing like there's no tomorrow. Centralize handling in update_exception_bitmap() and have everyone call that. Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-07-16 12:05:39 +03:00
Avi Kivity	707c087430	KVM: Move some more msr mangling into vmx_save_host_state() Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-07-16 12:05:39 +03:00
Avi Kivity	33ed632921	KVM: Fix potential guest state leak into host The lightweight vmexit path avoids saving and reloading certain host state. However in certain cases lightweight vmexit handling can schedule() which requires reloading the host state. So we store the host state in the vcpu structure, and reloaded it if we relinquish the vcpu. Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-07-16 12:05:39 +03:00
Avi Kivity	621358455a	KVM: Be more careful restoring fs on lightweight vmexit i386 wants fs for accessing the pda even on a lightweight exit, so ensure we can always restore it. This fixes a regression on i386 introduced by the lightweight vmexit patch. Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-07-16 12:05:38 +03:00
Avi Kivity	05e0c8c344	KVM: Unindent some code Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-07-16 12:05:38 +03:00
Avi Kivity	e6adf28365	KVM: Avoid saving and restoring some host CPU state on lightweight vmexit Many msrs and the like will only be used by the host if we schedule() or return to userspace. Therefore, we avoid saving them if we handle the exit within the kernel, and if a reschedule is not requested. Based on a patch from Eddie Dong <eddie.dong@intel.com> with a couple of fixes by me. Signed-off-by: Yaozu(Eddie) Dong <eddie.dong@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-07-16 12:05:38 +03:00
He, Qing	fdef3ad1b3	KVM: VMX: Enable io bitmaps to avoid IO port 0x80 VMEXITs This patch enables IO bitmaps control on vmx and unmask the 0x80 port to avoid VMEXITs caused by accessing port 0x80. 0x80 is used as delays (see include/asm/io.h), and handling VMEXITs on its access is unnecessary but slows things down. This patch improves kernel build test at around 3%~5%. Because every VM uses the same io bitmap, it is shared between all VMs rather than a per-VM data structure. Signed-off-by: Qing He <qing.he@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-07-16 12:05:37 +03:00
Avi Kivity	7702fd1f6f	KVM: Prevent guest fpu state from leaking into the host The lazy fpu changes did not take into account that some vmexit handlers can sleep. Move loading the guest state into the inner loop so that it can be reloaded if necessary, and move loading the host state into vmx_vcpu_put() so it can be performed whenever we relinquish the vcpu. Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-06-15 12:30:59 +03:00
Sam Ravnborg	39959588f5	kvm: fix section mismatch warning in kvm-intel.o Fix following section mismatch warning in kvm-intel.o: WARNING: o-i386/drivers/kvm/kvm-intel.o(.init.text+0xbd): Section mismatch: reference to .exit.text: (between 'hardware_setup' and 'vmx_disabled_by_bios') The function free_kvm_area is used in the function alloc_kvm_area which is marked __init. The __exit area is discarded by some archs during link-time if a module is built-in resulting in an oops. Note: This warning is only seen by my local copy of modpost but the change will soon hit upstream. Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Cc: Avi Kivity <avi@qumranet.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-06-01 08:18:30 -07:00
Alexey Dobriyan	e8edc6e03a	Detach sched.h from mm.h First thing mm.h does is including sched.h solely for can_do_mlock() inline function which has "current" dereference inside. By dealing with can_do_mlock() mm.h can be detached from sched.h which is good. See below, why. This patch a) removes unconditional inclusion of sched.h from mm.h b) makes can_do_mlock() normal function in mm/mlock.c c) exports can_do_mlock() to not break compilation d) adds sched.h inclusions back to files that were getting it indirectly. e) adds less bloated headers to some files (asm/signal.h, jiffies.h) that were getting them indirectly Net result is: a) mm.h users would get less code to open, read, preprocess, parse, ... if they don't need sched.h b) sched.h stops being dependency for significant number of files: on x86_64 allmodconfig touching sched.h results in recompile of 4083 files, after patch it's only 3744 (-8.3%). Cross-compile tested on all arm defconfigs, all mips defconfigs, all powerpc defconfigs, alpha alpha-up arm i386 i386-up i386-defconfig i386-allnoconfig ia64 ia64-up m68k mips parisc parisc-up powerpc powerpc-up s390 s390-up sparc sparc-up sparc64 sparc64-up um-x86_64 x86_64 x86_64-up x86_64-defconfig x86_64-allnoconfig as well as my two usual configs. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-21 09:18:19 -07:00
Avi Kivity	2ff81f70b5	KVM: Remove unused 'instruction_length' As we no longer emulate in userspace, this is meaningless. We don't compute it on SVM anyway. Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-05-03 10:52:32 +03:00
Anthony Liguori	2ab455ccce	KVM: VMX: Add lazy FPU support for VT Only save/restore the FPU host state when the guest is actually using the FPU. Signed-off-by: Anthony Liguori <aliguori@us.ibm.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-05-03 10:52:31 +03:00
Anthony Liguori	25c4c2762e	KVM: VMX: Properly shadow the CR0 register in the vcpu struct Set all of the host mask bits for CR0 so that we can maintain a proper shadow of CR0. This exposes CR0.TS, paving the way for lazy fpu handling. Signed-off-by: Anthony Liguori <aliguori@us.ibm.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-05-03 10:52:31 +03:00
Avi Kivity	e0e5127d06	KVM: Don't complain about cpu erratum AA15 It slows down Windows x64 horribly. Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-05-03 10:52:31 +03:00
Avi Kivity	1165f5fec1	KVM: Per-vcpu statistics Make the exit statistics per-vcpu instead of global. This gives a 3.5% boost when running one virtual machine per core on my two socket dual core (4 cores total) machine. Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-05-03 10:52:30 +03:00
Avi Kivity	4d56c8a787	KVM: VMX: Only save/restore MSR_K6_STAR if necessary Intel hosts only support syscall/sysret in long more (and only if efer.sce is enabled), so only reload the related MSR_K6_STAR if the guest will actually be able to use it. This reduces vmexit cost by about 500 cycles (6400 -> 5870) on my setup. Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-05-03 10:52:30 +03:00
Avi Kivity	35cc7f9711	KVM: Fold drivers/kvm/kvm_vmx.h into drivers/kvm/vmx.c No meat in that file. Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-05-03 10:52:30 +03:00
Avi Kivity	e38aea3e93	KVM: VMX: Don't switch 64-bit msrs for 32-bit guests Some msrs are only used by x86_64 instructions, and are therefore not needed when the guest is legacy mode. By not bothering to switch them, we reduce vmexit latency by 2400 cycles (from about 8800) when running a 32-bt guest on a 64-bit host. Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-05-03 10:52:30 +03:00
Avi Kivity	2345df8c55	KVM: VMX: Reduce unnecessary saving of host msrs THe automatically switched msrs are never changed on the host (with the exception of MSR_KERNEL_GS_BASE) and thus there is no need to save them on every vm entry. This reduces vmexit latency by ~400 cycles on i386 and by ~900 cycles (10%) on x86_64. Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-05-03 10:52:29 +03:00
Eric Sesterhenn / Snakebyte	3964994bb5	KVM: Fix overflow bug in overflow detection code The expression sp - 6 < sp where sp is a u16 is undefined in C since 'sp - 6' is promoted to int, and signed overflow is undefined in C. gcc 4.2 actually warns about it. Replace with a simpler test. Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de> Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-05-03 10:52:29 +03:00
Avi Kivity	954bbbc236	KVM: Simply gfn_to_page() Mapping a guest page to a host page is a common operation. Currently, one has first to find the memory slot where the page belongs (gfn_to_memslot), then locate the page itself (gfn_to_page()). This is clumsy, and also won't work well with memory aliases. So simplify gfn_to_page() not to require memory slot translation first, and instead do it internally. Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-05-03 10:52:28 +03:00
Avi Kivity	afeb1f14c5	KVM: Remove debug message No longer interesting. Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-05-03 10:52:27 +03:00
Avi Kivity	038881c8be	KVM: Hack real-mode segments on vmx from KVM_SET_SREGS As usual, we need to mangle segment registers when emulating real mode as vm86 has specific constraints. We special case the reset segment base, and set the "access rights" (or descriptor flags) to vm86 comaptible values. This fixes reboot on vmx. Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-05-03 10:52:26 +03:00
Avi Kivity	f6528b03f1	KVM: Remove set_cr0_no_modeswitch() arch op set_cr0_no_modeswitch() was a hack to avoid corrupting segment registers. As we now cache the protected mode values on entry to real mode, this isn't an issue anymore, and it interferes with reboot (which usually _is_ a modeswitch). Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-05-03 10:52:25 +03:00
Avi Kivity	8cb5b03332	KVM: Workaround vmx inability to virtualize the reset state The reset state has cs.selector == 0xf000 and cs.base == 0xffff0000, which aren't compatible with vm86 mode, which is used for real mode virtualization. When we create a vcpu, we set cs.base to 0xf0000, but if we get there by way of a reset, the values are inconsistent and vmx refuses to enter guest mode. Workaround by detecting the state and munging it appropriately. Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-05-03 10:52:25 +03:00
Avi Kivity	039576c03c	KVM: Avoid guest virtual addresses in string pio userspace interface The current string pio interface communicates using guest virtual addresses, relying on userspace to translate addresses and to check permissions. This interface cannot fully support guest smp, as the check needs to take into account two pages at one in case an unaligned string transfer straddles a page boundary. Change the interface not to communicate guest addresses at all; instead use a buffer page (mmaped by userspace) and do transfers there. The kernel manages the virtual to physical translation and can perform the checks atomically by taking the appropriate locks. Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-05-03 10:52:25 +03:00
Avi Kivity	1b19f3e61d	KVM: Add a special exit reason when exiting due to an interrupt This is redundant, as we also return -EINTR from the ioctl, but it allows us to examine the exit_reason field on resume without seeing old data. Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-05-03 10:52:24 +03:00
Avi Kivity	8eb7d334bd	KVM: Fold kvm_run::exit_type into kvm_run::exit_reason Currently, userspace is told about the nature of the last exit from the guest using two fields, exit_type and exit_reason, where exit_type has just two enumerations (and no need for more). So fold exit_type into exit_reason, reducing the complexity of determining what really happened. Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-05-03 10:52:24 +03:00
Avi Kivity	06465c5a3a	KVM: Handle cpuid in the kernel instead of punting to userspace KVM used to handle cpuid by letting userspace decide what values to return to the guest. We now handle cpuid completely in the kernel. We still let userspace decide which values the guest will see by having userspace set up the value table beforehand (this is necessary to allow management software to set the cpu features to the least common denominator, so that live migration can work). The motivation for the change is that kvm kernel code can be impacted by cpuid features, for example the x86 emulator. Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-05-03 10:52:23 +03:00
Avi Kivity	46fc147788	KVM: Do not communicate to userspace through cpu registers during PIO Currently when passing the a PIO emulation request to userspace, we rely on userspace updating %rax (on 'in' instructions) and %rsi/%rdi/%rcx (on string instructions). This (a) requires two extra ioctls for getting and setting the registers and (b) is unfriendly to non-x86 archs, when they get kvm ports. So fix by doing the register fixups in the kernel and passing to userspace only an abstract description of the PIO to be done. Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-05-03 10:52:23 +03:00
Dor Laor	510043da85	KVM: Use the generic skip_emulated_instruction() in hypercall code Instead of twiddling the rip registers directly, use the skip_emulated_instruction() function to do that for us. Signed-off-by: Dor Laor <dor.laor@qumranet.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-05-03 10:52:22 +03:00
Ingo Molnar	6d9658df07	KVM: always reload segment selectors failed VM entry on VMX might still change %fs or %gs, thus make sure that KVM always reloads the segment selectors. This is crutial on both x86 and x86_64: x86 has __KERNEL_PDA in %fs on which things like 'current' depends and x86_64 has 0 there and needs MSR_GS_BASE to work. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-03-27 17:55:48 +02:00
Avi Kivity	6af11b9e82	KVM: Prevent system selectors leaking into guest on real->protected mode transition on vmx Intel virtualization extensions do not support virtualizing real mode. So kvm uses virtualized vm86 mode to run real mode code. Unfortunately, this virtualized vm86 mode does not support the so called "big real" mode, where the segment selector and base do not agree with each other according to the real mode rules (base == selector << 4). To work around this, kvm checks whether a selector/base pair violates the virtualized vm86 rules, and if so, forces it into conformance. On a transition back to protected mode, if we see that the guest did not touch a forced segment, we restore it back to the original protected mode value. This pile of hacks breaks down if the gdt has changed in real mode, as it can cause a segment selector to point to a system descriptor instead of a normal data segment. In fact, this happens with the Windows bootloader and the qemu acpi bios, where a protected mode memcpy routine issues an innocent 'pop %es' and traps on an attempt to load a system descriptor. "Fix" by checking if the to-be-restored selector points at a system segment, and if so, coercing it into a normal data segment. The long term solution, of course, is to abandon vm86 mode and use emulation for big real mode. Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-03-27 17:54:38 +02:00
Avi Kivity	f5b42c3324	KVM: Fix guest sysenter on vmx The vmx code currently treats the guest's sysenter support msrs as 32-bit values, which breaks 32-bit compat mode userspace on 64-bit guests. Fix by using the native word width of the machine. Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-03-18 10:49:06 +02:00
Avi Kivity	bccf2150fe	KVM: Per-vcpu inodes Allocate a distinct inode for every vcpu in a VM. This has the following benefits: - the filp cachelines are no longer bounced when f_count is incremented on every ioctl() - the API and internal code are distinctly clearer; for example, on the KVM_GET_REGS ioctl, there is no need to copy the vcpu number from userspace and then copy the registers back; the vcpu identity is derived from the fd used to make the call Right now the performance benefits are completely theoretical since (a) we don't support more than one vcpu per VM and (b) virtualization hardware inefficiencies completely everwhelm any cacheline bouncing effects. But both of these will change, and we need to prepare the API today. Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-03-04 11:12:42 +02:00
Avi Kivity	270fd9b96f	KVM: Wire up hypercall handlers to a central arch-independent location Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-03-04 11:12:41 +02:00
Ingo Molnar	c21415e843	KVM: Add host hypercall support for vmx Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-03-04 11:12:40 +02:00
Ingo Molnar	102d8325a1	KVM: add MSR based hypercall API This adds a special MSR based hypercall API to KVM. This is to be used by paravirtual kernels and virtual drivers. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-03-04 11:12:40 +02:00
Ahmed S. Darwish	9d8f549dc6	KVM: Use ARRAY_SIZE macro instead of manual calculation. Signed-off-by: Ahmed S. Darwish <darwish.07@gmail.com> Signed-off-by: Dor Laor <dor.laor@qumranet.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-03-04 11:12:39 +02:00
Joerg Roedel	de979caacc	KVM: vmx: hack set_cr0_no_modeswitch() to actually do modeswitch The whole thing is rotten, but this allows vmx to boot with the guest reboot fix. Signed-off-by: Markus Rechberger <markus.rechberger@amd.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-03-04 11:12:39 +02:00
Avi Kivity	d27d4aca18	KVM: Cosmetics Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-03-04 11:12:39 +02:00
Jeremy Fitzhardinge	464d1a78fb	[PATCH] i386: Convert i386 PDA code to use %fs Convert the PDA code to use %fs rather than %gs as the segment for per-processor data. This is because some processors show a small but measurable performance gain for reloading a NULL segment selector (as %fs generally is in user-space) versus a non-NULL one (as %gs generally is). On modern processors the difference is very small, perhaps undetectable. Some old AMD "K6 3D+" processors are noticably slower when %fs is used rather than %gs; I have no idea why this might be, but I think they're sufficiently rare that it doesn't matter much. This patch also fixes the math emulator, which had not been adjusted to match the changed struct pt_regs. [frederik.deweerdt@gmail.com: fixit with gdb] [mingo@elte.hu: Fix KVM too] Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Ian Campbell <Ian.Campbell@XenSource.com> Acked-by: Ingo Molnar <mingo@elte.hu> Acked-by: Zachary Amsden <zach@vmware.com> Cc: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: Frederik Deweerdt <frederik.deweerdt@gmail.com> Signed-off-by: Andrew Morton <akpm@osdl.org>	2007-02-13 13:26:20 +01:00
Avi Kivity	774c47f1d7	[PATCH] KVM: cpu hotplug support On hotplug, we execute the hardware extension enable sequence. On unplug, we decache any vcpus that last ran on the exiting cpu, and execute the hardware extension disable sequence. Signed-off-by: Avi Kivity <avi@qumranet.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-12 09:48:41 -08:00
Avi Kivity	8d0be2b3bf	[PATCH] KVM: VMX: add vcpu_clear() Like the inline code it replaces, this function decaches the vmcs from the cpu it last executed on. in addition: - vcpu_clear() works if the last cpu is also the cpu we're running on - it is faster on larger smps by virtue of using smp_call_function_single() Includes fix from Ingo Molnar. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Avi Kivity <avi@qumranet.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-12 09:48:41 -08:00
Avi Kivity	26bb83a755	[PATCH] kvm: VMX: Reload ds and es even in 64-bit mode Or 32-bit userspace will get confused. Signed-off-by: Avi Kivity <avi@qumranet.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-12 09:48:40 -08:00
Avi Kivity	988ad74ff6	[PATCH] kvm: vmx: handle triple faults by returning EXIT_REASON_SHUTDOWN to userspace Just like svm. Signed-off-by: Avi Kivity <avi@qumranet.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-12 09:48:40 -08:00
Ingo Molnar	96958231ce	[PATCH] kvm: optimize inline assembly Forms like "0(%rsp)" generate an instruction with an unnecessary one byte displacement under certain circumstances. replace with the equivalent "(%rsp)". Signed-off-by: Avi Kivity <avi@qumranet.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-12 09:48:40 -08:00
Al Viro	8b6d44c7bd	[PATCH] kvm: NULL noise removal Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-09 09:14:07 -08:00
Avi Kivity	432bd6cbf9	[PATCH] KVM: fix lockup on 32-bit intel hosts with nx disabled in the bios Intel hosts, without long mode, and with nx support disabled in the bios have an efer that is readable but not writable. This causes a lockup on switch to guest mode (even though it should exit with reason 34 according to the documentation). Signed-off-by: Avi Kivity <avi@qumranet.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-01 16:22:41 -08:00
Avi Kivity	cccf748b81	[PATCH] KVM: fix race between mmio reads and injected interrupts The kvm mmio read path looks like: 1. guest read faults 2. kvm emulates read, calls emulator_read_emulated() 3. fails as a read requires userspace help 4. exit to userspace 5. userspace emulates read, kvm sets vcpu->mmio_read_completed 6. re-enter guest, fault again 7. kvm emulates read, calls emulator_read_emulated() 8. succeeds as vcpu->mmio_read_emulated is set 9. instruction completes and guest is resumed A problem surfaces if the userspace exit (step 5) also requests an interrupt injection. In that case, the guest does not re-execute the original instruction, but the interrupt handler. The next time an mmio read is exectued (likely for a different address), step 3 will find vcpu->mmio_read_completed set and return the value read for the original instruction. The problem manifested itself in a few annoying ways: - little squares appear randomly on console when switching virtual terminals - ne2000 fails under nfs read load - rtl8139 complains about "pci errors" even though the device model is incapable of issuing them. Fix by skipping interrupt injection if an mmio read is pending. A better fix is to avoid re-entry into the guest, and re-emulating immediately instead. However that's a bit more complex. Signed-off-by: Avi Kivity <avi@qumranet.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-01-23 07:52:06 -08:00
Herbert Xu	e001548911	[PATCH] vmx: Fix register constraint in launch code Both "=r" and "=g" breaks my build on i386: $ make CC [M] drivers/kvm/vmx.o {standard input}: Assembler messages: {standard input}:3318: Error: bad register name `%sil' make[1]: * [drivers/kvm/vmx.o] Error 1 make: * [_module_drivers/kvm] Error 2 The reason is that setbe requires an 8-bit register but "=r" does not constrain the target register to be one that has an 8-bit version on i386. According to http://gcc.gnu.org/bugzilla/show_bug.cgi?id=10153 the correct constraint is "=q". Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-01-22 19:27:02 -08:00
Ingo Molnar	07031e14c1	[PATCH] KVM: add VM-exit profiling This adds the profile=kvm boot option, which enables KVM to profile VM exits. Use: "readprofile -m ./System.map \| sort -n" to see the resulting output: [...] 18246 serial_out 148.3415 18945 native_flush_tlb 378.9000 23618 serial_in 212.7748 29279 __spin_unlock_irq 622.9574 43447 native_apic_write 2068.9048 52702 enable_8259A_irq 742.2817 54250 vgacon_scroll 89.3740 67394 ide_inb 6126.7273 79514 copy_page_range 98.1654 84868 do_wp_page 86.6000 140266 pit_read 783.6089 151436 ide_outb 25239.3333 152668 native_io_delay 21809.7143 174783 mask_and_ack_8259A 783.7803 362404 native_set_pte_at 36240.4000 1688747 total 0.5009 Signed-off-by: Ingo Molnar <mingo@elte.hu> Acked-by: Avi Kivity <avi@qumranet.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2007-01-11 18:18:21 -08:00
Dor Laor	022a93080c	[PATCH] KVM: Simplify test for interrupt window No need to test for rflags.if as both VT and SVM specs assure us that on exit caused from interrupt window opening, 'if' is set. Signed-off-by: Dor Laor <dor.laor@qumranet.com> Signed-off-by: Avi Kivity <avi@qumranet.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2007-01-05 23:55:28 -08:00
Avi Kivity	4db9c47c05	[PATCH] KVM: Don't set guest cr3 from vmx_vcpu_setup() It overwrites the right cr3 set from mmu setup. Happens only with the test harness. Signed-off-by: Avi Kivity <avi@qumranet.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2007-01-05 23:55:28 -08:00
Avi Kivity	e52de1b8cf	[PATCH] KVM: Improve reporting of vmwrite errors This will allow us to see the root cause when a vmwrite error happens. Signed-off-by: Avi Kivity <avi@qumranet.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2007-01-05 23:55:27 -08:00
Avi Kivity	e2dec939db	[PATCH] KVM: MMU: Detect oom conditions and propagate error to userspace Signed-off-by: Avi Kivity <avi@qumranet.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2007-01-05 23:55:27 -08:00
Avi Kivity	5f015a5b28	[PATCH] KVM: MMU: Remove invlpg interception Since we write protect shadowed guest page tables, there is no need to trap page invalidations (the guest will always change the mapping before issuing the invlpg instruction). Signed-off-by: Avi Kivity <avi@qumranet.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2007-01-05 23:55:25 -08:00
Avi Kivity	ebeace8609	[PATCH] KVM: MMU: oom handling When beginning to process a page fault, make sure we have enough shadow pages available to service the fault. If not, free some pages. Signed-off-by: Avi Kivity <avi@qumranet.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2007-01-05 23:55:25 -08:00
Avi Kivity	399badf315	[PATCH] KVM: Prevent stale bits in cr0 and cr4 Hardware virtualization implementations allow the guests to freely change some of the bits in cr0 and cr4, but trap when changing the other bits. This is useful to avoid excessive exits due to changing, for example, the ts flag. It also means the kvm's copy of cr0 and cr4 may be stale with respect to these bits. most of the time this doesn't matter as these bits are not very interesting. Other times, however (for example when returning cr0 to userspace), they are, so get the fresh contents of these bits from the guest by means of a new arch operation. Signed-off-by: Avi Kivity <avi@qumranet.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2007-01-05 23:55:23 -08:00
Dor Laor	c1150d8cf9	[PATCH] KVM: Improve interrupt response The current interrupt injection mechanism might delay an interrupt under the following circumstances: - if injection fails because the guest is not interruptible (rflags.IF clear, or after a 'mov ss' or 'sti' instruction). Userspace can check rflags, but the other cases or not testable under the current API. - if injection fails because of a fault during delivery. This probably never happens under normal guests. - if injection fails due to a physical interrupt causing a vmexit so that it can be handled by the host. In all cases the guest proceeds without processing the interrupt, reducing the interactive feel and interrupt throughput of the guest. This patch fixes the situation by allowing userspace to request an exit when the 'interrupt window' opens, so that it can re-inject the interrupt at the right time. Guest interactivity is very visibly improved. Signed-off-by: Dor Laor <dor.laor@qumranet.com> Signed-off-by: Avi Kivity <avi@qumranet.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2007-01-05 23:55:22 -08:00
Ingo Molnar	d3b2c33860	[PATCH] KVM: Use raw_smp_processor_id() instead of smp_processor_id() where applicable Signed-off-by: Avi Kivity <avi@qumranet.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2007-01-05 23:55:22 -08:00
Ingo Molnar	965b58a550	[PATCH] KVM: Fix GFP_KERNEL alloc in atomic section bug KVM does kmalloc() in an atomic section while having preemption disabled via vcpu_load(). Fix this by moving the ->*_msr setup from the vcpu_setup method to the vcpu_create method. (This is also a small speedup for setting up a vcpu, which can in theory be more frequent than the vcpu_create method). Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Avi Kivity <avi@qumranet.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2007-01-05 23:55:21 -08:00
Nguyen Anh Quynh	c68876fd28	[PATCH] KVM: Rename some msrs No need to append _MSR to msr names, a prefix should suffice. Signed-off-by: Nguyen Anh Quynh <aquynh@gmail.com> Signed-off-by: Avi Kivity <avi@qumranet.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-30 10:56:44 -08:00
Avi Kivity	3bab1f5dda	[PATCH] KVM: Move common msr handling to arch independent code Signed-off-by: Avi Kivity <avi@qumranet.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-30 10:56:44 -08:00
Avi Kivity	671d656479	[PATCH] KVM: Implement a few system configuration msrs Resolves sourceforge bug 1622229 (guest crashes running benchmark software). Signed-off-by: Avi Kivity <avi@qumranet.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-30 10:56:44 -08:00
Avi Kivity	a9058ecd3c	[PATCH] KVM: Simplify is_long_mode() Instead of doing tricky stuff with the arch dependent virtualization registers, take a peek at the guest's efer. This simlifies some code, and fixes some confusion in the mmu branch. Signed-off-by: Avi Kivity <avi@qumranet.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-30 10:56:44 -08:00
Michael Riepe	0f8e3d365a	[PATCH] KVM: Handle p5 mce msrs This allows plan9 to get a little further booting. Signed-off-by: Michael Riepe <michael@mr511.de> Signed-off-by: Avi Kivity <avi@qumranet.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-22 08:55:46 -08:00
Michael Riepe	abacf8dff9	[PATCH] KVM: Force real-mode cs limit to 64K This allows opensolaris to boot on kvm/intel. Signed-off-by: Michael Riepe <michael@mr511.de> Signed-off-by: Avi Kivity <avi@qumranet.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-22 08:55:46 -08:00
Avi Kivity	bfdc0c280a	[PATCH] KVM: Fix vmx hardware_enable() on macbooks It seems macbooks set bit 2 but not bit 0, which is an "enabled but vmxon will fault" setting. Signed-off-by: Avi Kivity <avi@qumranet.com> Tested-by: Alex Larsson (sometimes testing helps) Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-13 09:05:48 -08:00
Michael Riepe	3b99ab2421	[PATCH] KVM: Don't touch the virtual apic vt registers on 32-bit Signed-off-by: Avi Kivity <avi@qumranet.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-13 09:05:48 -08:00
Avi Kivity	873a7c423b	[PATCH] KVM: Disallow the kvm-amd module on intel hardware, and vice versa They're not on speaking terms. Signed-off-by: Avi Kivity <avi@qumranet.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-13 09:05:48 -08:00
Avi Kivity	7725f0badd	[PATCH] KVM: Move find_vmx_entry() to vmx.c Signed-off-by: Avi Kivity <avi@qumranet.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-13 09:05:47 -08:00
Uri Lublin	f7fbf1fdf0	[PATCH] KVM: Make the GET_SREGS and SET_SREGS ioctls symmetric This makes the SET_SREGS ioctl behave symmetrically to the GET_SREGS ioctl wrt the segment access rights flag. Signed-off-by: Uri Lublin <uril@qumranet.com> Signed-off-by: Avi Kivity <avi@qumranet.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-13 09:05:47 -08:00
Avi Kivity	05b3e0c2c7	[PATCH] KVM: Replace __x86_64__ with CONFIG_X86_64 As per akpm's request. Signed-off-by: Avi Kivity <avi@qumranet.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-13 09:05:46 -08:00
Anthony Liguori	3b3be0d1cc	[PATCH] KVM: Add missing include load_TR_desc() lives in asm/desc.h, so #include that file. Signed-off-by: Anthony Liguori <anthony@codemonkey.ws> Signed-off-by: Avi Kivity <avi@qumranet.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-13 09:05:46 -08:00
Avi Kivity	6aa8b732ca	[PATCH] kvm: userspace interface web site: http://kvm.sourceforge.net mailing list: kvm-devel@lists.sourceforge.net (http://lists.sourceforge.net/lists/listinfo/kvm-devel) The following patchset adds a driver for Intel's hardware virtualization extensions to the x86 architecture. The driver adds a character device (/dev/kvm) that exposes the virtualization capabilities to userspace. Using this driver, a process can run a virtual machine (a "guest") in a fully virtualized PC containing its own virtual hard disks, network adapters, and display. Using this driver, one can start multiple virtual machines on a host. Each virtual machine is a process on the host; a virtual cpu is a thread in that process. kill(1), nice(1), top(1) work as expected. In effect, the driver adds a third execution mode to the existing two: we now have kernel mode, user mode, and guest mode. Guest mode has its own address space mapping guest physical memory (which is accessible to user mode by mmap()ing /dev/kvm). Guest mode has no access to any I/O devices; any such access is intercepted and directed to user mode for emulation. The driver supports i386 and x86_64 hosts and guests. All combinations are allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae and non-pae paging modes are supported. SMP hosts and UP guests are supported. At the moment only Intel hardware is supported, but AMD virtualization support is being worked on. Performance currently is non-stellar due to the naive implementation of the mmu virtualization, which throws away most of the shadow page table entries every context switch. We plan to address this in two ways: - cache shadow page tables across tlb flushes - wait until AMD and Intel release processors with nested page tables Currently a virtual desktop is responsive but consumes a lot of CPU. Under Windows I tried playing pinball and watching a few flash movies; with a recent CPU one can hardly feel the virtualization. Linux/X is slower, probably due to X being in a separate process. In addition to the driver, you need a slightly modified qemu to provide I/O device emulation and the BIOS. Caveats (akpm: might no longer be true): - The Windows install currently bluescreens due to a problem with the virtual APIC. We are working on a fix. A temporary workaround is to use an existing image or install through qemu - Windows 64-bit does not work. That's also true for qemu, so it's probably a problem with the device model. [bero@arklinux.org: build fix] [simon.kagstrom@bth.se: build fix, other fixes] [uril@qumranet.com: KVM: Expose interrupt bitmap] [akpm@osdl.org: i386 build fix] [mingo@elte.hu: i386 fixes] [rdreier@cisco.com: add log levels to all printks] [randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings] [anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support] Signed-off-by: Yaniv Kamay <yaniv@qumranet.com> Signed-off-by: Avi Kivity <avi@qumranet.com> Cc: Simon Kagstrom <simon.kagstrom@bth.se> Cc: Bernhard Rosenkraenzer <bero@arklinux.org> Signed-off-by: Uri Lublin <uril@qumranet.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Roland Dreier <rolandd@cisco.com> Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Anthony Liguori <anthony@codemonkey.ws> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-10 09:57:22 -08:00

1 2 3

137 Commits