Commit Graph

45458 Commits

Author SHA1 Message Date
Alexander Graf dd84c21748 KVM: PPC: Add KVM intercept handlers
When an interrupt occurs we don't know yet if we're in guest context or
in host context. When in guest context, KVM needs to handle it.

So let's pull the same trick we did on Book3S_64: Just add a macro to
determine if we're in guest context or not and if so jump on to KVM code.

CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-17 12:18:52 +03:00
Alexander Graf ada7ba17b4 KVM: PPC: Check max IRQ prio
We have a define on what the highest bit of IRQ priorities is. So we can
just as well use it in the bit checking code and avoid invalid IRQ values
to be triggered.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-17 12:18:51 +03:00
Alexander Graf 218d169c4c PPC: Export SWITCH_FRAME_SIZE
We need the SWITCH_FRAME_SIZE define on Book3S_32 now too.
So let's export it unconditionally.

CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-17 12:18:49 +03:00
Alexander Graf be85669886 KVM: PPC: Export MMU variables
Our shadow MMU code needs to know where the HTAB is located and how
big it is. So we need some variables from the kernel exported to
module space if KVM is built as a module.

CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-17 12:18:48 +03:00
Alexander Graf 07b0907db1 KVM: PPC: Add Book3S compatibility code
Some code we had so far required defines and had code that was completely
Book3S_64 specific. Since we now opened book3s.c to Book3S_32 too, we need
to take care of these pieces.

So let's add some minor code where it makes sense to not go the Book3S_64
code paths and add compat defines on others.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-17 12:18:46 +03:00
Alexander Graf 61db97cc1e KVM: PPC: Emulate segment fault
Book3S_32 doesn't know about segment faults. It only knows about page faults.
So in order to know that we didn't map a segment, we need to fake segment
faults.

We do this by setting invalid segment registers to an invalid VSID and then
check for that VSID on normal page faults.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-17 12:18:45 +03:00
Alexander Graf 97e492558f KVM: PPC: Add SVCPU to Book3S_32
We need to keep the pointer to the shadow vcpu somewhere accessible from
within really early interrupt code. The best fit I found was the thread
struct, as that resides in an SPRG.

So let's put a pointer to the shadow vcpu in the thread struct and add
an asm-offset so we can find it.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-17 12:18:43 +03:00
Alexander Graf 0474b259d0 KVM: PPC: Remove fetch fail code
When instruction fetch failed, the inline function hook automatically
detects that and starts the internal guest memory load function. So
whenever we access kvmppc_get_last_inst(), we're sure the result is sane.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-17 12:18:41 +03:00
Alexander Graf 33fd27c7d2 KVM: PPC: Release clean pages as clean
When we mapped a page as read-only, we can just release it as clean to
KVM's page claim mechanisms, because we're pretty sure it hasn't been
touched.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-17 12:18:40 +03:00
Alexander Graf 53e5b8bbbd KVM: PPC: Make SLB switching code the new segment framework
We just introduced generic segment switching code that only needs to call
small macros to do the actual switching, but keeps most of the entry / exit
code generic.

So let's move the SLB switching code over to use this new mechanism.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-17 12:18:38 +03:00
Alexander Graf b79fcdf67e KVM: PPC: Make highmem code generic
Since we now have several fields in the shadow VCPU, we also change
the internal calling convention between the different entry/exit code
layers.

Let's reflect that in the IR=1 code and make sure we use "long" defines
for long field access.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-17 12:18:37 +03:00
Alexander Graf 8c3a4e0b67 KVM: PPC: Make real mode handler generic
The real mode handler code was originally writen for 64 bit Book3S only.
But since we not add 32 bit functionality too, we need to make some tweaks
to it.

This patch basically combines using the "long" access defines and using
fields from the shadow VCPU we just moved there.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-17 12:18:35 +03:00
Alexander Graf 9cc5e9538a KVM: PPC: Extract MMU init
The host shadow mmu code needs to get initialized. It needs to fetch a
segment it can use to put shadow PTEs into.

That initialization code was in generic code, which is icky. Let's move
it over to the respective MMU file.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-17 12:18:34 +03:00
Alexander Graf 0604675fe1 KVM: PPC: Use now shadowed vcpu fields
The shadow vcpu now contains some fields we don't use from the vcpu anymore.
Access to them happens using inline functions that happily use the shadow
vcpu fields.

So let's now ifdef them out to booke only and add asm-offsets.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-17 12:18:32 +03:00
Alexander Graf 56db45a5cd PPC: Add STLU
For assembly code there are several "long" load and store defines already.
The one that's missing is the typical stack store, stdu/stwu.

So let's add that define as well, making my KVM code happy.

CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-17 12:18:30 +03:00
Alexander Graf 00c3a37ca3 KVM: PPC: Use CONFIG_PPC_BOOK3S define
Upstream recently added a new name for PPC64: Book3S_64.

So instead of using CONFIG_PPC64 we should use CONFIG_PPC_BOOK3S consotently.
That makes understanding the code easier (I hope).

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-17 12:18:29 +03:00
Alexander Graf c14dea04a2 KVM: PPC: Use KVM_BOOK3S_HANDLER
So far we had a lot of conditional code on CONFIG_KVM_BOOK3S_64_HANDLER.
As we're moving towards common code between 32 and 64 bits, most of
these ifdefs can be moved to a more generic term define, called
CONFIG_KVM_BOOK3S_HANDLER.

This patch adds the new generic config option and moves ifdefs over.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-17 12:18:28 +03:00
Alexander Graf c7f38f46f2 KVM: PPC: Improve indirect svcpu accessors
We already have some inline fuctions we use to access vcpu or svcpu structs,
depending on whether we're on booke or book3s. Since we just put a few more
registers into the svcpu, we also need to make sure the respective callbacks
are available and get used.

So this patch moves direct use of the now in the svcpu struct fields to
inline function calls. While at it, it also moves the definition of those
inline function calls to respective header files for booke and book3s,
greatly improving readability.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-17 12:18:26 +03:00
Alexander Graf 66bb170655 KVM: PPC: Add fields to shadow vcpu
After a lot of thought on how to make the entry / exit code easier,
I figured it'd be clever to put even more register state into the
shadow vcpu. That way we have more registers available to use, making
the code easier to read.

So this patch adds a few new fields to that shadow vcpu. Later on we
will remove the originals from the vcpu and paca.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-17 12:18:24 +03:00
Alexander Graf 8c60b9fb0f KVM: PPC: Add kvm_book3s_32.h
In analogy to the 64 bit specific header file, this is the 32 bit
pendant. With this in place we can just always call to_svcpu and
be assured we get the right pointer anywhere.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-17 12:18:23 +03:00
Alexander Graf 3ae07890dd KVM: PPC: Add kvm_book3s_64.h
In the process of generalizing as much code as possible, I also moved
the shadow vcpu code together to a generic book3s file. Unfortunately
the location of the shadow vcpu is different on 32 and 64 bit, so we
need a wrapper function to tell us where it is.

That sounded like a perfect fit for a subarch specific header file.
Here we can put anything that needs to be different between those two.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-17 12:18:21 +03:00
Alexander Graf c83ec269e6 PPC: Split context init/destroy functions
We need to reserve a context from KVM to make sure we have our own
segment space. While we did that split for Book3S_64 already, 32 bit
is still outstanding.

So let's split it now.

Signed-off-by: Alexander Graf <agraf@suse.de>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-17 12:18:20 +03:00
Alexander Graf 0737279427 KVM: PPC: Add generic segment switching code
This is the code that will later be used instead of book3s_64_slb.S. It
does the last step of guest entry and the first generic steps of guest
exiting, once we have determined the interrupt is a KVM interrupt.

It also reads the last used instruction from the guest virtual address
space if necessary, to speed up that path.

The new thing about this file is that it makes use of generic long load
and store functions and calls a macro to fill in the actual segment
switching code. That still needs to be done differently for book3s_32 and
book3s_64.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-17 12:18:18 +03:00
Alexander Graf 786f19daa8 KVM: PPC: Add SR swapping code
Later in this series we will move the current segment switch code to
generic code and make that call hooks for the specific sub-archs (32
vs. 64 bit). This is the hook for 32 bits.

It enabled the entry and exit code to swap segment registers with
values from the shadow cpu structure.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-17 12:18:17 +03:00
Alexander Graf d32154f1b8 KVM: PPC: Add host MMU Support
In order to support 32 bit Book3S, we need to add code to enable our
shadow MMU to actually add shadow PTEs. This is the module enabling
that support.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-17 12:18:15 +03:00
Alexander Graf 2191d657c9 KVM: PPC: Name generic 64-bit code generic
We have quite some code that can be used by Book3S_32 and Book3S_64 alike,
so let's call it "Book3S" instead of "Book3S_64", so we can later on
use it from the 32 bit port too.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-17 12:18:14 +03:00
Wei Yongjun 77a1a71570 KVM: MMU: cleanup for function unaccount_shadowed()
Since gfn is not changed in the for loop, we do not need to call
gfn_to_memslot_unaliased() under the loop, and it is safe to move
it out.

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-17 12:18:12 +03:00
Gui Jianfeng 2a059bf444 KVM: Get rid of dead function gva_to_page()
Nobody use gva_to_page() anymore, get rid of it.

Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-17 12:18:10 +03:00
Gui Jianfeng b2fc15a5ef KVM: MMU: Remove unused varialbe in rmap_next()
Remove unused varialbe in rmap_next()

Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-17 12:18:09 +03:00
Gui Jianfeng 814a59d207 KVM: MMU: Make use of is_large_pte() in walker
Make use of is_large_pte() instead of checking PT_PAGE_SIZE_MASK
bit directly.

Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-17 12:18:07 +03:00
Gui Jianfeng 51fb60d81b KVM: MMU: Move sync_page() first pte address calculation out of loop
Move first pte address calculation out of loop to save some cycles.

Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-17 12:18:06 +03:00
Avi Kivity 87bc3bf972 KVM: MMU: Drop cr4.pge from shadow page role
Since commit bf47a760f6, we no longer handle ptes with the global bit
set specially, so there is no reason to distinguish between shadow pages
created with cr4.gpe set and clear.

Such tracking is expensive when the guest toggles cr4.pge, so drop it.

Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-17 12:18:03 +03:00
Lai Jiangshan 90d83dc3d4 KVM: use the correct RCU API for PROVE_RCU=y
The RCU/SRCU API have already changed for proving RCU usage.

I got the following dmesg when PROVE_RCU=y because we used incorrect API.
This patch coverts rcu_deference() to srcu_dereference() or family API.

===================================================
[ INFO: suspicious rcu_dereference_check() usage. ]
---------------------------------------------------
arch/x86/kvm/mmu.c:3020 invoked rcu_dereference_check() without protection!

other info that might help us debug this:

rcu_scheduler_active = 1, debug_locks = 0
2 locks held by qemu-system-x86/8550:
 #0:  (&kvm->slots_lock){+.+.+.}, at: [<ffffffffa011a6ac>] kvm_set_memory_region+0x29/0x50 [kvm]
 #1:  (&(&kvm->mmu_lock)->rlock){+.+...}, at: [<ffffffffa012262d>] kvm_arch_commit_memory_region+0xa6/0xe2 [kvm]

stack backtrace:
Pid: 8550, comm: qemu-system-x86 Not tainted 2.6.34-rc4-tip-01028-g939eab1 #27
Call Trace:
 [<ffffffff8106c59e>] lockdep_rcu_dereference+0xaa/0xb3
 [<ffffffffa012f6c1>] kvm_mmu_calculate_mmu_pages+0x44/0x7d [kvm]
 [<ffffffffa012263e>] kvm_arch_commit_memory_region+0xb7/0xe2 [kvm]
 [<ffffffffa011a5d7>] __kvm_set_memory_region+0x636/0x6e2 [kvm]
 [<ffffffffa011a6ba>] kvm_set_memory_region+0x37/0x50 [kvm]
 [<ffffffffa015e956>] vmx_set_tss_addr+0x46/0x5a [kvm_intel]
 [<ffffffffa0126592>] kvm_arch_vm_ioctl+0x17a/0xcf8 [kvm]
 [<ffffffff810a8692>] ? unlock_page+0x27/0x2c
 [<ffffffff810bf879>] ? __do_fault+0x3a9/0x3e1
 [<ffffffffa011b12f>] kvm_vm_ioctl+0x364/0x38d [kvm]
 [<ffffffff81060cfa>] ? up_read+0x23/0x3d
 [<ffffffff810f3587>] vfs_ioctl+0x32/0xa6
 [<ffffffff810f3b19>] do_vfs_ioctl+0x495/0x4db
 [<ffffffff810e6b2f>] ? fget_light+0xc2/0x241
 [<ffffffff810e416c>] ? do_sys_open+0x104/0x116
 [<ffffffff81382d6d>] ? retint_swapgs+0xe/0x13
 [<ffffffff810f3ba6>] sys_ioctl+0x47/0x6a
 [<ffffffff810021db>] system_call_fastpath+0x16/0x1b

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-17 12:18:01 +03:00
Avi Kivity 9beeaa2d68 Merge branch 'perf'
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-17 12:17:58 +03:00
Xiao Guangrong 3246af0ece KVM: MMU: cleanup for hlist walk restart
Quote from Avi:

|Just change the assignment to a 'goto restart;' please,
|I don't like playing with list_for_each internals.

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-05-17 12:17:56 +03:00
Gleb Natapov acb5451789 KVM: prevent spurious exit to userspace during task switch emulation.
If kvm_task_switch() fails code exits to userspace without specifying
exit reason, so the previous exit reason is reused by userspace. Fix
this by specifying exit reason correctly.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-05-17 12:17:55 +03:00
Xiao Guangrong 6b18493d60 KVM: MMU: remove unused parameter in mmu_parent_walk()
'vcpu' is unused, remove it

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-05-17 12:17:53 +03:00
Xiao Guangrong 0571d366e0 KVM: MMU: reduce 'struct kvm_mmu_page' size
Define 'multimapped' as 'bool'.

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-05-17 12:17:52 +03:00
Xiao Guangrong 1b8c7934a4 KVM: MMU: remove unused struct kvm_unsync_walk
Remove 'struct kvm_unsync_walk' since it's not used.

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-05-17 12:17:50 +03:00
Gleb Natapov 19d0443726 KVM: fix emulator_task_switch() return value.
emulator_task_switch() should return -1 for failure and 0 for success to
the caller, just like x86_emulate_insn() does.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-05-17 12:17:49 +03:00
Avi Kivity 5b7e0102ae KVM: MMU: Replace role.glevels with role.cr4_pae
There is no real distinction between glevels=3 and glevels=4; both have
exactly the same format and the code is treated exactly the same way.  Drop
role.glevels and replace is with role.cr4_pae (which is meaningful).  This
simplifies the code a bit.

As a side effect, it allows sharing shadow page tables between pae and
longmode guest page tables at the same guest page.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-05-17 12:17:47 +03:00
Jan Kiszka e269fb2189 KVM: x86: Push potential exception error code on task switches
When a fault triggers a task switch, the error code, if existent, has to
be pushed on the new task's stack. Implement the missing bits.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-05-17 12:17:46 +03:00
Jan Kiszka 0760d44868 KVM: x86: Terminate early if task_switch_16/32 failed
Stop the switch immediately if task_switch_16/32 returned an error. Only
if that step succeeded, the switch should actually take place and update
any register states.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-05-17 12:17:44 +03:00
Gleb Natapov 8f6abd06f5 KVM: x86: get rid of mmu_only parameter in emulator_write_emulated()
We can call kvm_mmu_pte_write() directly from
emulator_cmpxchg_emulated() instead of passing mmu_only down to
emulator_write_emulated_onepage() and call it there.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-05-17 12:17:43 +03:00
Gleb Natapov 020df0794f KVM: move DR register access handling into generic code
Currently both SVM and VMX have their own DR handling code. Move it to
x86.c.

Acked-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-05-17 12:17:39 +03:00
Andre Przywara 6bc31bdc55 KVM: SVM: implement NEXTRIPsave SVM feature
On SVM we set the instruction length of skipped instructions
to hard-coded, well known values, which could be wrong when (bogus,
but valid) prefixes (REX, segment override) are used.
Newer AMD processors (Fam10h 45nm and better, aka. PhenomII or
AthlonII) have an explicit NEXTRIP field in the VMCB containing the
desired information.
Since it is cheap to do so, we use this field to override the guessed
value on newer processors.
A fix for older CPUs would be rather expensive, as it would require
to fetch and partially decode the instruction. As the problem is not
a security issue and needs special, handcrafted code to trigger
(no compiler will ever generate such code), I omit a fix for older
CPUs.
If someone is interested, I have both a patch for these CPUs as well as
demo code triggering this issue: It segfaults under KVM, but runs
perfectly on native Linux.

Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-05-17 12:17:38 +03:00
Avi Kivity f7a711971e KVM: Fix MAXPHYADDR calculation when cpuid does not support it
MAXPHYADDR is derived from cpuid 0x80000008, but when that isn't present, we
get some random value.

Fix by checking first that cpuid 0x80000008 is supported.

Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-05-17 12:17:36 +03:00
Avi Kivity e46479f852 KVM: Trace emulated instructions
Log emulated instructions in ftrace, especially if they failed.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-05-17 12:17:35 +03:00
Avi Kivity 2fb53ad811 KVM: x86 emulator: Don't overwrite decode cache
Currently if we an instruction spans a page boundary, when we fetch the
second half we overwrite the first half.  This prevents us from tracing
the full instruction opcodes.

Fix by appending the second half to the first.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-05-17 12:17:33 +03:00
Alexander Graf 4496f97482 KVM: PPC: Add dequeue for external on BookE
Commit a0abee86af2d1f048dbe99d2bcc4a2cefe685617 introduced unsetting of the
IRQ line from userspace. This added a new core specific callback that I
apparently forgot to add for BookE.

So let's add the callback for BookE as well, making it build again.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-05-17 12:17:32 +03:00
Xiao Guangrong 24222c2fec KVM: MMU: remove unnecessary NX check in walk_addr
After is_rsvd_bits_set() checks, EFER.NXE must be enabled if NX bit is seted

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-05-17 12:17:30 +03:00
Xiao Guangrong f84cbb0561 KVM: MMU: remove unused field
kvm_mmu_page.oos_link is not used, so remove it

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-05-17 12:17:29 +03:00
Xiao Guangrong 805d32dea4 KVM: MMU: cleanup/fix mmu audit code
This patch does:
- 'sp' parameter in inspect_spte_fn() is not used, so remove it
- fix 'kvm' and 'slots' is not defined in count_rmaps()
- fix a bug in inspect_spte_has_rmap()

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-05-17 12:17:27 +03:00
Alexander Graf 306d071f17 KVM: PPC: Don't export Book3S symbols on BookE
Book3S knows how to convert floats to doubles and vice versa. BookE doesn't.
So let's make sure we don't export them on BookE.

This fixes a link error on BookE with CONFIG_KVM=y.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-17 12:17:25 +03:00
Alexander Graf 287d5611fa KVM: PPC: Only use QPRs when available
BookE KVM doesn't know about QPRs, so let's not try to access then.

This fixes a build error on BookE KVM.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-17 12:17:24 +03:00
Alexander Graf 05b0ab1c0b KVM: PPC: Disable MSR_FEx for Cell hosts
Cell can't handle MSR_FE0 and MSR_FE1 too well. It gets dog slow.
So let's just override the guest whenever we see one of the two and mask them
out. See commit ddf5f75a16 for reference.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-17 12:17:21 +03:00
Alexander Graf 3ed9c6d2b5 KVM: PPC: Make bools bitfields
Bool defaults to at least byte width. We usually only want to waste a single
bit on this. So let's move all the bool values to bitfields, potentially
saving memory.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-17 12:17:20 +03:00
Alexander Graf 5a1b419fc9 KVM: PPC: Use ULL for big numbers
Some constants were bigger than ints. Let's mark them as such so we don't
accidently truncate them.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-17 12:17:18 +03:00
Alexander Graf a1eda280cc KVM: PPC: Add check if pte was mapped secondary
Some HTAB providers (namely the PS3) ignore the SECONDARY flag. They
just put an entry in the htab as secondary when they see fit.

So we need to check the return value of htab_insert to remember the
correct slot id so we can actually invalidate the entry again.

Fixes KVM on the PS3.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-17 12:17:17 +03:00
Alexander Graf bd7cdbb7fc KVM: PPC: Add emulation for dcba
Mac OS X uses the dcba instruction. According to the specification it doesn't
guarantee any functionality, so let's just emulate it as nop.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-17 12:17:15 +03:00
Alexander Graf 9fb244a2c2 KVM: PPC: Fix dcbz emulation
On most systems we need to emulate dcbz when running 32 bit guests. So
far we've been rather slack, not giving correct DSISR values to the guest.

This patch makes the emulation more accurate, introducing a difference
between "page not mapped" and "write protection fault". While at it, it
also speeds up dcbz emulation by an order of magnitude by using kmap.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-17 12:17:14 +03:00
Alexander Graf a2b07664f6 KVM: PPC: Make build work without CONFIG_VSX/ALTIVEC
The FPU/Altivec/VSX enablement also brought access to some structure
elements that are only defined when the respective config options
are enabled.

Unfortuately I forgot to check for the config options at some places,
so let's do that now.

Unbreaks the build when CONFIG_VSX is not set.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-17 12:17:12 +03:00
Alexander Graf ad0a048b09 KVM: PPC: Add OSI hypercall interface
MOL uses its own hypercall interface to call back into userspace when
the guest wants to do something.

So let's implement that as an exit reason, specify it with a CAP and
only really use it when userspace wants us to.

The only user of it so far is MOL.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-17 12:17:10 +03:00
Alexander Graf 71fbfd5f38 KVM: Add support for enabling capabilities per-vcpu
Some times we don't want all capabilities to be available to all
our vcpus. One example for that is the OSI interface, implemented
in the next patch.

In order to have a generic mechanism in how to enable capabilities
individually, this patch introduces a new ioctl that can be used
for this purpose. That way features we don't want in all guests or
userspace configurations can just not be enabled and we're good.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-17 12:17:09 +03:00
Alexander Graf ca7f4203b9 KVM: PPC: Implement alignment interrupt
Mac OS X has some applications - namely the Finder - that require alignment
interrupts to work properly. So we need to implement them.

But the spec for 970 and 750 also looks different. While 750 requires the
DSISR and DAR fields to reflect some instruction bits (DSISR) and the fault
address (DAR), the 970 declares this as an optional feature. So we need
to reconstruct DSISR and DAR manually.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-17 12:17:07 +03:00
Alexander Graf 1c85e73303 KVM: PPC: Implement emulation for lbzux and lhax
We get MMIOs with the weirdest instructions. But every time we do,
we need to improve our emulator to implement them.

So let's do that - this time it's lbzux and lhax's round.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-17 12:17:06 +03:00
Alexander Graf 1bec1677ca KVM: PPC: Make XER load 32 bit
We have a 32 bit value in the PACA to store XER in. We also do an stw
when storing XER in there. But then we load it with ld, completely
screwing it up on every entry.

Welcome to the Big Endian world.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-17 12:17:04 +03:00
Alexander Graf c04a695a44 KVM: PPC: Implement BAT reads
BATs can't only be written to, you can also read them out!
So let's implement emulation for reading BAT values again.

While at it, I also made BAT setting flush the segment cache,
so we're absolutely sure there's no MMU state left when writing
BATs.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-17 12:17:03 +03:00
Alexander Graf c664876c6d KVM: PPC: Implement mfsr emulation
We emulate the mfsrin instruction already, that passes the SR number
in a register value. But we lacked support for mfsr that encoded the
SR number in the opcode.

So let's implement it.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-17 12:17:01 +03:00
Alexander Graf a56cf347c2 KVM: PPC: Load VCPU for register fetching
When trying to read or store vcpu register data, we should also make
sure the vcpu is actually loaded, so we're 100% sure we get the correct
values.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-17 12:16:59 +03:00
Alexander Graf c2453693d4 KVM: PPC: Don't reload FPU with invalid values
When the guest activates the FPU, we load it up. That's fine when
it wasn't activated before on the host, but if it was we end up
reloading FPU values from last time the FPU was deactivated on the
host without writing the proper values back to the vcpu struct.

This patch checks if the FPU is enabled already and if so just doesn't
bother activating it, making FPU operations survive guest context switches.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-17 12:16:57 +03:00
Alexander Graf 8963221d7d KVM: PPC: Split instruction reading out
The current check_ext function reads the instruction and then does
the checking. Let's split the reading out so we can reuse it for
different functions.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-17 12:16:56 +03:00
Alexander Graf 4b389ca2e7 KVM: PPC: Book3S_32 guest MMU fixes
This patch makes the VSID of mapped pages always reflecting all special cases
we have, like split mode.

It also changes the tlbie mask to 0x0ffff000 according to the spec. The mask
we used before was incorrect.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-17 12:16:54 +03:00
Alexander Graf c8027f1652 KVM: PPC: Make DSISR 32 bits wide
DSISR is only defined as 32 bits wide. So let's reflect that in the
structs too.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-17 12:16:53 +03:00
Alexander Graf 18978768d8 KVM: PPC: Allow userspace to unset the IRQ line
Userspace can tell us that it wants to trigger an interrupt. But
so far it can't tell us that it wants to stop triggering one.

So let's interpret the parameter to the ioctl that we have anyways
to tell us if we want to raise or lower the interrupt line.

Signed-off-by: Alexander Graf <agraf@suse.de>

v2 -> v3:

 - Add CAP for unset irq
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-17 12:16:51 +03:00
Alexander Graf 3eeafd7da2 KVM: PPC: Ensure split mode works
On PowerPC we can go into MMU Split Mode. That means that either
data relocation is on but instruction relocation is off or vice
versa.

That mode didn't work properly, as we weren't always flushing
entries when going into a new split mode, potentially mapping
different code or data that we're supposed to.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-17 12:16:49 +03:00
Avi Kivity 84b0c8c6a6 KVM: MMU: Disassociate direct maps from guest levels
Direct maps are linear translations for a section of memory, used for
real mode or with large pages.  As such, they are independent of the guest
levels.

Teach the mmu about this by making page->role.glevels = 0 for direct maps.
This allows direct maps to be shared among real mode and the various paging
modes.

Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-17 12:16:44 +03:00
Xiao Guangrong f815bce894 KVM: MMU: check reserved bits only if CR4.PSE=1 or CR4.PAE=1
- Check reserved bits only if CR4.PAE=1 or CR4.PSE=1 when guest #PF occurs
- Fix a typo in reset_rsvds_bits_mask()

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-17 12:16:42 +03:00
Marcelo Tosatti 041b1359a2 KVM: x86: document KVM_REQ_PENDING_TIMER usage
Document that KVM_REQ_PENDING_TIMER is implicitly used during guest
entry.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-17 12:16:40 +03:00
Gleb Natapov de3e6480f7 KVM: x86 emulator: fix unlocked CMPXCHG8B emulation
When CMPXCHG8B is executed without LOCK prefix it is racy. Preserve this
behaviour in emulator too.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-17 12:16:38 +03:00
Gleb Natapov 6550e1f165 KVM: x86 emulator: add decoding of CMPXCHG8B dst operand
Decode CMPXCHG8B destination operand in decoding stage. Fixes regression
introduced by "If LOCK prefix is used dest arg should be memory" commit.
This commit relies on dst operand be decoded at the beginning of an
instruction emulation.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-17 12:16:37 +03:00
Gleb Natapov 482ac18ae2 KVM: x86 emulator: commit rflags as part of registers commit
Make sure that rflags is committed only after successful instruction
emulation.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-17 12:16:35 +03:00
Jan Kiszka 9749a6c0f0 KVM: x86: Fix 32-bit build breakage due to typo
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-17 12:16:34 +03:00
Gleb Natapov 92bf9748b5 KVM: small kvm_arch_vcpu_ioctl_run() cleanup.
Unify all conditions that get us back into emulator after returning from
userspace.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-05-17 12:16:32 +03:00
Gleb Natapov 7b262e90fc KVM: x86 emulator: introduce pio in string read ahead.
To optimize "rep ins" instruction do IO in big chunks ahead of time
instead of doing it only when required during instruction emulation.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-05-17 12:16:31 +03:00
Gleb Natapov 5cd21917da KVM: x86 emulator: restart string instruction without going back to a guest.
Currently when string instruction is only partially complete we go back
to a guest mode, guest tries to reexecute instruction and exits again
and at this point emulation continues. Avoid all of this by restarting
instruction without going back to a guest mode, but return to a guest
mode each 1024 iterations to allow interrupt injection. Pending
exception causes immediate guest entry too.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-05-17 12:16:29 +03:00
Gleb Natapov cb404fe089 KVM: x86 emulator: remove saved_eip
c->eip is never written back in case of emulation failure, so no need to
set it to old value.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-05-17 12:16:28 +03:00
Gleb Natapov 7972995b0c KVM: x86 emulator: Move string pio emulation into emulator.c
Currently emulation is done outside of emulator so things like doing
ins/outs to/from mmio are broken it also makes it hard (if not impossible)
to implement single stepping in the future. The implementation in this
patch is not efficient since it exits to userspace for each IO while
previous implementation did 'ins' in batches. Further patch that
implements pio in string read ahead address this problem.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-05-17 12:16:26 +03:00
Gleb Natapov cf8f70bfe3 KVM: x86 emulator: fix in/out emulation.
in/out emulation is broken now. The breakage is different depending
on where IO device resides. If it is in userspace emulator reports
emulation failure since it incorrectly interprets kvm_emulate_pio()
return value. If IO device is in the kernel emulation of 'in' will do
nothing since kvm_emulate_pio() stores result directly into vcpu
registers, so emulator will overwrite result of emulation during
commit of shadowed register.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-05-17 12:16:25 +03:00
Gleb Natapov d9271123a4 KVM: x86 emulator: during rep emulation decrement ECX only if emulation succeeded
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-05-17 12:16:23 +03:00
Gleb Natapov a682e35449 KVM: x86 emulator: add decoding of X,Y parameters from Intel SDM
Add decoding of X,Y parameters from Intel SDM which are used by string
instruction to specify source and destination. Use this new decoding
to implement movs, cmps, stos, lods in a generic way.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-05-17 12:16:21 +03:00
Gleb Natapov 69f55cb11e KVM: x86 emulator: populate OP_MEM operand during decoding.
All struct operand fields are initialized during decoding for all
operand types except OP_MEM, but there is no reason for that. Move
OP_MEM operand initialization into decoding stage for consistency.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-05-17 12:16:20 +03:00
Gleb Natapov ceffb45972 KVM: Use task switch from emulator.c
Remove old task switch code from x86.c

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-05-17 12:16:18 +03:00
Gleb Natapov 2e873022f5 KVM: x86 emulator: Use load_segment_descriptor() instead of kvm_load_segment_descriptor()
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-05-17 12:16:17 +03:00
Gleb Natapov 38ba30ba51 KVM: x86 emulator: Emulate task switch in emulator.c
Implement emulation of 16/32 bit task switch in emulator.c

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-05-17 12:16:15 +03:00
Gleb Natapov 2dafc6c234 KVM: x86 emulator: Provide more callbacks for x86 emulator.
Provide get_cached_descriptor(), set_cached_descriptor(),
get_segment_selector(), set_segment_selector(), get_gdt(),
write_std() callbacks.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-05-17 12:16:14 +03:00
Gleb Natapov aca06a8307 KVM: x86 emulator: cleanup grp3 return value
When x86_emulate_insn() does not know how to emulate instruction it
exits via cannot_emulate label in all cases except when emulating
grp3. Fix that.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-05-17 12:16:12 +03:00
Gleb Natapov a41ffb7540 KVM: x86 emulator: If LOCK prefix is used dest arg should be memory.
If LOCK prefix is used dest arg should be memory, otherwise instruction
should generate #UD.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-05-17 12:16:11 +03:00
Gleb Natapov fd5253658b KVM: x86 emulator: do not call writeback if msr access fails.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-05-17 12:16:09 +03:00
Gleb Natapov 2e901c4cf4 KVM: x86 emulator: fix return values of syscall/sysenter/sysexit emulations
Return X86EMUL_PROPAGATE_FAULT is fault was injected. Also inject #UD
for those instruction when appropriate.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-05-17 12:16:08 +03:00