Cores provide 3 emulation hooks, implemented for example in the new
4xx_emulate.c:
kvmppc_core_emulate_op
kvmppc_core_emulate_mtspr
kvmppc_core_emulate_mfspr
Strictly speaking the last two aren't necessary, but provide for more
informative error reporting ("unknown SPR").
Long term I'd like to have instruction decoding autogenerated from tables of
opcodes, and that way we could aggregate universal, Book E, and core-specific
instructions more easily and without redundant switch statements.
Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
This is used in a couple places in KVM, but isn't KVM-specific.
However, this patch doesn't modify other in-kernel emulation code:
- xmon uses a direct copy of ppc_opc.c from binutils
- emulate_instruction() doesn't need it because it can use a series
of mask tests.
Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Avi Kivity <avi@redhat.com>
This introduces a set of core-provided hooks. For 440, some of these are
implemented by booke.c, with the rest in (the new) 44x.c.
Note that these hooks are link-time, not run-time. Since it is not possible to
build a single kernel for both e500 and 440 (for example), using function
pointers would only add overhead.
Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
The division was somewhat artificial and cumbersome, and had no functional
benefit anyways: we can only guests built for the real host processor.
Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
This will ease ports to other cores.
Also remove unused "struct kvm_tlb" while we're at it.
Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
This will make it easier to provide implementations for other cores.
Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Some areas of kvm x86 mmu are using gfn offset inside a slot without
unaliasing the gfn first. This patch makes sure that the gfn will be
unaliased and add gfn_to_memslot_unaliased() to save the calculating
of the gfn unaliasing in case we have it unaliased already.
Signed-off-by: Izik Eidus <ieidus@redhat.com>
Acked-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Ideally, every assigned device should in a clear condition before and after
assignment, so that the former state of device won't affect later work.
Some devices provide a mechanism named Function Level Reset, which is
defined in PCI/PCI-e document. We should execute it before and after device
assignment.
(But sadly, the feature is new, and most device on the market now don't
support it. We are considering using D0/D3hot transmit to emulate it later,
but not that elegant and reliable as FLR itself.)
[Update: Reminded by Xiantao, execute FLR after we ensure that the device can
be assigned to the guest.]
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Remove the lock protection for kvm halt logic, otherwise,
once other vcpus want to acquire the lock, and they have to
wait all vcpus are waken up from halt.
Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
If emulate_invalid_guest_state is enabled, the emulator is called
when guest state is invalid. Until now, we reported an mmio failure
when emulate_instruction() returned EMULATE_DO_MMIO. This patch adds
the case where emulate_instruction() failed and an MMIO emulation
is needed.
Signed-off-by: Guillaume Thouvenin <guillaume.thouvenin@ext.bull.net>
Signed-off-by: Avi Kivity <avi@redhat.com>
If we call the emulator we shouldn't call skip_emulated_instruction()
in the first place, since the emulator already computes the next rip
for us. Thus we move ->skip_emulated_instruction() out of
kvm_emulate_pio() and into handle_io() (and the svm equivalent). We
also replaced "return 0" by "break" in the "do_io:" case because now
the shadow register state needs to be committed. Otherwise eip will never
be updated.
Signed-off-by: Guillaume Thouvenin <guillaume.thouvenin@ext.bull.net>
Signed-off-by: Avi Kivity <avi@redhat.com>
The busy flag of the TR selector is not set by the hardware. This breaks
migration from amd hosts to intel hosts.
Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
The hardware does not set the 'g' bit of the cs selector and this breaks
migration from amd hosts to intel hosts. Set this bit if the segment
limit is beyond 1 MB.
Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Also remove unnecessary parameter of unregister irq ack notifier.
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
As suggested by Avi, this patch introduces a counter of VCPUs that have
LVT0 set to NMI mode. Only if the counter > 0, we push the PIT ticks via
all LAPIC LVT0 lines to enable NMI watchdog support.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Acked-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
This patch refactors the NMI watchdog delivery patch, consolidating
tests and providing a proper API for delivering watchdog events.
An included micro-optimization is to check only for apic_hw_enabled in
kvm_apic_local_deliver (the test for LVT mask is covering the
soft-disabled case already).
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Acked-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
PCI device assignment would map guest MMIO spaces as separate slot, so it is
possible that the device has more than 2 MMIO spaces and overwrite current
private memslot.
The patch move private memory slot to the top of userspace visible memory slots.
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Otherwise set_bit() for private memory slot(above KVM_MEMORY_SLOTS) would
corrupted memory in 32bit host.
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
The effective memory type of EPT is the mixture of MSR_IA32_CR_PAT and memory
type field of EPT entry.
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
GUEST_PAT support is a new feature introduced by Intel Core i7 architecture.
With this, cpu would save/load guest and host PAT automatically, for EPT memory
type in guest depends on MSR_IA32_CR_PAT.
Also add save/restore for MSR_IA32_CR_PAT.
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
For KVM can reuse the type define, and need them to support shadow MTRR.
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Call kvm_arch_vcpu_reset() instead of directly using arch callback.
The function does additional things.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Older VMX supporting CPUs do not provide the "Virtual NMI" feature for
tracking the NMI-blocked state after injecting such events. For now
KVM is unable to inject NMIs on those CPUs.
Derived from Sheng Yang's suggestion to use the IRQ window notification
for detecting the end of NMI handlers, this patch implements virtual
NMI support without impact on the host's ability to receive real NMIs.
The downside is that the given approach requires some heuristics that
can cause NMI nesting in vary rare corner cases.
The approach works as follows:
- inject NMI and set a software-based NMI-blocked flag
- arm the IRQ window start notification whenever an NMI window is
requested
- if the guest exits due to an opening IRQ window, clear the emulated
NMI-blocked flag
- if the guest net execution time with NMI-blocked but without an IRQ
window exceeds 1 second, force NMI-blocked reset and inject anyway
This approach covers most practical scenarios:
- succeeding NMIs are seperated by at least one open IRQ window
- the guest may spin with IRQs disabled (e.g. due to a bug), but
leaving the NMI handler takes much less time than one second
- the guest does not rely on strict ordering or timing of NMIs
(would be problematic in virtualized environments anyway)
Successfully tested with the 'nmi n' monitor command, the kgdbts
testsuite on smp guests (additional patches required to add debug
register support to kvm) + the kernel's nmi_watchdog=1, and a Siemens-
specific board emulation (+ guest) that comes with its own NMI
watchdog mechanism.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
This patch adds the required bits to the VMX side for user space
injected NMIs. As with the preexisting in-kernel irqchip support, the
CPU must provide the "virtual NMI" feature for proper tracking of the
NMI blocking state.
Based on the original patch by Sheng Yang.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Sheng Yang <sheng.yang@intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Introduces the KVM_NMI IOCTL to the generic x86 part of KVM for
injecting NMIs from user space and also extends the statistic report
accordingly.
Based on the original patch by Sheng Yang.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Sheng Yang <sheng.yang@intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Kick the NMI receiving VCPU in case the triggering caller runs in a
different context.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Ensure that a VCPU with pending NMIs is considered runnable.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
LINT0 of the LAPIC can be used to route PIT events as NMI watchdog ticks
into the guest. This patch aligns the in-kernel irqchip emulation with
the user space irqchip with already supports this feature. The trick is
to route PIT interrupts to all LAPIC's LVT0 lines.
Rebased and slightly polished patch originally posted by Sheng Yang.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Sheng Yang <sheng.yang@intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Fix NMI injection in real-mode with the same pattern we perform IRQ
injection.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
do_interrupt_requests and vmx_intr_assist go different way for
achieving the same: enabling the nmi/irq window start notification.
Unify their code over enable_{irq|nmi}_window, get rid of a redundant
call to enable_intr_window instead of direct enable_nmi_window
invocation and unroll enable_intr_window for both in-kernel and user
space irq injection accordingly.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
There are currently two ways in VMX to check if an IRQ or NMI can be
injected:
- vmx_{nmi|irq}_enabled and
- vcpu.arch.{nmi|interrupt}_window_open.
Even worse, one test (at the end of vmx_vcpu_run) uses an inconsistent,
likely incorrect logic.
This patch consolidates and unifies the tests over
{nmi|interrupt}_window_open as cache + vmx_update_window_states
for updating the cache content.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
CPU reset invalidates pending or already injected NMIs, therefore reset
the related state variables.
Based on original patch by Gleb Natapov.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Properly set GUEST_INTR_STATE_NMI and reset nmi_injected when a
task-switch vmexit happened due to a task gate being used for handling
NMIs. Also avoid the false warning about valid vectoring info in
kvm_handle_exit.
Based on original patch by Gleb Natapov.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
irq_window_exits only tracks IRQ window exits due to user space
requests, nmi_window_exits include all exits. The latter makes more
sense, so let's adjust irq_window_exits accounting.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
* 'for-linus' of git://oss.sgi.com/xfs/xfs: (184 commits)
[XFS] Fix race in xfs_write() between direct and buffered I/O with DMAPI
[XFS] handle unaligned data in xfs_bmbt_disk_get_all
[XFS] avoid memory allocations in xfs_fs_vcmn_err
[XFS] Fix speculative allocation beyond eof
[XFS] Remove XFS_BUF_SHUT() and friends
[XFS] Use the incore inode size in xfs_file_readdir()
[XFS] set b_error from bio error in xfs_buf_bio_end_io
[XFS] use inode_change_ok for setattr permission checking
[XFS] add a FMODE flag to make XFS invisible I/O less hacky
[XFS] resync headers with libxfs
[XFS] simplify projid check in xfs_rename
[XFS] replace b_fspriv with b_mount
[XFS] Remove unused tracing code
[XFS] Remove unnecessary assertion
[XFS] Remove unused variable in ktrace_free()
[XFS] Check return value of xfs_buf_get_noaddr()
[XFS] Fix hang after disallowed rename across directory quota domains
[XFS] Fix compile with CONFIG_COMPAT enabled
move inode tracing out of xfs_vnode.
move vn_iowait / vn_iowake into xfs_aops.c
...
* git://git.linux-nfs.org/projects/trondmy/nfs-2.6: (70 commits)
fs/nfs/nfs4proc.c: make nfs4_map_errors() static
rpc: add service field to new upcall
rpc: add target field to new upcall
nfsd: support callbacks with gss flavors
rpc: allow gss callbacks to client
rpc: pass target name down to rpc level on callbacks
nfsd: pass client principal name in rsc downcall
rpc: implement new upcall
rpc: store pointer to pipe inode in gss upcall message
rpc: use count of pipe openers to wait for first open
rpc: track number of users of the gss upcall pipe
rpc: call release_pipe only on last close
rpc: add an rpc_pipe_open method
rpc: minor gss_alloc_msg cleanup
rpc: factor out warning code from gss_pipe_destroy_msg
rpc: remove unnecessary assignment
NFS: remove unused status from encode routines
NFS: increment number of operations in each encode routine
NFS: fix comment placement in nfs4xdr.c
NFS: fix tabs in nfs4xdr.c
...
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
IB/mlx4: Fix reading SL field out of cqe->sl_vid
RDMA/addr: Fix build breakage when IPv6 is disabled
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-2.6: (583 commits)
V4L/DVB (10130): use USB API functions rather than constants
V4L/DVB (10129): dvb: remove deprecated use of RW_LOCK_UNLOCKED in frontends
V4L/DVB (10128): modify V4L documentation to be a valid XHTML
V4L/DVB (10127): stv06xx: Avoid having y unitialized
V4L/DVB (10125): em28xx: Don't do AC97 vendor detection for i2s audio devices
V4L/DVB (10124): em28xx: expand output formats available
V4L/DVB (10123): em28xx: fix reversed definitions of I2S audio modes
V4L/DVB (10122): em28xx: don't load em28xx-alsa for em2870 based devices
V4L/DVB (10121): em28xx: remove worthless Pinnacle PCTV HD Mini 80e device profile
V4L/DVB (10120): em28xx: remove redundant Pinnacle Dazzle DVC 100 profile
V4L/DVB (10119): em28xx: fix corrupted XCLK value
V4L/DVB (10118): zoran: fix warning for a variable not used
V4L/DVB (10116): af9013: Fix gcc false warnings
V4L/DVB (10111a): usbvideo.h: remove an useless blank line
V4L/DVB (10111): quickcam_messenger.c: fix a warning
V4L/DVB (10110): v4l2-ioctl: Fix warnings when using .unlocked_ioctl = __video_ioctl2
V4L/DVB (10109): anysee: Fix usage of an unitialized function
V4L/DVB (10104): uvcvideo: Add support for video output devices
V4L/DVB (10102): uvcvideo: Ignore interrupt endpoint for built-in iSight webcams.
V4L/DVB (10101): uvcvideo: Fix bulk URB processing when the header is erroneous
...
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
net: Fix percpu counters deadlock
cpumask: prepare for iterators to only go to nr_cpu_ids/nr_cpumask_bits: net
drivers/net/usb: use USB API functions rather than constants
cls_cgroup: clean up Kconfig
cls_cgroup: clean up for cgroup part
cls_cgroup: fix an oops when removing a cgroup
EtherExpress16: fix printing timed out status
mlx4_en: Added "set_ringparam" Ethtool interface implementation
mlx4_en: Always allocate RX ring for each interrupt vector
mlx4_en: Verify number of RX rings doesn't exceed MAX_RX_RINGS
IPVS: Make "no destination available" message more consistent between schedulers
net: KS8695: removed duplicated #include
tun: Fix SIOCSIFHWADDR error.
smsc911x: compile fix re netif_rx signature changes
netns: foreach_netdev_safe is insufficient in default_device_exit
net: make xfrm_statistics_seq_show use generic snmp_fold_field
net: Fix more NAPI interface netdev argument drop fallout.
net: Fix unused variable warnings in pasemi_mac.c and spider_net.c