Commit Graph

779 Commits

Author SHA1 Message Date
Paul Mackerras 8e44ddc3f3 powerpc/kvm/book3s: Add support for H_IPOLL and H_XIRR_X in XICS emulation
This adds the remaining two hypercalls defined by PAPR for manipulating
the XICS interrupt controller, H_IPOLL and H_XIRR_X.  H_IPOLL returns
information about the priority and pending interrupts for a virtual
cpu, without changing any state.  H_XIRR_X is like H_XIRR in that it
reads and acknowledges the highest-priority pending interrupt, but it
also returns the timestamp (timebase register value) from when the
interrupt was first received by the hypervisor.  Currently we just
return the current time, since we don't do any software queueing of
virtual interrupts inside the XICS emulation code.

These hcalls are not currently used by Linux guests, but may be in
future.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Acked-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-01 08:29:27 +10:00
Marc Zyngier 535cf7b3b1 KVM: get rid of $(addprefix ../../../virt/kvm/, ...) in Makefiles
As requested by the KVM maintainers, remove the addprefix used to
refer to the main KVM code from the arch code, and replace it with
a KVM variable that does the same thing.

Tested-by: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Gleb Natapov <gleb@redhat.com>
Cc: Christoffer Dall <cdall@cs.columbia.edu>
Acked-by: Xiantao Zhang <xiantao.zhang@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Alexander Graf <agraf@suse.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-05-19 15:14:00 +03:00
Linus Torvalds 01227a889e Merge tag 'kvm-3.10-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm updates from Gleb Natapov:
 "Highlights of the updates are:

  general:
   - new emulated device API
   - legacy device assignment is now optional
   - irqfd interface is more generic and can be shared between arches

  x86:
   - VMCS shadow support and other nested VMX improvements
   - APIC virtualization and Posted Interrupt hardware support
   - Optimize mmio spte zapping

  ppc:
    - BookE: in-kernel MPIC emulation with irqfd support
    - Book3S: in-kernel XICS emulation (incomplete)
    - Book3S: HV: migration fixes
    - BookE: more debug support preparation
    - BookE: e6500 support

  ARM:
   - reworking of Hyp idmaps

  s390:
   - ioeventfd for virtio-ccw

  And many other bug fixes, cleanups and improvements"

* tag 'kvm-3.10-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (204 commits)
  kvm: Add compat_ioctl for device control API
  KVM: x86: Account for failing enable_irq_window for NMI window request
  KVM: PPC: Book3S: Add API for in-kernel XICS emulation
  kvm/ppc/mpic: fix missing unlock in set_base_addr()
  kvm/ppc: Hold srcu lock when calling kvm_io_bus_read/write
  kvm/ppc/mpic: remove users
  kvm/ppc/mpic: fix mmio region lists when multiple guests used
  kvm/ppc/mpic: remove default routes from documentation
  kvm: KVM_CAP_IOMMU only available with device assignment
  ARM: KVM: iterate over all CPUs for CPU compatibility check
  KVM: ARM: Fix spelling in error message
  ARM: KVM: define KVM_ARM_MAX_VCPUS unconditionally
  KVM: ARM: Fix API documentation for ONE_REG encoding
  ARM: KVM: promote vfp_host pointer to generic host cpu context
  ARM: KVM: add architecture specific hook for capabilities
  ARM: KVM: perform HYP initilization for hotplugged CPUs
  ARM: KVM: switch to a dual-step HYP init code
  ARM: KVM: rework HYP page table freeing
  ARM: KVM: enforce maximum size for identity mapped code
  ARM: KVM: move to a KVM provided HYP idmap
  ...
2013-05-05 14:47:31 -07:00
Linus Torvalds 5a148af669 Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
Pull powerpc update from Benjamin Herrenschmidt:
 "The main highlights this time around are:

   - A pile of addition POWER8 bits and nits, such as updated
     performance counter support (Michael Ellerman), new branch history
     buffer support (Anshuman Khandual), base support for the new PCI
     host bridge when not using the hypervisor (Gavin Shan) and other
     random related bits and fixes from various contributors.

   - Some rework of our page table format by Aneesh Kumar which fixes a
     thing or two and paves the way for THP support.  THP itself will
     not make it this time around however.

   - More Freescale updates, including Altivec support on the new e6500
     cores, new PCI controller support, and a pile of new boards support
     and updates.

   - The usual batch of trivial cleanups & fixes"

* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (156 commits)
  powerpc: Fix build error for book3e
  powerpc: Context switch the new EBB SPRs
  powerpc: Turn on the EBB H/FSCR bits
  powerpc: Replace CPU_FTR_BCTAR with CPU_FTR_ARCH_207S
  powerpc: Setup BHRB instructions facility in HFSCR for POWER8
  powerpc: Fix interrupt range check on debug exception
  powerpc: Update tlbie/tlbiel as per ISA doc
  powerpc: Print page size info during boot
  powerpc: print both base and actual page size on hash failure
  powerpc: Fix hpte_decode to use the correct decoding for page sizes
  powerpc: Decode the pte-lp-encoding bits correctly.
  powerpc: Use encode avpn where we need only avpn values
  powerpc: Reduce PTE table memory wastage
  powerpc: Move the pte free routines from common header
  powerpc: Reduce the PTE_INDEX_SIZE
  powerpc: Switch 16GB and 16MB explicit hugepages to a different page table format
  powerpc: New hugepage directory format
  powerpc: Don't truncate pgd_index wrongly
  powerpc: Don't hard code the size of pte page
  powerpc: Save DAR and DSISR in pt_regs on MCE
  ...
2013-05-02 10:16:16 -07:00
Paul Mackerras 5975a2e095 KVM: PPC: Book3S: Add API for in-kernel XICS emulation
This adds the API for userspace to instantiate an XICS device in a VM
and connect VCPUs to it.  The API consists of a new device type for
the KVM_CREATE_DEVICE ioctl, a new capability KVM_CAP_IRQ_XICS, which
functions similarly to KVM_CAP_IRQ_MPIC, and the KVM_IRQ_LINE ioctl,
which is used to assert and deassert interrupt inputs of the XICS.

The XICS device has one attribute group, KVM_DEV_XICS_GRP_SOURCES.
Each attribute within this group corresponds to the state of one
interrupt source.  The attribute number is the same as the interrupt
source number.

This does not support irq routing or irqfd yet.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-05-02 15:28:36 +02:00
Wei Yongjun d133b40f2c kvm/ppc/mpic: fix missing unlock in set_base_addr()
Add the missing unlock before return from function set_base_addr()
when disables the mapping.

Introduced by commit 5df554ad5b
(kvm/ppc/mpic: in-kernel MPIC emulation)

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-05-02 15:28:35 +02:00
Scott Wood ed840ee9c8 kvm/ppc: Hold srcu lock when calling kvm_io_bus_read/write
These functions do an srcu_dereference without acquiring the srcu lock
themselves.

Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-05-02 15:28:35 +02:00
Scott Wood 1d6f6b7339 kvm/ppc/mpic: remove users
This is an unused (no pun intended) leftover from when this code did
reference counting.

Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-05-02 15:28:34 +02:00
Scott Wood 398d87836e kvm/ppc/mpic: fix mmio region lists when multiple guests used
Keeping a linked list of statically defined objects doesn't work
very well when we have multiple guests. :-P

Switch to an array of constant objects.  This fixes a hang when
multiple guests are used.

Signed-off-by: Scott Wood <scottwood@freescale.com>
[agraf: remove struct list_head from mem_reg]
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-05-02 15:28:33 +02:00
Linus Torvalds 20b4fb4852 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull VFS updates from Al Viro,

Misc cleanups all over the place, mainly wrt /proc interfaces (switch
create_proc_entry to proc_create(), get rid of the deprecated
create_proc_read_entry() in favor of using proc_create_data() and
seq_file etc).

7kloc removed.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (204 commits)
  don't bother with deferred freeing of fdtables
  proc: Move non-public stuff from linux/proc_fs.h to fs/proc/internal.h
  proc: Make the PROC_I() and PDE() macros internal to procfs
  proc: Supply a function to remove a proc entry by PDE
  take cgroup_open() and cpuset_open() to fs/proc/base.c
  ppc: Clean up scanlog
  ppc: Clean up rtas_flash driver somewhat
  hostap: proc: Use remove_proc_subtree()
  drm: proc: Use remove_proc_subtree()
  drm: proc: Use minor->index to label things, not PDE->name
  drm: Constify drm_proc_list[]
  zoran: Don't print proc_dir_entry data in debug
  reiserfs: Don't access the proc_dir_entry in r_open(), r_start() r_show()
  proc: Supply an accessor for getting the data from a PDE's parent
  airo: Use remove_proc_subtree()
  rtl8192u: Don't need to save device proc dir PDE
  rtl8187se: Use a dir under /proc/net/r8180/
  proc: Add proc_mkdir_data()
  proc: Move some bits from linux/proc_fs.h to linux/{of.h,signal.h,tty.h}
  proc: Move PDE_NET() to fs/proc/proc_net.c
  ...
2013-05-01 17:51:54 -07:00
Linus Torvalds 5d434fcb25 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
Pull trivial tree updates from Jiri Kosina:
 "Usual stuff, mostly comment fixes, typo fixes, printk fixes and small
  code cleanups"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (45 commits)
  mm: Convert print_symbol to %pSR
  gfs2: Convert print_symbol to %pSR
  m32r: Convert print_symbol to %pSR
  iostats.txt: add easy-to-find description for field 6
  x86 cmpxchg.h: fix wrong comment
  treewide: Fix typo in printk and comments
  doc: devicetree: Fix various typos
  docbook: fix 8250 naming in device-drivers
  pata_pdc2027x: Fix compiler warning
  treewide: Fix typo in printks
  mei: Fix comments in drivers/misc/mei
  treewide: Fix typos in kernel messages
  pm44xx: Fix comment for "CONFIG_CPU_IDLE"
  doc: Fix typo "CONFIG_CGROUP_CGROUP_MEMCG_SWAP"
  mmzone: correct "pags" to "pages" in comment.
  kernel-parameters: remove outdated 'noresidual' parameter
  Remove spurious _H suffixes from ifdef comments
  sound: Remove stray pluses from Kconfig file
  radio-shark: Fix printk "CONFIG_LED_CLASS"
  doc: put proper reference to CONFIG_MODULE_SIG_ENFORCE
  ...
2013-04-30 09:36:50 -07:00
Aneesh Kumar K.V b1022fbd29 powerpc: Decode the pte-lp-encoding bits correctly.
We look at both the segment base page size and actual page size and store
the pte-lp-encodings in an array per base page size.

We also update all relevant functions to take actual page size argument
so that we can use the correct PTE LP encoding in HPTE. This should also
get the basic Multiple Page Size per Segment (MPSS) support. This is needed
to enable THP on ppc64.

[Fixed PR KVM build --BenH]

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-30 16:00:14 +10:00
Paul Mackerras 8b78645c93 KVM: PPC: Book3S: Facilities to save/restore XICS presentation ctrler state
This adds the ability for userspace to save and restore the state
of the XICS interrupt presentation controllers (ICPs) via the
KVM_GET/SET_ONE_REG interface.  Since there is one ICP per vcpu, we
simply define a new 64-bit register in the ONE_REG space for the ICP
state.  The state includes the CPU priority setting, the pending IPI
priority, and the priority and source number of any pending external
interrupt.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-04-26 20:27:34 +02:00
Paul Mackerras d19bd86204 KVM: PPC: Book3S: Add support for ibm,int-on/off RTAS calls
This adds support for the ibm,int-on and ibm,int-off RTAS calls to the
in-kernel XICS emulation and corrects the handling of the saved
priority by the ibm,set-xive RTAS call.  With this, ibm,int-off sets
the specified interrupt's priority in its saved_priority field and
sets the priority to 0xff (the least favoured value).  ibm,int-on
restores the saved_priority to the priority field, and ibm,set-xive
sets both the priority and the saved_priority to the specified
priority value.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-04-26 20:27:33 +02:00
Paul Mackerras 4619ac88b7 KVM: PPC: Book3S HV: Improve real-mode handling of external interrupts
This streamlines our handling of external interrupts that come in
while we're in the guest.  First, when waking up a hardware thread
that was napping, we split off the "napping due to H_CEDE" case
earlier, and use the code that handles an external interrupt (0x500)
in the guest to handle that too.  Secondly, the code that handles
those external interrupts now checks if any other thread is exiting
to the host before bouncing an external interrupt to the guest, and
also checks that there is actually an external interrupt pending for
the guest before setting the LPCR MER bit (mediated external request).

This also makes sure that we clear the "ceded" flag when we handle a
wakeup from cede in real mode, and fixes a potential infinite loop
in kvmppc_run_vcpu() which can occur if we ever end up with the ceded
flag set but MSR[EE] off.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-04-26 20:27:32 +02:00
Benjamin Herrenschmidt e7d26f285b KVM: PPC: Book3S HV: Add support for real mode ICP in XICS emulation
This adds an implementation of the XICS hypercalls in real mode for HV
KVM, which allows us to avoid exiting the guest MMU context on all
threads for a variety of operations such as fetching a pending
interrupt, EOI of messages, IPIs, etc.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-04-26 20:27:32 +02:00
Benjamin Herrenschmidt 54695c3088 KVM: PPC: Book3S HV: Speed up wakeups of CPUs on HV KVM
Currently, we wake up a CPU by sending a host IPI with
smp_send_reschedule() to thread 0 of that core, which will take all
threads out of the guest, and cause them to re-evaluate their
interrupt status on the way back in.

This adds a mechanism to differentiate real host IPIs from IPIs sent
by KVM for guest threads to poke each other, in order to target the
guest threads precisely when possible and avoid that global switch of
the core to host state.

We then use this new facility in the in-kernel XICS code.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-04-26 20:27:31 +02:00
Benjamin Herrenschmidt bc5ad3f370 KVM: PPC: Book3S: Add kernel emulation for the XICS interrupt controller
This adds in-kernel emulation of the XICS (eXternal Interrupt
Controller Specification) interrupt controller specified by PAPR, for
both HV and PR KVM guests.

The XICS emulation supports up to 1048560 interrupt sources.
Interrupt source numbers below 16 are reserved; 0 is used to mean no
interrupt and 2 is used for IPIs.  Internally these are represented in
blocks of 1024, called ICS (interrupt controller source) entities, but
that is not visible to userspace.

Each vcpu gets one ICP (interrupt controller presentation) entity,
used to store the per-vcpu state such as vcpu priority, pending
interrupt state, IPI request, etc.

This does not include any API or any way to connect vcpus to their
ICP state; that will be added in later patches.

This is based on an initial implementation by Michael Ellerman
<michael@ellerman.id.au> reworked by Benjamin Herrenschmidt and
Paul Mackerras.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
[agraf: fix typo, add dependency on !KVM_MPIC]
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-04-26 20:27:30 +02:00
Michael Ellerman 8e591cb720 KVM: PPC: Book3S: Add infrastructure to implement kernel-side RTAS calls
For pseries machine emulation, in order to move the interrupt
controller code to the kernel, we need to intercept some RTAS
calls in the kernel itself.  This adds an infrastructure to allow
in-kernel handlers to be registered for RTAS services by name.
A new ioctl, KVM_PPC_RTAS_DEFINE_TOKEN, then allows userspace to
associate token values with those service names.  Then, when the
guest requests an RTAS service with one of those token values, it
will be handled by the relevant in-kernel handler rather than being
passed up to userspace as at present.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
[agraf: fix warning]
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-04-26 20:27:29 +02:00
Scott Wood 91194919a6 kvm/ppc/mpic: Eliminate mmio_mapped
We no longer need to keep track of this now that MPIC destruction
always happens either during VM destruction (after MMIO has been
destroyed) or during a failed creation (before the fd has been exposed
to userspace, and thus before the MMIO region could have been
registered).

Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-04-26 20:27:28 +02:00
Scott Wood 07f0a7bdec kvm: destroy emulated devices on VM exit
The hassle of getting refcounting right was greater than the hassle
of keeping a list of devices to destroy on VM exit.

Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-04-26 20:27:28 +02:00
Alexander Graf 447a03c02a KVM: PPC: MPIC: Restrict to e500 platforms
The code as is doesn't make any sense on non-e500 platforms. Restrict it
there, so that people don't get wrong ideas on what would actually work.

This patch should get reverted as soon as it's possible to either run e500
guests on non-e500 hosts or the MPIC emulation gains support for non-e500
modes.

Signed-off-by: Alexander Graf <agraf@suse.de>
2013-04-26 20:27:26 +02:00
Alexander Graf 5efdb4be59 KVM: PPC: MPIC: Add support for KVM_IRQ_LINE
Now that all pieces are in place for reusing generic irq infrastructure,
we can copy x86's implementation of KVM_IRQ_LINE irq injection and simply
reuse it for PPC, as it will work there just as well.

Signed-off-by: Alexander Graf <agraf@suse.de>
2013-04-26 20:27:25 +02:00
Alexander Graf de9ba2f363 KVM: PPC: Support irq routing and irqfd for in-kernel MPIC
Now that all the irq routing and irqfd pieces are generic, we can expose
real irqchip support to all of KVM's internal helpers.

This allows us to use irqfd with the in-kernel MPIC.

Signed-off-by: Alexander Graf <agraf@suse.de>
2013-04-26 20:27:25 +02:00
Scott Wood eb1e4f43e0 kvm/ppc/mpic: add KVM_CAP_IRQ_MPIC
Enabling this capability connects the vcpu to the designated in-kernel
MPIC.  Using explicit connections between vcpus and irqchips allows
for flexibility, but the main benefit at the moment is that it
simplifies the code -- KVM doesn't need vm-global state to remember
which MPIC object is associated with this vm, and it doesn't need to
care about ordering between irqchip creation and vcpu creation.

Signed-off-by: Scott Wood <scottwood@freescale.com>
[agraf: add stub functions for kvmppc_mpic_{dis,}connect_vcpu]
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-04-26 20:27:24 +02:00
Scott Wood 5df554ad5b kvm/ppc/mpic: in-kernel MPIC emulation
Hook the MPIC code up to the KVM interfaces, add locking, etc.

Signed-off-by: Scott Wood <scottwood@freescale.com>
[agraf: add stub function for kvmppc_mpic_set_epr, non-booke, 64bit]
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-04-26 20:27:23 +02:00
Scott Wood f0f5c481a9 kvm/ppc/mpic: adapt to kernel style and environment
Remove braces that Linux style doesn't permit, remove space after
'*' that Lindent added, keep error/debug strings contiguous, etc.

Substitute type names, debug prints, etc.

Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-04-26 20:27:22 +02:00
Scott Wood 6dd830a09a kvm/ppc/mpic: remove some obviously unneeded code
Remove some parts of the code that are obviously QEMU or Raven specific
before fixing style issues, to reduce the style issues that need to be
fixed.

Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-04-26 20:27:22 +02:00
Scott Wood b823f98f89 kvm/ppc/mpic: import hw/openpic.c from QEMU
This is QEMU's hw/openpic.c from commit
abd8d4a4d6dfea7ddea72f095f993e1de941614e ("Update version for
1.4.0-rc0"), run through Lindent with no other changes to ease merging
future changes between Linux and QEMU.  Remaining style issues
(including those introduced by Lindent) will be fixed in a later patch.

Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-04-26 20:27:20 +02:00
Paul Mackerras c35635efdc KVM: PPC: Book3S HV: Report VPA and DTL modifications in dirty map
At present, the KVM_GET_DIRTY_LOG ioctl doesn't report modifications
done by the host to the virtual processor areas (VPAs) and dispatch
trace logs (DTLs) registered by the guest.  This is because those
modifications are done either in real mode or in the host kernel
context, and in neither case does the access go through the guest's
HPT, and thus no change (C) bit gets set in the guest's HPT.

However, the changes done by the host do need to be tracked so that
the modified pages get transferred when doing live migration.  In
order to track these modifications, this adds a dirty flag to the
struct representing the VPA/DTL areas, and arranges to set the flag
when the VPA/DTL gets modified by the host.  Then, when we are
collecting the dirty log, we also check the dirty flags for the
VPA and DTL for each vcpu and set the relevant bit in the dirty log
if necessary.  Doing this also means we now need to keep track of
the guest physical address of the VPA/DTL areas.

So as not to lose track of modifications to a VPA/DTL area when it gets
unregistered, or when a new area gets registered in its place, we need
to transfer the dirty state to the rmap chain.  This adds code to
kvmppc_unpin_guest_page() to do that if the area was dirty.  To simplify
that code, we now require that all VPA, DTL and SLB shadow buffer areas
fit within a single host page.  Guests already comply with this
requirement because pHyp requires that these areas not cross a 4k
boundary.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-04-26 20:27:13 +02:00
Paul Mackerras a1b4a0f606 KVM: PPC: Book3S HV: Make HPT reading code notice R/C bit changes
At present, the code that determines whether a HPT entry has changed,
and thus needs to be sent to userspace when it is copying the HPT,
doesn't consider a hardware update to the reference and change bits
(R and C) in the HPT entries to constitute a change that needs to
be sent to userspace.  This adds code to check for changes in R and C
when we are scanning the HPT to find changed entries, and adds code
to set the changed flag for the HPTE when we update the R and C bits
in the guest view of the HPTE.

Since we now need to set the HPTE changed flag in book3s_64_mmu_hv.c
as well as book3s_hv_rm_mmu.c, we move the note_hpte_modification()
function into kvm_book3s_64.h.

Current Linux guest kernels don't use the hardware updates of R and C
in the HPT, so this change won't affect them.  Linux (or other) kernels
might in future want to use the R and C bits and have them correctly
transferred across when a guest is migrated, so it is better to correct
this deficiency.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-04-26 20:27:12 +02:00
Mihai Caraman d9ce6041b3 KVM: PPC: e500: Add e6500 core to Kconfig description
Add e6500 core to Kconfig description.

Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-04-26 20:27:11 +02:00
Mihai Caraman ea17a971c2 KVM: PPC: e500mc: Enable e6500 cores
Extend processor compatibility names to e6500 cores.

Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com>
Reviewed-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-04-26 20:27:10 +02:00
Mihai Caraman 5b21501045 KVM: PPC: e500: Remove E.PT and E.HV.LRAT categories from VCPUs
Embedded.Page Table (E.PT) category is not supported yet in e6500 kernel.
Configure TLBnCFG to remove E.PT and E.HV.LRAT categories from VCPUs.

Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-04-26 20:27:09 +02:00
Mihai Caraman 9a6061d7fd KVM: PPC: e500: Add support for EPTCFG register
EPTCFG register defined by E.PT is accessed unconditionally by Linux guests
in the presence of MAV 2.0. Emulate it now.

Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-04-26 20:27:08 +02:00
Mihai Caraman 307d9008ed KVM: PPC: e500: Add support for TLBnPS registers
Add support for TLBnPS registers available in MMU Architecture Version
(MAV) 2.0.

Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-04-26 20:27:07 +02:00
Mihai Caraman 8893a188b1 KVM: PPC: e500: Move vcpu's MMU configuration to dedicated functions
Vcpu's MMU default configuration and geometry update logic was buried in
a chunk of code. Move them to dedicated functions to add more clarity.

Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-04-26 20:27:07 +02:00
Mihai Caraman a85d2aa23e KVM: PPC: e500: Expose MMU registers via ONE_REG
MMU registers were exposed to user-space using sregs interface. Add them
to ONE_REG interface using kvmppc_get_one_reg/kvmppc_set_one_reg delegation
mechanism.

Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-04-26 20:27:06 +02:00
Mihai Caraman 35b299e279 KVM: PPC: Book3E: Refactor ONE_REG ioctl implementation
Refactor Book3E ONE_REG ioctl implementation to use kvmppc_get_one_reg/
kvmppc_set_one_reg delegation interface introduced by Book3S. This is
necessary for MMU SPRs which are platform specifics.

Get rid of useless case braces in the process.

Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-04-26 20:27:05 +02:00
Bharat Bhushan 9b4f530807 booke: exit to user space if emulator request
This allows the exit to user space if emulator request by returning
EMULATE_EXIT_USER. This will be used in subsequent patches in list

Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-04-26 20:27:04 +02:00
Bharat Bhushan 0f47f9b517 KVM: extend EMULATE_EXIT_USER to support different exit reasons
Currently the instruction emulator code returns EMULATE_EXIT_USER
and common code initializes the "run->exit_reason = .." and
"vcpu->arch.hcall_needed = .." with one fixed reason.
But there can be different reasons when emulator need to exit
to user space. To support that the "run->exit_reason = .."
and "vcpu->arch.hcall_needed = .." initialization is moved a
level up to emulator.

Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-04-26 20:27:03 +02:00
Bharat Bhushan c402a3f457 Rename EMULATE_DO_PAPR to EMULATE_EXIT_USER
Instruction emulation return EMULATE_DO_PAPR when it requires
exit to userspace on book3s. Similar return is required
for booke. EMULATE_DO_PAPR reads out to be confusing so it is
renamed to EMULATE_EXIT_USER.

Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-04-26 20:27:03 +02:00
Bharat Bhushan 092d62ee93 KVM: PPC: debug stub interface parameter defined
This patch defines the interface parameter for KVM_SET_GUEST_DEBUG
ioctl support. Follow up patches will use this for setting up
hardware breakpoints, watchpoints and software breakpoints.

Also kvm_arch_vcpu_ioctl_set_guest_debug() is brought one level below.
This is because I am not sure what is required for book3s. So this ioctl
behaviour will not change for book3s.

Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-04-26 20:27:02 +02:00
Benjamin Herrenschmidt 234d15def9 Merge remote-tracking branch 'origin/master' into next
Merge upstream to get the audit fixes
2013-04-24 14:43:36 +10:00
Paul Mackerras 3cc33d50f5 powerpc: Fix build errors with UP configs in HV-style KVM
This fixes these errors when building UP with CONFIG_KVM_BOOK3S_64_HV=y:

arch/powerpc/kvm/book3s_hv.c:1855:2: error: implicit declaration of function 'inhibit_secondary_onlining' [-Werror=implicit-function-declaration]
arch/powerpc/kvm/book3s_hv.c:1862:2: error: implicit declaration of function 'uninhibit_secondary_onlining' [-Werror=implicit-function-declaration]
cc1: all warnings being treated as errors

and this error (with CONFIG_KVM_BOOK3S_64=m, or a vmlinux link error
with CONFIG_KVM_BOOK3S_64=y):

ERROR: "smp_send_reschedule" [arch/powerpc/kvm/kvm.ko] undefined!
make[2]: *** [__modpost] Error 1

The fix for the link error is suboptimal; ideally we want a self_ipi()
function from irq.c, connected at least to the MPIC code, to initiate
an IPI to this cpu.  The fix here at least lets the code build, and it
will work, just with interrupts being delayed sometimes.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
2013-04-18 13:03:57 +10:00
Zhang Yanfei c843be8a54 powerpc: remove cast for kmalloc/kzalloc return value
remove cast for kmalloc/kzalloc return value.

Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
2013-04-18 13:03:56 +10:00
Scott Wood be28a27c99 kvm/ppc: don't call complete_mmio_load when it's a store
complete_mmio_load writes back the mmio result into the
destination register.  Doing this on a store results in
register corruption.

Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-04-17 15:21:16 +02:00
Stuart Yoder c32498ee64 KVM: PPC: emulate dcbst
Signed-off-by: Stuart Yoder <stuart.yoder@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-04-17 15:21:15 +02:00
Bharat Bhushan 8c32a2ea65 Added ONE_REG interface for debug instruction
This patch adds the one_reg interface to get the special instruction
to be used for setting software breakpoint from userspace.

Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-04-17 15:21:14 +02:00
Scott Wood 4d2be6f7c7 kvm/ppc/e500: eliminate tlb_refs
Commit 523f0e5421 ("KVM: PPC: E500:
Explicitly mark shadow maps invalid") began using E500_TLB_VALID
for guest TLB1 entries, and skipping invalidations if it's not set.

However, when E500_TLB_VALID was set for such entries, it was on a
fake local ref, and so the invalidations never happen.  gtlb_privs
is documented as being only for guest TLB0, though we already violate
that with E500_TLB_BITMAP.

Now that we have MMU notifiers, and thus don't need to actually
retain a reference to the mapped pages, get rid of tlb_refs, and
use gtlb_privs for E500_TLB_VALID in TLB1.

Since we can have more than one host TLB entry for a given tlbe_ref,
be careful not to clear existing flags that are relevant to other
host TLB entries when preparing a new host TLB entry.

Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-04-11 15:53:43 +02:00
Scott Wood 66a5fecdcc kvm/ppc/e500: g2h_tlb1_map: clear old bit before setting new bit
It's possible that we're using the same host TLB1 slot to map (a
presumably different portion of) the same guest TLB1 entry.  Clear
the bit in the map before setting it, so that if the esels are the same
the bit will remain set.

Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-04-11 15:53:38 +02:00
Scott Wood 6b2ba1a912 kvm/ppc/e500: h2g_tlb1_rmap: esel 0 is valid
Add one to esel values in h2g_tlb1_rmap, so that "no mapping" can be
distinguished from "esel 0".  Note that we're not saved by the fact
that host esel 0 is reserved for non-KVM use, because KVM host esel
numbering is not the raw host numbering (see to_htlb1_esel).

Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-04-11 15:53:34 +02:00
Scott Wood c5e6cb051c kvm/powerpc/e500mc: fix tlb invalidation on cpu migration
The existing check handles the case where we've migrated to a different
core than we last ran on, but it doesn't handle the case where we're
still on the same cpu we last ran on, but some other vcpu has run on
this cpu in the meantime.

Without this, guest segfaults (and other misbehavior) have been seen in
smp guests.

Cc: stable@vger.kernel.org # 3.8.x
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-04-11 00:06:39 +02:00
Al Viro 75ef9de126 constify a bunch of struct file_operations instances
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-04-09 14:16:20 -04:00
Paul Mackerras 4fe27d2add KVM: PPC: Remove unused argument to kvmppc_core_dequeue_external
Currently kvmppc_core_dequeue_external() takes a struct kvm_interrupt *
argument and does nothing with it, in any of its implementations.
This removes it in order to make things easier for forthcoming
in-kernel interrupt controller emulation code.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-03-22 01:21:17 +01:00
Scott Wood 47bf379742 kvm/ppc/e500: eliminate tlb_refs
Commit 523f0e5421 ("KVM: PPC: E500:
Explicitly mark shadow maps invalid") began using E500_TLB_VALID
for guest TLB1 entries, and skipping invalidations if it's not set.

However, when E500_TLB_VALID was set for such entries, it was on a
fake local ref, and so the invalidations never happen.  gtlb_privs
is documented as being only for guest TLB0, though we already violate
that with E500_TLB_BITMAP.

Now that we have MMU notifiers, and thus don't need to actually
retain a reference to the mapped pages, get rid of tlb_refs, and
use gtlb_privs for E500_TLB_VALID in TLB1.

Since we can have more than one host TLB entry for a given tlbe_ref,
be careful not to clear existing flags that are relevant to other
host TLB entries when preparing a new host TLB entry.

Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-03-22 01:21:15 +01:00
Scott Wood 36ada4f431 kvm/ppc/e500: g2h_tlb1_map: clear old bit before setting new bit
It's possible that we're using the same host TLB1 slot to map (a
presumably different portion of) the same guest TLB1 entry.  Clear
the bit in the map before setting it, so that if the esels are the same
the bit will remain set.

Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-03-22 01:21:13 +01:00
Scott Wood d6940b6416 kvm/ppc/e500: h2g_tlb1_rmap: esel 0 is valid
Add one to esel values in h2g_tlb1_rmap, so that "no mapping" can be
distinguished from "esel 0".  Note that we're not saved by the fact
that host esel 0 is reserved for non-KVM use, because KVM host esel
numbering is not the raw host numbering (see to_htlb1_esel).

Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-03-22 01:21:11 +01:00
Bharat Bhushan 15b708beee KVM: PPC: booke: Added debug handler
Installed debug handler will be used for guest debug support
and debug facility emulation features (patches for these
features will follow this patch).

Signed-off-by: Liu Yu <yu.liu@freescale.com>
[bharat.bhushan@freescale.com: Substantial changes]
Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-03-22 01:21:09 +01:00
Bharat Bhushan 78accda4f8 KVM: PPC: Added one_reg interface for timer registers
If userspace wants to change some specific bits of TSR
(timer status register) then it uses GET/SET_SREGS ioctl interface.
So the steps will be:
      i)   user-space will make get ioctl,
      ii)  change TSR in userspace
      iii) then make set ioctl.
It can happen that TSR gets changed by kernel after step i) and
before step iii).

To avoid this we have added below one_reg ioctls for oring and clearing
specific bits in TSR. This patch adds one registerface for:
     1) setting specific bit in TSR (timer status register)
     2) clearing specific bit in TSR (timer status register)
     3) setting/getting the TCR register. There are cases where we want to only
        change TCR and not TSR. Although we can uses SREGS without
        KVM_SREGS_E_UPDATE_TSR flag but I think one reg is better. I am open
        if someone feels we should use SREGS only here.
     4) getting/setting TSR register

Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-03-22 01:21:06 +01:00
Bharat Bhushan d26f22c9cd KVM: PPC: move tsr update in a separate function
This is done so that same function can be called from SREGS and
ONE_REG interface (follow up patch).

Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-03-22 01:21:05 +01:00
Marcelo Tosatti 2ae33b3896 Merge remote-tracking branch 'upstream/master' into queue
Merge reason:

From: Alexander Graf <agraf@suse.de>

"Just recently this really important patch got pulled into Linus' tree for 3.9:

commit 1674400aae
Author: Anton Blanchard <anton <at> samba.org>
Date:   Tue Mar 12 01:51:51 2013 +0000

Without that commit, I can not boot my G5, thus I can't run automated tests on it against my queue.

Could you please merge kvm/next against linus/master, so that I can base my trees against that?"

* upstream/master: (653 commits)
  PCI: Use ROM images from firmware only if no other ROM source available
  sparc: remove unused "config BITS"
  sparc: delete "if !ULTRA_HAS_POPULATION_COUNT"
  KVM: Fix bounds checking in ioapic indirect register reads (CVE-2013-1798)
  KVM: x86: Convert MSR_KVM_SYSTEM_TIME to use gfn_to_hva_cache functions (CVE-2013-1797)
  KVM: x86: fix for buffer overflow in handling of MSR_KVM_SYSTEM_TIME (CVE-2013-1796)
  arm64: Kconfig.debug: Remove unused CONFIG_DEBUG_ERRORS
  arm64: Do not select GENERIC_HARDIRQS_NO_DEPRECATED
  inet: limit length of fragment queue hash table bucket lists
  qeth: Fix scatter-gather regression
  qeth: Fix invalid router settings handling
  qeth: delay feature trace
  sgy-cts1000: Remove __dev* attributes
  KVM: x86: fix deadlock in clock-in-progress request handling
  KVM: allow host header to be included even for !CONFIG_KVM
  hwmon: (lm75) Fix tcn75 prefix
  hwmon: (lm75.h) Update header inclusion
  MAINTAINERS: Remove Mark M. Hoffman
  xfs: ensure we capture IO errors correctly
  xfs: fix xfs_iomap_eof_prealloc_initial_size type
  ...

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-03-21 11:11:52 -03:00
Zhang Yanfei 6e51c9ff6a powerpc: remove cast for kmalloc/kzalloc return value
remove cast for kmalloc/kzalloc return value.

Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2013-03-18 14:16:00 +01:00
Aneesh Kumar K.V af81d7878c powerpc: Rename USER_ESID_BITS* to ESID_BITS*
Now we use ESID_BITS of kernel address to build proto vsid. So rename
USER_ESIT_BITS to ESID_BITS

Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: <stable@vger.kernel.org> [v3.8]
2013-03-17 12:45:44 +11:00
Takuya Yoshikawa 8482644aea KVM: set_memory_region: Refactor commit_memory_region()
This patch makes the parameter old a const pointer to the old memory
slot and adds a new parameter named change to know the change being
requested: the former is for removing extra copying and the latter is
for cleaning up the code.

Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-03-04 20:21:08 -03:00
Takuya Yoshikawa 7b6195a91d KVM: set_memory_region: Refactor prepare_memory_region()
This patch drops the parameter old, a copy of the old memory slot, and
adds a new parameter named change to know the change being requested.

This not only cleans up the code but also removes extra copying of the
memory slot structure.

Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-03-04 20:21:08 -03:00
Takuya Yoshikawa 462fce4606 KVM: set_memory_region: Drop user_alloc from prepare/commit_memory_region()
X86 does not use this any more.  The remaining user, s390's !user_alloc
check, can be simply removed since KVM_SET_MEMORY_REGION ioctl is no
longer supported.

Note: fixed powerpc's indentations with spaces to suppress checkpatch
errors.

Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-03-04 20:21:08 -03:00
Sasha Levin b67bfe0d42 hlist: drop the node parameter from iterators
I'm not sure why, but the hlist for each entry iterators were conceived

        list_for_each_entry(pos, head, member)

The hlist ones were greedy and wanted an extra parameter:

        hlist_for_each_entry(tpos, pos, head, member)

Why did they need an extra pos parameter? I'm not quite sure. Not only
they don't really need it, it also prevents the iterator from looking
exactly like the list iterator, which is unfortunate.

Besides the semantic patch, there was some manual work required:

 - Fix up the actual hlist iterators in linux/list.h
 - Fix up the declaration of other iterators based on the hlist ones.
 - A very small amount of places were using the 'node' parameter, this
 was modified to use 'obj->member' instead.
 - Coccinelle didn't handle the hlist_for_each_entry_safe iterator
 properly, so those had to be fixed up manually.

The semantic patch which is mostly the work of Peter Senna Tschudin is here:

@@
iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host;

type T;
expression a,c,d,e;
identifier b;
statement S;
@@

-T b;
    <+... when != b
(
hlist_for_each_entry(a,
- b,
c, d) S
|
hlist_for_each_entry_continue(a,
- b,
c) S
|
hlist_for_each_entry_from(a,
- b,
c) S
|
hlist_for_each_entry_rcu(a,
- b,
c, d) S
|
hlist_for_each_entry_rcu_bh(a,
- b,
c, d) S
|
hlist_for_each_entry_continue_rcu_bh(a,
- b,
c) S
|
for_each_busy_worker(a, c,
- b,
d) S
|
ax25_uid_for_each(a,
- b,
c) S
|
ax25_for_each(a,
- b,
c) S
|
inet_bind_bucket_for_each(a,
- b,
c) S
|
sctp_for_each_hentry(a,
- b,
c) S
|
sk_for_each(a,
- b,
c) S
|
sk_for_each_rcu(a,
- b,
c) S
|
sk_for_each_from
-(a, b)
+(a)
S
+ sk_for_each_from(a) S
|
sk_for_each_safe(a,
- b,
c, d) S
|
sk_for_each_bound(a,
- b,
c) S
|
hlist_for_each_entry_safe(a,
- b,
c, d, e) S
|
hlist_for_each_entry_continue_rcu(a,
- b,
c) S
|
nr_neigh_for_each(a,
- b,
c) S
|
nr_neigh_for_each_safe(a,
- b,
c, d) S
|
nr_node_for_each(a,
- b,
c) S
|
nr_node_for_each_safe(a,
- b,
c, d) S
|
- for_each_gfn_sp(a, c, d, b) S
+ for_each_gfn_sp(a, c, d) S
|
- for_each_gfn_indirect_valid_sp(a, c, d, b) S
+ for_each_gfn_indirect_valid_sp(a, c, d) S
|
for_each_host(a,
- b,
c) S
|
for_each_host_safe(a,
- b,
c, d) S
|
for_each_mesh_entry(a,
- b,
c, d) S
)
    ...+>

[akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c]
[akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c]
[akpm@linux-foundation.org: checkpatch fixes]
[akpm@linux-foundation.org: fix warnings]
[akpm@linux-foudnation.org: redo intrusive kvm changes]
Tested-by: Peter Senna Tschudin <peter.senna@gmail.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-27 19:10:24 -08:00
Linus Torvalds 89f883372f Merge tag 'kvm-3.9-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM updates from Marcelo Tosatti:
 "KVM updates for the 3.9 merge window, including x86 real mode
  emulation fixes, stronger memory slot interface restrictions, mmu_lock
  spinlock hold time reduction, improved handling of large page faults
  on shadow, initial APICv HW acceleration support, s390 channel IO
  based virtio, amongst others"

* tag 'kvm-3.9-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (143 commits)
  Revert "KVM: MMU: lazily drop large spte"
  x86: pvclock kvm: align allocation size to page size
  KVM: nVMX: Remove redundant get_vmcs12 from nested_vmx_exit_handled_msr
  x86 emulator: fix parity calculation for AAD instruction
  KVM: PPC: BookE: Handle alignment interrupts
  booke: Added DBCR4 SPR number
  KVM: PPC: booke: Allow multiple exception types
  KVM: PPC: booke: use vcpu reference from thread_struct
  KVM: Remove user_alloc from struct kvm_memory_slot
  KVM: VMX: disable apicv by default
  KVM: s390: Fix handling of iscs.
  KVM: MMU: cleanup __direct_map
  KVM: MMU: remove pt_access in mmu_set_spte
  KVM: MMU: cleanup mapping-level
  KVM: MMU: lazily drop large spte
  KVM: VMX: cleanup vmx_set_cr0().
  KVM: VMX: add missing exit names to VMX_EXIT_REASONS array
  KVM: VMX: disable SMEP feature when guest is in non-paging mode
  KVM: Remove duplicate text in api.txt
  Revert "KVM: MMU: split kvm_mmu_free_page"
  ...
2013-02-24 13:07:18 -08:00
Linus Torvalds 9d3cae26ac Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
Pull powerpc updates from Benjamin Herrenschmidt:
 "So from the depth of frozen Minnesota, here's the powerpc pull request
  for 3.9.  It has a few interesting highlights, in addition to the
  usual bunch of bug fixes, minor updates, embedded device tree updates
  and new boards:

   - Hand tuned asm implementation of SHA1 (by Paulus & Michael
     Ellerman)

   - Support for Doorbell interrupts on Power8 (kind of fast
     thread-thread IPIs) by Ian Munsie

   - Long overdue cleanup of the way we handle relocation of our open
     firmware trampoline (prom_init.c) on 64-bit by Anton Blanchard

   - Support for saving/restoring & context switching the PPR (Processor
     Priority Register) on server processors that support it.  This
     allows the kernel to preserve thread priorities established by
     userspace.  By Haren Myneni.

   - DAWR (new watchpoint facility) support on Power8 by Michael Neuling

   - Ability to change the DSCR (Data Stream Control Register) which
     controls cache prefetching on a running process via ptrace by
     Alexey Kardashevskiy

   - Support for context switching the TAR register on Power8 (new
     branch target register meant to be used by some new specific
     userspace perf event interrupt facility which is yet to be enabled)
     by Ian Munsie.

   - Improve preservation of the CFAR register (which captures the
     origin of a branch) on various exception conditions by Paulus.

   - Move the Bestcomm DMA driver from arch powerpc to drivers/dma where
     it belongs by Philippe De Muyter

   - Support for Transactional Memory on Power8 by Michael Neuling
     (based on original work by Matt Evans).  For those curious about
     the feature, the patch contains a pretty good description."

(See commit db8ff907027b: "powerpc: Documentation for transactional
memory on powerpc" for the mentioned description added to the file
Documentation/powerpc/transactional_memory.txt)

* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (140 commits)
  powerpc/kexec: Disable hard IRQ before kexec
  powerpc/85xx: l2sram - Add compatible string for BSC9131 platform
  powerpc/85xx: bsc9131 - Correct typo in SDHC device node
  powerpc/e500/qemu-e500: enable coreint
  powerpc/mpic: allow coreint to be determined by MPIC version
  powerpc/fsl_pci: Store the pci ctlr device ptr in the pci ctlr struct
  powerpc/85xx: Board support for ppa8548
  powerpc/fsl: remove extraneous DIU platform functions
  arch/powerpc/platforms/85xx/p1022_ds.c: adjust duplicate test
  powerpc: Documentation for transactional memory on powerpc
  powerpc: Add transactional memory to pseries and ppc64 defconfigs
  powerpc: Add config option for transactional memory
  powerpc: Add transactional memory to POWER8 cpu features
  powerpc: Add new transactional memory state to the signal context
  powerpc: Hook in new transactional memory code
  powerpc: Routines for FP/VSX/VMX unavailable during a transaction
  powerpc: Add transactional memory unavaliable execption handler
  powerpc: Add reclaim and recheckpoint functions for context switching transactional memory processes
  powerpc: Add FP/VSX and VMX register load functions for transactional memory
  powerpc: Add helper functions for transactional memory context switching
  ...
2013-02-23 17:09:55 -08:00
Paul Mackerras deb26c274d powerpc/kvm/book3s_pr: Fix compilation on 32-bit machines
Commit a413f474a0 ("powerpc: Disable relocation on exceptions whenever
PR KVM is active") added calls to pSeries_disable_reloc_on_exc() and
pSeries_enable_reloc_on_exc() to book3s_pr.c, and added declarations
of those functions to <asm/hvcall.h>, but didn't add an include of
<asm/hvcall.h> to book3s_pr.c.  64-bit kernels seem to get hvcall.h
included via some other path, but 32-bit kernels fail to compile with:

arch/powerpc/kvm/book3s_pr.c: In function ‘kvmppc_core_init_vm’:
arch/powerpc/kvm/book3s_pr.c:1300:4: error: implicit declaration of function ‘pSeries_disable_reloc_on_exc’ [-Werror=implicit-function-declaration]
arch/powerpc/kvm/book3s_pr.c: In function ‘kvmppc_core_destroy_vm’:
arch/powerpc/kvm/book3s_pr.c:1316:4: error: implicit declaration of function ‘pSeries_enable_reloc_on_exc’ [-Werror=implicit-function-declaration]
cc1: all warnings being treated as errors
make[2]: *** [arch/powerpc/kvm/book3s_pr.o] Error 1
make[1]: *** [arch/powerpc/kvm] Error 2
make: *** [sub-make] Error 2

This fixes it by adding an include of hvcall.h.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-02-15 16:54:36 +11:00
Paul Mackerras 0acb91112a powerpc/kvm/book3s_hv: Preserve guest CFAR register value
The CFAR (Come-From Address Register) is a useful debugging aid that
exists on POWER7 processors.  Currently HV KVM doesn't save or restore
the CFAR register for guest vcpus, making the CFAR of limited use in
guests.

This adds the necessary code to capture the CFAR value saved in the
early exception entry code (it has to be saved before any branch is
executed), save it in the vcpu.arch struct, and restore it on entry
to the guest.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-02-15 16:54:33 +11:00
Alexander Graf 011da89962 KVM: PPC: BookE: Handle alignment interrupts
When the guest triggers an alignment interrupt, we don't handle it properly
today and instead BUG_ON(). This really shouldn't happen.

Instead, we should just pass the interrupt back into the guest so it can deal
with it.

Reported-by: Gao Guanhua-B22826 <B22826@freescale.com>
Tested-by: Gao Guanhua-B22826 <B22826@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-02-13 12:56:45 +01:00
Bharat Bhushan 1d542d9c2b KVM: PPC: booke: Allow multiple exception types
Current kvmppc_booke_handlers uses the same macro (KVM_HANDLER) and
all handlers are considered to be the same size. This will not be
the case if we want to use different macros for different handlers.

This patch improves the kvmppc_booke_handler so that it can
support different macros for different handlers.

Signed-off-by: Liu Yu <yu.liu@freescale.com>
[bharat.bhushan@freescale.com: Substantial changes]
Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-02-13 12:56:40 +01:00
Bharat Bhushan ffe129ecd7 KVM: PPC: booke: use vcpu reference from thread_struct
Like other places, use thread_struct to get vcpu reference.

Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-02-13 12:56:39 +01:00
Benjamin Herrenschmidt dfd0436ad0 Merge branch 'merge' into next
Merge "merge" branch to bring in various bug fixes that are
going into 3.8
2013-01-29 11:33:37 +11:00
Greg Kroah-Hartman 422d26b6ec Merge 3.8-rc5 into driver-core-next
This resolves a gpio driver merge issue pointed out in linux-next.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-25 21:06:30 -08:00
Alexander Graf b9e3e20893 KVM: PPC: E500: Remove kvmppc_e500_tlbil_all usage from guest TLB code
The guest TLB handling code should not have any insight into how the host
TLB shadow code works.

kvmppc_e500_tlbil_all() is a function that is used for distinction between
e500v2 and e500mc (E.HV) on how to flush shadow entries. This function really
is private between the e500.c/e500mc.c file and e500_mmu_host.c.

Instead of this one, use the public kvmppc_core_flush_tlb() function to flush
all shadow TLB entries. As a nice side effect, with this we also end up
flushing TLB1 entries which we forgot to do before.

Signed-off-by: Alexander Graf <agraf@suse.de>
2013-01-24 19:23:34 +01:00
Alexander Graf 483ba97c0f KVM: PPC: E500: Make clear_tlb_refs and clear_tlb1_bitmap static
Host shadow TLB flushing is logic that the guest TLB code should have
no insight about. Declare the internal clear_tlb_refs and clear_tlb1_bitmap
functions static to the host TLB handling file.

Instead of these, we can use the already exported kvmppc_core_flush_tlb().
This gives us a common API across the board to say "please flush any
pending host shadow translation".

Signed-off-by: Alexander Graf <agraf@suse.de>
2013-01-24 19:23:33 +01:00
Alexander Graf c015c62b13 KVM: PPC: e500: Implement TLB1-in-TLB0 mapping
When a host mapping fault happens in a guest TLB1 entry today, we
map the translated guest entry into the host's TLB1.

This isn't particularly clever when the guest is mapped by normal 4k
pages, since these would be a lot better to put into TLB0 instead.

This patch adds the required logic to map 4k TLB1 shadow maps into
the host's TLB0.

Signed-off-by: Alexander Graf <agraf@suse.de>
2013-01-24 19:23:32 +01:00
Alexander Graf b71c9e2fb7 KVM: PPC: E500: Split host and guest MMU parts
This patch splits the file e500_tlb.c into e500_mmu.c (guest TLB handling)
and e500_mmu_host.c (host TLB handling).

The main benefit of this split is readability and maintainability. It's
just a lot harder to write dirty code :).

Signed-off-by: Alexander Graf <agraf@suse.de>
2013-01-24 19:23:31 +01:00
Alexander Graf 9d98b3ff94 KVM: PPC: e500: Call kvmppc_mmu_map for initial mapping
When emulating tlbwe, we want to automatically map the entry that just got
written in our shadow TLB map, because chances are quite high that it's
going to be used very soon.

Today this happens explicitly, duplicating all the logic that is in
kvmppc_mmu_map() already. Just call that one instead.

Signed-off-by: Alexander Graf <agraf@suse.de>
2013-01-24 19:23:31 +01:00
Alexander Graf 2c378fd779 KVM: PPC: E500: Propagate errors when shadow mapping
When shadow mapping a page, mapping this page can fail. In that case we
don't have a shadow map.

Take this case into account, otherwise we might end up writing bogus TLB
entries into the host TLB.

While at it, also move the write_stlbe() calls into the respective TLBn
handlers.

Signed-off-by: Alexander Graf <agraf@suse.de>
2013-01-24 19:23:30 +01:00
Alexander Graf 523f0e5421 KVM: PPC: E500: Explicitly mark shadow maps invalid
When we invalidate shadow TLB maps on the host, we don't mark them
as not valid. But we should.

Fix this by removing the E500_TLB_VALID from their flags when
invalidating.

Signed-off-by: Alexander Graf <agraf@suse.de>
2013-01-24 19:23:30 +01:00
Alexander Graf 9445ef0181 KVM: PPC: E500: Move write_stlbe higher
Later patches want to call the function and it doesn't have
dependencies on anything below write_host_tlbe.

Move it higher up in the file.

Signed-off-by: Alexander Graf <agraf@suse.de>
2013-01-24 19:23:29 +01:00
Kees Cook 07ff8b5358 arch/powerpc/kvm: remove depends on CONFIG_EXPERIMENTAL
The CONFIG_EXPERIMENTAL config item has not carried much meaning for a
while now and is almost always enabled by default. As agreed during the
Linux kernel summit, remove it from any "depends on" lines in Kconfigs.

CC: Alexander Graf <agraf@suse.de>
CC: Avi Kivity <avi@redhat.com>
CC: Marcelo Tosatti <mtosatti@redhat.com>
CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: Paul Mackerras <paulus@samba.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-21 14:43:12 -08:00
Alexander Graf d3286144c9 KVM: PPC: Emulate dcbf
Guests can trigger MMIO exits using dcbf. Since we don't emulate cache
incoherent MMIO, just do nothing and move on.

Reported-by: Ben Collins <ben.c@servergy.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Tested-by: Ben Collins <ben.c@servergy.com>
CC: stable@vger.kernel.org
2013-01-18 00:40:49 +01:00
Alexander Graf 324b3e6316 KVM: PPC: BookE: Add EPR ONE_REG sync
We need to be able to read and write the contents of the EPR register
from user space.

This patch implements that logic through the ONE_REG API and declares
its (never implemented) SREGS counterpart as deprecated.

Signed-off-by: Alexander Graf <agraf@suse.de>
2013-01-10 13:42:33 +01:00
Alexander Graf 1c81063655 KVM: PPC: BookE: Implement EPR exit
The External Proxy Facility in FSL BookE chips allows the interrupt
controller to automatically acknowledge an interrupt as soon as a
core gets its pending external interrupt delivered.

Today, user space implements the interrupt controller, so we need to
check on it during such a cycle.

This patch implements logic for user space to enable EPR exiting,
disable EPR exiting and EPR exiting itself, so that user space can
acknowledge an interrupt when an external interrupt has successfully
been delivered into the guest vcpu.

Signed-off-by: Alexander Graf <agraf@suse.de>
2013-01-10 13:42:31 +01:00
Alexander Graf 37ecb257f6 KVM: PPC: BookE: Emulate mfspr on EPR
The EPR register is potentially valid for PR KVM as well, so we need
to emulate accesses to it. It's only defined for reading, so only
handle the mfspr case.

Signed-off-by: Alexander Graf <agraf@suse.de>
2013-01-10 13:42:30 +01:00
Alexander Graf b8c649a99d KVM: PPC: BookE: Allow irq deliveries to inject requests
When injecting an interrupt into guest context, we usually don't need
to check for requests anymore. At least not until today.

With the introduction of EPR, we will have to create a request when the
guest has successfully accepted an external interrupt though.

So we need to prepare the interrupt delivery to abort guest entry
gracefully. Otherwise we'd delay the EPR request.

Signed-off-by: Alexander Graf <agraf@suse.de>
2013-01-10 13:42:21 +01:00
Mihai Caraman f2be655004 KVM: PPC: Fix mfspr/mtspr MMUCFG emulation
On mfspr/mtspr emulation path Book3E's MMUCFG SPR with value 1015 clashes
with G4's MSSSR0 SPR. Move MSSSR0 emulation from generic part to Books3S.
MSSSR0 also clashes with Book3S's DABRX SPR. DABRX was not explicitly
handled so Book3S execution flow will behave as before.

Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-01-10 13:30:11 +01:00
Alexander Graf 50c7bb80b5 KVM: PPC: Book3S: PR: Enable alternative instruction for SC 1
When running on top of pHyp, the hypercall instruction "sc 1" goes
straight into pHyp without trapping in supervisor mode.

So if we want to support PAPR guest in this configuration we need to
add a second way of accessing PAPR hypercalls, preferably with the
exact same semantics except for the instruction.

So let's overlay an officially reserved instruction and emulate PAPR
hypercalls whenever we hit that one.

Signed-off-by: Alexander Graf <agraf@suse.de>
2013-01-10 13:15:08 +01:00
Alexander Graf 5a33169ed2 KVM: PPC: Only WARN on invalid emulation
When we hit an emulation result that we didn't expect, that is an error,
but it's nothing that warrants a BUG(), because it can be guest triggered.

So instead, let's only WARN() the user that this happened.

Signed-off-by: Alexander Graf <agraf@suse.de>
2013-01-10 13:15:08 +01:00
Ian Munsie a413f474a0 powerpc: Disable relocation on exceptions whenever PR KVM is active
For PR KVM we allow userspace to map 0xc000000000000000. Because
transitioning from userspace to the guest kernel may use the relocated
exception vectors we have to disable relocation on exceptions whenever
PR KVM is active as we cannot trust that address.

This issue does not apply to HV KVM, since changing from a guest to the
hypervisor will never use the relocated exception vectors.

Currently the hypervisor interface only allows us to toggle relocation
on exceptions on a partition wide scope, so we need to globally disable
relocation on exceptions when the first PR KVM instance is started and
only re-enable them when all PR KVM instances have been destroyed.

It's a bit heavy handed, but until the hypervisor gives us a lightweight
way to toggle relocation on exceptions on a single thread it's only real
option.

Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-01-10 17:00:42 +11:00
Andreas Schwab d591390da9 KVM: PPC: Book3S HV: Fix compilation without CONFIG_PPC_POWERNV
Fixes this build breakage:

arch/powerpc/kvm/book3s_hv_ras.c: In function ‘kvmppc_realmode_mc_power7’:
arch/powerpc/kvm/book3s_hv_ras.c:126:23: error: ‘struct paca_struct’ has no member named ‘opal_mc_evt’

Signed-off-by: Andreas Schwab <schwab@linux-m68k.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-01-06 14:02:00 +01:00
Alex Williamson f82a8cfe93 KVM: struct kvm_memory_slot.user_alloc -> bool
There's no need for this to be an int, it holds a boolean.
Move to the end of the struct for alignment.

Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-12-13 23:24:38 -02:00
Alex Williamson bbacc0c111 KVM: Rename KVM_MEMORY_SLOTS -> KVM_USER_MEM_SLOTS
It's easy to confuse KVM_MEMORY_SLOTS and KVM_MEM_SLOTS_NUM.  One is
the user accessible slots and the other is user + private.  Make this
more obvious.

Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-12-13 23:21:57 -02:00
Mihai Caraman 352df1deb2 KVM: PPC: booke: Get/set guest EPCR register using ONE_REG interface
Implement ONE_REG interface for EPCR register adding KVM_REG_PPC_EPCR to
the list of ONE_REG PPC supported registers.

Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com>
[agraf: remove HV dependency, use get/put_user]
Signed-off-by: Alexander Graf <agraf@suse.de>
2012-12-06 01:34:20 +01:00
Mihai Caraman 38f988240c KVM: PPC: bookehv: Add EPCR support in mtspr/mfspr emulation
Add EPCR support in booke mtspr/mfspr emulation. EPCR register is defined only
for 64-bit and HV categories, we will expose it at this point only to 64-bit
virtual processors running on 64-bit HV hosts.
Define a reusable setter function for vcpu's EPCR.

Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com>
[agraf: move HV dependency in the code]
Signed-off-by: Alexander Graf <agraf@suse.de>
2012-12-06 01:34:19 +01:00