Commit Graph

236 Commits

Author SHA1 Message Date
Jeremy Fitzhardinge af7ae3b9c4 xen: allow compilation with non-flat memory
There's no real reason we can't support sparsemem/discontigmem, so do so.
This is mostly useful to support hotplug memory.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-24 23:57:33 +02:00
Jeremy Fitzhardinge 41e332b2a2 xen: disable preemption during tlb flush
Various places in the kernel flush the tlb even though preemption doens't
guarantee the tlb flush is happening on any particular CPU.  In many cases
this doesn't seem to matter, so don't make a fuss about it.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-24 23:57:33 +02:00
Jeremy Fitzhardinge dbe9e994c9 xen: no need for domU to worry about MCE/MCA
Mask MCE/MCA out of cpu caps.  Its harmless to leave them there, but
it does prevent the kernel from starting an unnecessary thread.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-24 23:57:32 +02:00
Jeremy Fitzhardinge e2a81baf66 xen: support sysenter/sysexit if hypervisor does
64-bit Xen supports sysenter for 32-bit guests, so support its
use.  (sysenter is faster than int $0x80 in 32-on-64.)

sysexit is still not supported, so we fake it up using iret.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-24 23:57:31 +02:00
Jeremy Fitzhardinge 6944a9c894 x86: rename paravirt_alloc_pt etc after the pagetable structure
Rename (alloc|release)_(pt|pd) to pte/pmd to explicitly match the name
of the appropriate pagetable level structure.

[ x86.git merge work by Mark McLoughlin <markmc@redhat.com> ]

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-24 23:57:31 +02:00
Jeremy Fitzhardinge 81e103f1f1 xen: use iret instruction all the time
Change iret implementation to not be dependent on direct-access vcpu
structure.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-17 17:40:51 +02:00
Markus Armbruster b8c2d3dfbc xen: make hvc0 the preferred console in domU
This makes the Xen console just work.  Before, you had to ask for it
on the kernel command line with console=hvc0

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-17 17:40:51 +02:00
Mark McLoughlin c946c7de49 xen: Clear PG_pinned in release_{pt,pd}()
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Cc: xen-devel@lists.xensource.com
Cc: Mark McLoughlin <markmc@redhat.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-04 18:36:48 +02:00
Mark McLoughlin a684d69d15 xen: Do not pin/unpin PMD pages
i.e. with this simple test case:

    int fd = open("/dev/zero", O_RDONLY);
    munmap(mmap((void *)0x40000000, 0x1000_LEN, PROT_READ, MAP_PRIVATE, fd, 0), 0x1000);
    close(fd);

we currently get:

   kernel BUG at arch/x86/xen/enlighten.c:678!
   ...
   EIP is at xen_release_pt+0x79/0xa9
   ...
   Call Trace:
    [<c041da25>] ? __pmd_free_tlb+0x1a/0x75
    [<c047a192>] ? free_pgd_range+0x1d2/0x2b5
    [<c047a2f3>] ? free_pgtables+0x7e/0x93
    [<c047b272>] ? unmap_region+0xb9/0xf5
    [<c047c1bd>] ? do_munmap+0x193/0x1f5
    [<c047c24f>] ? sys_munmap+0x30/0x3f
    [<c0408cce>] ? syscall_call+0x7/0xb
    =======================

and xen complains:

  (XEN) mm.c:2241:d4 Mfn 1cc37 not pinned

Further details at:

  https://bugzilla.redhat.com/436453

Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Cc: xen-devel@lists.xensource.com
Cc: Mark McLoughlin <markmc@redhat.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-04 18:36:48 +02:00
Mark McLoughlin f64337062c xen: refactor xen_{alloc,release}_{pt,pd}()
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Cc: xen-devel@lists.xensource.com
Cc: Mark McLoughlin <markmc@redhat.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-04 18:36:48 +02:00
Jeremy Fitzhardinge 2e8fe719b5 xen: fix UP setup of shared_info
We need to set up the shared_info pointer once we've mapped the real
shared_info into its fixmap slot.  That needs to happen once the general
pagetable setup has been done.  Previously, the UP shared_info was set
up one in xen_start_kernel, but that was left pointing to the dummy
shared info.  Unfortunately there's no really good place to do a later
setup of the shared_info in UP, so just do it once the pagetable setup
has been done.

[ Stable: needed in 2.6.24.x ]

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Stable Kernel <stable@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-03-27 16:08:45 +01:00
Jeremy Fitzhardinge 04c44a080d xen: fix RMW when unmasking events
xen_irq_enable_direct and xen_sysexit were using "andw $0x00ff,
XEN_vcpu_info_pending(vcpu)" to unmask events and test for pending ones
in one instuction.

Unfortunately, the pending flag must be modified with a locked operation
since it can be set by another CPU, and the unlocked form of this
operation was causing the pending flag to get lost, allowing the processor
to return to usermode with pending events and ultimately deadlock.

The simple fix would be to make it a locked operation, but that's rather
costly and unnecessary.  The fix here is to split the mask-clearing and
pending-testing into two instructions; the interrupt window between
them is of no concern because either way pending or new events will
be processed.

This should fix lingering bugs in using direct vcpu structure access too.

[ Stable: needed in 2.6.24.x ]

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Stable <stable@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-03-27 16:08:45 +01:00
Jeremy Fitzhardinge d40e705903 xen: mask out SEP from CPUID
Fix 32-on-64 pvops kernel:

we don't want userspace using syscall/sysenter, even if the hypervisor
supports it, so mask it out from CPUID.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-02-29 18:55:43 +01:00
Jeremy Fitzhardinge 2b5407811d xen: unpin initial Xen pagetable once we're finished with it
Unpin the Xen-provided pagetable once we've finished with it, so it
doesn't cause stray references which cause later swapper_pg_dir
pagetable updates to fail.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Tested-by: Jody Belka <knew-linux@pimb.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-02-13 16:20:35 +01:00
Jeremy Fitzhardinge 1c70e9bd83 xen: deal with pmd being allocated/freed
Deal properly with pmd-level pages being allocated and freed
dynamically.  We can handle them more or less the same as pte pages.

Also, deal with early_ioremap pagetable manipulations.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30 13:33:39 +01:00
Jan Beulich 4dbf7af644 x86: adjust/fix LDT handling for Xen
Based on patch from Jan Beulich <jbeulich@novell.com>.

Don't rely on kmalloc(PAGE_SIZE) returning PAGE_SIZE aligned memory
(Xen requires GDT *and* LDT to be page-aligned). Using the page
allocator interface also removes the (albeit small) slab allocator
overhead. The same change being done for 64-bits for consistency.

Further, the Xen hypercall interface expects the LDT address to be
virtual, not machine.

[ Adjusted to unified ldt.c - Jeremy ]

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Acked-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30 13:33:14 +01:00
Glauber de Oliveira Costa 75b8bb3e56 x86: change write_ldt_entry signature
this patch changes the signature of write_ldt_entry.

Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com>
CC: Zachary Amsden <zach@vmware.com>
CC: Jeremy Fitzhardinge <Jeremy.Fitzhardinge.citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30 13:31:13 +01:00
Glauber de Oliveira Costa 014b15be30 x86: change write_gdt_entry signature.
This patch changes the write_gdt_entry function signature.
Instead of the old "a" and "b" parameters, it now receives
a pointer to a desc_struct, and the size of the entry being
handled. This is because x86_64 can have some 16-byte entries
as well as 8-byte ones.

Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com>
CC: Zachary Amsden <zach@vmware.com>
CC: Jeremy Fitzhardinge <Jeremy.Fitzhardinge.citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30 13:31:13 +01:00
Glauber de Oliveira Costa 8d947344c4 x86: change write_idt_entry signature
this patch changes write_idt_entry signature. It now takes a gate_desc
instead of the a and b parameters. It will allow it to be later unified
between i386 and x86_64.

Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com>
CC: Zachary Amsden <zach@vmware.com>
CC: Jeremy Fitzhardinge <Jeremy.Fitzhardinge.citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30 13:31:12 +01:00
Glauber de Oliveira Costa 6b68f01baa x86: unify struct desc_ptr
This patch unifies struct desc_ptr between i386 and x86_64.
They can be expressed in the exact same way in C code, only
having to change the name of one of them. As Xgt_desc_struct
is ugly and big, this is the one that goes away.

There's also a padding field in i386, but it is not really
needed in the C structure definition.

Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30 13:31:12 +01:00
H. Peter Anvin faca62273b x86: use generic register name in the thread and tss structures
This changes size-specific register names (eip/rip, esp/rsp, etc.) to
generic names in the thread and tss structures.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30 13:31:02 +01:00
H. Peter Anvin 65ea5b0349 x86: rename the struct pt_regs members for 32/64-bit consistency
We have a lot of code which differs only by the naming of specific
members of structures that contain registers.  In order to enable
additional unifications, this patch drops the e- or r- size prefix
from the register names in struct pt_regs, and drops the x- prefixes
for segment registers on the 32-bit side.

This patch also performs the equivalent renames in some additional
places that might be candidates for unification in the future.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30 13:30:56 +01:00
Glauber de Oliveira Costa 6abcd98ffa x86: irqflags consolidation
This patch consolidates the irqflags include files containing common
paravirt definitions. The native definition for interrupt handling, halt,
and such, are the same for 32 and 64 bit, and they are kept in irqflags.h.
the differences are split in the arch-specific files.

The syscall function, irq_enable_sysexit, has a very specific i386 naming,
and its name is then changed to a more general one.

Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Acked-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30 13:30:33 +01:00
Thomas Gleixner 42e0a9aa5d x86: use u32 for some lapic functions
Use u32 so 32 and 64bit have the same interface.

Andrew Morton: xen, lguest build fixes

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-30 13:30:15 +01:00
Jeremy Fitzhardinge f9c4cfe954 xen: disable vcpu_info placement for now
There have been several reports of Xen guest domains locking up when
using vcpu_info structure placement.  Disable it for now.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-01-23 18:04:54 -08:00
Jeremy Fitzhardinge 7999f4b4e5 xen: relax signature check
Some versions of Xen 3.x set their magic number to "xen-3.[12]", so
relax the test to match them.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-12-10 19:46:58 -08:00
Linus Torvalds fb9fc39517 Merge branch 'xen-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen
* 'xen-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen:
  xfs: eagerly remove vmap mappings to avoid upsetting Xen
  xen: add some debug output for failed multicalls
  xen: fix incorrect vcpu_register_vcpu_info hypercall argument
  xen: ask the hypervisor how much space it needs reserved
  xen: lock pte pages while pinning/unpinning
  xen: deal with stale cr3 values when unpinning pagetables
  xen: add batch completion callbacks
  xen: yield to IPI target if necessary
  Clean up duplicate includes in arch/i386/xen/
  remove dead code in pgtable_cache_init
  paravirt: clean up lazy mode handling
  paravirt: refactor struct paravirt_ops into smaller pv_*_ops
2007-10-17 11:10:11 -07:00
H. Peter Anvin 30c826451d [x86] remove uses of magic macros for boot_params access
Instead of using magic macros for boot_params access, simply use the
boot_params structure.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2007-10-16 17:38:31 -07:00
Jeremy Fitzhardinge e3d2697669 xen: fix incorrect vcpu_register_vcpu_info hypercall argument
The kernel's copy of struct vcpu_register_vcpu_info was out of date,
at best causing the hypercall to fail and the guest kernel to fall
back to the old mechanism, or worse, causing random memory corruption.

[ Stable folks: applies to 2.6.23 ]

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Stable Kernel <stable@kernel.org>
Cc: Morten =?utf-8?q?B=C3=B8geskov?= <xen-users@morten.bogeskov.dk>
Cc: Mark Williamson <mark.williamson@cl.cam.ac.uk>
2007-10-16 11:51:31 -07:00
Jeremy Fitzhardinge fb1d84043c xen: ask the hypervisor how much space it needs reserved
Ask the hypervisor how much space it needs reserved, since 32-on-64
doesn't need any space, and it may change in future.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
2007-10-16 11:51:31 -07:00
Jeremy Fitzhardinge 74260714c5 xen: lock pte pages while pinning/unpinning
When a pagetable is created, it is made globally visible in the rmap
prio tree before it is pinned via arch_dup_mmap(), and remains in the
rmap tree while it is unpinned with arch_exit_mmap().

This means that other CPUs may race with the pinning/unpinning
process, and see a pte between when it gets marked RO and actually
pinned, causing any pte updates to fail with write-protect faults.

As a result, all pte pages must be properly locked, and only unlocked
once the pinning/unpinning process has finished.

In order to avoid taking spinlocks for the whole pagetable - which may
overflow the PREEMPT_BITS portion of preempt counter - it locks and pins
each pte page individually, and then finally pins the whole pagetable.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Hugh Dickens <hugh@veritas.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andi Kleen <ak@suse.de>
Cc: Keir Fraser <keir@xensource.com>
Cc: Jan Beulich <jbeulich@novell.com>
2007-10-16 11:51:30 -07:00
Jeremy Fitzhardinge 9f79991d41 xen: deal with stale cr3 values when unpinning pagetables
When a pagetable is no longer in use, it must be unpinned so that its
pages can be freed.  However, this is only possible if there are no
stray uses of the pagetable.  The code currently deals with all the
usual cases, but there's a rare case where a vcpu is changing cr3, but
is doing so lazily, and the change hasn't actually happened by the time
the pagetable is unpinned, even though it appears to have been completed.

This change adds a second per-cpu cr3 variable - xen_current_cr3 -
which tracks the actual state of the vcpu cr3.  It is only updated once
the actual hypercall to set cr3 has been completed.  Other processors
wishing to unpin a pagetable can check other vcpu's xen_current_cr3
values to see if any cross-cpu IPIs are needed to clean things up.

[ Stable folks: 2.6.23 bugfix ]

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Stable Kernel <stable@kernel.org>
2007-10-16 11:51:30 -07:00
Jesper Juhl d626a1f1cb Clean up duplicate includes in arch/i386/xen/
This patch cleans up duplicate includes in
	arch/i386/xen/

Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
2007-10-16 11:51:29 -07:00
Jeremy Fitzhardinge 8965c1c095 paravirt: clean up lazy mode handling
Currently, the set_lazy_mode pv_op is overloaded with 5 functions:
 1. enter lazy cpu mode
 2. leave lazy cpu mode
 3. enter lazy mmu mode
 4. leave lazy mmu mode
 5. flush pending batched operations

This complicates each paravirt backend, since it needs to deal with
all the possible state transitions, handling flushing, etc. In
particular, flushing is quite distinct from the other 4 functions, and
seems to just cause complication.

This patch removes the set_lazy_mode operation, and adds "enter" and
"leave" lazy mode operations on mmu_ops and cpu_ops.  All the logic
associated with enter and leaving lazy states is now in common code
(basically BUG_ONs to make sure that no mode is current when entering
a lazy mode, and make sure that the mode is current when leaving).
Also, flush is handled in a common way, by simply leaving and
re-entering the lazy mode.

The result is that the Xen, lguest and VMI lazy mode implementations
are much simpler.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Zach Amsden <zach@vmware.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Avi Kivity <avi@qumranet.com>
Cc: Anthony Liguory <aliguori@us.ibm.com>
Cc: "Glauber de Oliveira Costa" <glommer@gmail.com>
Cc: Jun Nakajima <jun.nakajima@intel.com>
2007-10-16 11:51:29 -07:00
Jeremy Fitzhardinge 93b1eab3d2 paravirt: refactor struct paravirt_ops into smaller pv_*_ops
This patch refactors the paravirt_ops structure into groups of
functionally related ops:

pv_info - random info, rather than function entrypoints
pv_init_ops - functions used at boot time (some for module_init too)
pv_misc_ops - lazy mode, which didn't fit well anywhere else
pv_time_ops - time-related functions
pv_cpu_ops - various privileged instruction ops
pv_irq_ops - operations for managing interrupt state
pv_apic_ops - APIC operations
pv_mmu_ops - operations for managing pagetables

There are several motivations for this:

1. Some of these ops will be general to all x86, and some will be
   i386/x86-64 specific.  This makes it easier to share common stuff
   while allowing separate implementations where needed.

2. At the moment we must export all of paravirt_ops, but modules only
   need selected parts of it.  This allows us to export on a case by case
   basis (and also choose which export license we want to apply).

3. Functional groupings make things a bit more readable.

Struct paravirt_ops is now only used as a template to generate
patch-site identifiers, and to extract function pointers for inserting
into jmp/calls when patching.  It is only instantiated when needed.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Andi Kleen <ak@suse.de>
Cc: Zach Amsden <zach@vmware.com>
Cc: Avi Kivity <avi@qumranet.com>
Cc: Anthony Liguory <aliguori@us.ibm.com>
Cc: "Glauber de Oliveira Costa" <glommer@gmail.com>
Cc: Jun Nakajima <jun.nakajima@intel.com>
2007-10-16 11:51:29 -07:00
Thomas Gleixner 9702785a74 i386: move xen
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-10-11 11:16:51 +02:00