linux

Commit Graph

Author	SHA1	Message	Date
Jeremy Fitzhardinge	577eebeae3	xen: make -fstack-protector work under Xen -fstack-protector uses a special per-cpu "stack canary" value. gcc generates special code in each function to test the canary to make sure that the function's stack hasn't been overrun. On x86-64, this is simply an offset of %gs, which is the usual per-cpu base segment register, so setting it up simply requires loading %gs's base as normal. On i386, the stack protector segment is %gs (rather than the usual kernel percpu %fs segment register). This requires setting up the full kernel GDT and then loading %gs accordingly. We also need to make sure %gs is initialized when bringing up secondary cpus too. To keep things consistent, we do the full GDT/segment register setup on both architectures. Because we need to avoid -fstack-protected code before setting up the GDT and because there's no way to disable it on a per-function basis, several files need to have stack-protector inhibited. [ Impact: allow Xen booting with stack-protector enabled ] Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>	2009-09-09 16:37:39 -07:00
H. Peter Anvin	0cc0213e73	x86, msr: Have the _safe MSR functions return -EIO, not -EFAULT For some reason, the _safe MSR functions returned -EFAULT, not -EIO. However, the only user which cares about the return code as anything other than a boolean is the MSR driver, which wants -EIO. Change it to -EIO across the board. Signed-off-by: H. Peter Anvin <hpa@zytor.com> Cc: Jeremy Fitzhardinge <jeremy@xensource.com> Cc: Chris Wright <chrisw@sous-sol.org> Cc: Alok Kataria <akataria@vmware.com> Cc: Rusty Russell <rusty@rustcorp.com.au>	2009-08-31 15:15:23 -07:00
Thomas Gleixner	2d826404f0	x86: Move tsc_calibration to x86_init_ops TSC calibration is modified by the vmware hypervisor and paravirt by separate means. Moorestown wants to add its own calibration routine as well. So make calibrate_tsc a proper x86_init_ops function and override it by paravirt or by the early setup of the vmware hypervisor. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2009-08-31 09:35:47 +02:00
Thomas Gleixner	845b3944bb	x86: Add timer_init to x86_init_ops The timer init code is convoluted with several quirks and the paravirt timer chooser. Figuring out which code path is actually taken is not for the faint hearted. Move the numaq TSC quirk to tsc_pre_init x86_init_ops function and replace the paravirt time chooser and the remaining x86 quirk with a simple x86_init_ops function. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2009-08-31 09:35:46 +02:00
Thomas Gleixner	736decac64	x86: Move percpu clockevents setup to x86_init_ops paravirt overrides the setup of the default apic timers as per cpu timers. Moorestown needs to override that as well. Move it to x86_init_ops setup and create a separate x86_cpuinit struct which holds the function for the secondary and possibly hotpluggable CPUs. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2009-08-31 09:35:46 +02:00
Thomas Gleixner	f1d7062a23	x86: Move xen_post_allocator_init into xen_pagetable_setup_done We really do not need two paravirt/x86_init_ops functions which are called in two consecutive source lines. Move the only user of post_allocator_init into the already existing pagetable_setup_done function. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2009-08-31 09:35:45 +02:00
Thomas Gleixner	030cb6c00d	x86: Move paravirt pagetable_setup to x86_init_ops Replace more paravirt hackery by proper x86_init_ops. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2009-08-31 09:35:45 +02:00
Thomas Gleixner	6f30c1ac3f	x86: Move paravirt banner printout to x86_init_ops Replace another obscure paravirt magic and move it to x86_init_ops. Such a hook is also useful for embedded and special hardware. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2009-08-31 09:35:45 +02:00
Thomas Gleixner	42bbdb43b1	x86: Replace ARCH_SETUP by a proper x86_init_ops ARCH_SETUP is a horrible leftover from the old arch/i386 mach support code. It still has a lonely user in xen. Move it to x86_init_ops. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2009-08-31 09:35:45 +02:00
Thomas Gleixner	6b18ae3e2f	x86: Move memory_setup to x86_init_ops memory_setup is overridden by x86_quirks and by paravirts with weak functions and quirks. Unify the whole mess and make it an unconditional x86_init_ops function which defaults to the standard function and can be overridden by the early platform code. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2009-08-27 17:12:52 +02:00
H. Peter Anvin	7adb4df410	x86, xen: Initialize cx to suppress warning Initialize cx before calling xen_cpuid(), in order to suppress the "may be used uninitialized in this function" warning. Signed-off-by: H. Peter Anvin <hpa@zytor.com> Cc: Jeremy Fitzhardinge <jeremy@goop.org>	2009-08-25 21:10:32 -07:00
Jeremy Fitzhardinge	d560bc6157	x86, xen: Suppress WP test on Xen Xen always runs on CPUs which properly support WP enforcement in privileged mode, so there's no need to test for it. This also works around a crash reported by Arnd Hannemann, though I think its just a band-aid for that case. Reported-by: Arnd Hannemann <hannemann@nets.rwth-aachen.de> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Acked-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-08-25 21:10:32 -07:00
Ingo Molnar	cbcb340cb6	Merge branch 'bugfix' of git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen into x86/urgent	2009-08-20 12:05:24 +02:00
Jeremy Fitzhardinge	ce2eef33d3	xen: rearrange things to fix stackprotector Make sure the stack-protector segment registers are properly set up before calling any functions which may have stack-protection compiled into them. [ Impact: prevent Xen early-boot crash when stack-protector is enabled ] Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>	2009-08-19 17:09:28 -07:00
Jeremy Fitzhardinge	a789ed5fb6	xen: cache cr0 value to avoid trap'n'emulate for read_cr0 stts() is implemented in terms of read_cr0/write_cr0 to update the state of the TS bit. This happens during context switch, and so is fairly performance critical. Rather than falling back to a trap-and-emulate native read_cr0, implement our own by caching the last-written value from write_cr0 (the TS bit is the only one we really care about). Impact: optimise Xen context switches Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>	2009-05-08 15:55:24 -07:00
Jeremy Fitzhardinge	b80119bb35	xen/x86-64: clean up warnings about IST-using traps Ignore known IST-using traps. Aside from the debugger traps, they're low-level faults which Xen will handle for us, so the kernel needn't worry about them. Keep warning in case unknown trap starts using IST. Impact: suppress spurious warnings Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>	2009-05-08 15:55:10 -07:00
Jeremy Fitzhardinge	6cac5a9246	xen/x86-64: fix breakpoints and hardware watchpoints Native x86-64 uses the IST mechanism to run int3 and debug traps on an alternative stack. Xen does not do this, and so the frames were being misinterpreted by the ptrace code. This change special-cases these two exceptions by using Xen variants which run on the normal kernel stack properly. Impact: avoid crash or bad data when IST trap is invoked under Xen Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>	2009-05-08 15:51:03 -07:00
Jeremy Fitzhardinge	2b2a733447	xen: clean up gate trap/interrupt constants Use GATE_INTERRUPT/TRAP rather than 0xe/f. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>	2009-04-08 14:25:50 -07:00
Jeremy Fitzhardinge	bc6081ff98	xen: set _PAGE_NX in __supported_pte_mask before pagetable construction Some 64-bit machines don't support the NX flag in ptes. Check for NX before constructing the kernel pagetables. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>	2009-04-08 14:25:49 -07:00
Jeremy Fitzhardinge	191216b928	xen: mask XSAVE from cpuid Xen leaves XSAVE set in cpuid, but doesn't allow cr4.OSXSAVE to be set. This confuses the kernel and it ends up crashing on an xsetbv instruction. At boot time, try to set cr4.OSXSAVE, and mask XSAVE out of cpuid if we can't. This will produce a spurious error from Xen, but allows us to support XSAVE if/when Xen does. This also factors out the cpuid mask decisions to boot time. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>	2009-04-08 11:51:46 -07:00
Jeremy Fitzhardinge	c667d5d6a7	xen: remove xen_load_gdt debug Don't need the noise. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>	2009-04-08 11:51:45 -07:00
Jeremy Fitzhardinge	a957fac500	xen: make xen_load_gdt simpler Remove use of multicall machinery which is unused (gdt loading is never performance critical). This removes the implicit use of percpu variables, which simplifies understanding how the percpu code's use of load_gdt interacts with this code. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>	2009-04-08 11:51:45 -07:00
Jeremy Fitzhardinge	c7da8c829b	xen: clean up xen_load_gdt Makes the logic a bit clearer. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>	2009-04-08 11:51:45 -07:00
Jeremy Fitzhardinge	6d02c42698	xen: clean up gate trap/interrupt constants Use GATE_INTERRUPT/TRAP rather than 0xe/f. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>	2009-03-30 09:25:36 -07:00
Jeremy Fitzhardinge	707ebbc81c	xen: set _PAGE_NX in __supported_pte_mask before pagetable construction Some 64-bit machines don't support the NX flag in ptes. Check for NX before constructing the kernel pagetables. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>	2009-03-30 09:25:36 -07:00
Jeremy Fitzhardinge	e826fe1ba1	xen: mask XSAVE from cpuid Xen leaves XSAVE set in cpuid, but doesn't allow cr4.OSXSAVE to be set. This confuses the kernel and it ends up crashing on an xsetbv instruction. At boot time, try to set cr4.OSXSAVE, and mask XSAVE out of cpuid if we can't. This will produce a spurious error from Xen, but allows us to support XSAVE if/when Xen does. This also factors out the cpuid mask decisions to boot time. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>	2009-03-30 09:25:28 -07:00
Jeremy Fitzhardinge	b4b7e58590	xen: remove xen_load_gdt debug Don't need the noise. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>	2009-03-30 09:25:26 -07:00
Jeremy Fitzhardinge	3ce5fa7ebf	xen: make xen_load_gdt simpler Remove use of multicall machinery which is unused (gdt loading is never performance critical). This removes the implicit use of percpu variables, which simplifies understanding how the percpu code's use of load_gdt interacts with this code. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>	2009-03-30 09:25:25 -07:00
Jeremy Fitzhardinge	6ed6bf428a	xen: clean up xen_load_gdt Makes the logic a bit clearer. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>	2009-03-30 09:25:24 -07:00
Jeremy Fitzhardinge	224101ed69	x86/paravirt: finish change from lazy cpu to context switch start/end Impact: fix lazy context switch API Pass the previous and next tasks into the context switch start end calls, so that the called functions can properly access the task state (esp in end_context_switch, in which the next task is not yet completely current). Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>	2009-03-29 23:36:01 -07:00
Jeremy Fitzhardinge	b407fc57b8	x86/paravirt: flush pending mmu updates on context switch Impact: allow preemption during lazy mmu updates If we're in lazy mmu mode when context switching, leave lazy mmu mode, but remember the task's state in TIF_LAZY_MMU_UPDATES. When we resume the task, check this flag and re-enter lazy mmu mode if its set. This sets things up for allowing lazy mmu mode while preemptible, though that won't actually be active until the next change. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>	2009-03-29 23:36:00 -07:00
Ingo Molnar	91d75e209b	Merge branch 'x86/core' into core/percpu	2009-03-04 02:29:19 +01:00
Jeremy Fitzhardinge	9976b39b50	xen: deal with virtually mapped percpu data The virtually mapped percpu space causes us two problems: - for hypercalls which take an mfn, we need to do a full pagetable walk to convert the percpu va into an mfn, and - when a hypercall requires a page to be mapped RO via all its aliases, we need to make sure its RO in both the percpu mapping and in the linear mapping This primarily affects the gdt and the vcpu info structure. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Xen-devel <xen-devel@lists.xensource.com> Cc: Gerd Hoffmann <kraxel@redhat.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Tejun Heo <htejun@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-02 12:58:19 +01:00
Ingo Molnar	13b2eda64d	Merge branch 'x86/urgent' into x86/core Conflicts: arch/x86/mach-voyager/voyager_smp.c	2009-02-26 06:30:42 +01:00
Jeremy Fitzhardinge	55d8085671	xen: disable interrupts early, as start_kernel expects This avoids a lockdep warning from: if (DEBUG_LOCKS_WARN_ON(unlikely(!early_boot_irqs_enabled))) return; in trace_hardirqs_on_caller(); Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Mark McLoughlin <markmc@redhat.com> Cc: Xen-devel <xen-devel@lists.xensource.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-25 18:51:57 +01:00
Ingo Molnar	7b6aa335ca	x86, apic: remove genapic.h Impact: cleanup Remove genapic.h and remove all references to it. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-17 17:52:44 +01:00
Yinghai Lu	c1eeb2de41	x86: fold apic_ops into genapic Impact: cleanup make it simpler, don't need have one extra struct. v2: fix the sgi_uv build Signed-off-by: Yinghai Lu <yinghai@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-17 12:22:20 +01:00
Tejun Heo	ccbeed3a05	x86: make lazy %gs optional on x86_32 Impact: pt_regs changed, lazy gs handling made optional, add slight overhead to SAVE_ALL, simplifies error_code path a bit On x86_32, %gs hasn't been used by kernel and handled lazily. pt_regs doesn't have place for it and gs is saved/loaded only when necessary. In preparation for stack protector support, this patch makes lazy %gs handling optional by doing the followings. * Add CONFIG_X86_32_LAZY_GS and place for gs in pt_regs. * Save and restore %gs along with other registers in entry_32.S unless LAZY_GS. Note that this unfortunately adds "pushl $0" on SAVE_ALL even when LAZY_GS. However, it adds no overhead to common exit path and simplifies entry path with error code. * Define different user_gs accessors depending on LAZY_GS and add lazy_save_gs() and lazy_load_gs() which are noop if !LAZY_GS. The lazy_*_gs() ops are used to save, load and clear %gs lazily. Define ELF_CORE_COPY_KERNEL_REGS() which always read %gs directly. xen and lguest changes need to be verified. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Jeremy Fitzhardinge <jeremy@xensource.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-10 00:42:00 +01:00
Jeremy Fitzhardinge	e4d0407185	xen: use direct ops on 64-bit Enable the use of the direct vcpu-access operations on 64-bit. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2009-02-04 17:00:50 -08:00
Jeremy Fitzhardinge	383414322b	xen: setup percpu data pointers We need to access percpu data fairly early, so set up the percpu registers as soon as possible. We only need to load the appropriate segment register. We already have a GDT, but its hard to change it early because we need to manipulate the pagetable to do so, and that hasn't been set up yet. Also, set the kernel stack when bringing up secondary CPUs. If we don't they all end up sharing the same stack... Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2009-02-04 16:59:02 -08:00
H. Peter Anvin	327641da8e	Merge branch 'core/percpu' into x86/paravirt	2009-02-04 16:58:26 -08:00
Jeremy Fitzhardinge	795f99b61d	xen: setup percpu data pointers Impact: fix xen booting We need to access percpu data fairly early, so set up the percpu registers as soon as possible. We only need to load the appropriate segment register. We already have a GDT, but its hard to change it early because we need to manipulate the pagetable to do so, and that hasn't been set up yet. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2009-01-31 14:28:58 +09:00
Jeremy Fitzhardinge	ecb93d1ccd	x86/paravirt: add register-saving thunks to reduce caller register pressure Impact: Optimization One of the problems with inserting a pile of C calls where previously there were none is that the register pressure is greatly increased. The C calling convention says that the caller must expect a certain set of registers may be trashed by the callee, and that the callee can use those registers without restriction. This includes the function argument registers, and several others. This patch seeks to alleviate this pressure by introducing wrapper thunks that will do the register saving/restoring, so that the callsite doesn't need to worry about it, but the callee function can be conventional compiler-generated code. In many cases (particularly performance-sensitive cases) the callee will be in assembler anyway, and need not use the compiler's calling convention. Standard calling convention is: x86-32: arguments eax edx ecx, return eax, scratch ?; x86-64: arguments rdi rsi rdx rcx, return rax, scratch r8 r9 r10 r11. The thunk preserves all argument and scratch registers. The return register is not preserved, and is available as a scratch register for unwrapped callee code (and of course the return value). Wrapped function pointers are themselves wrapped in a struct paravirt_callee_save structure, in order to get some warning from the compiler when functions with mismatched calling conventions are used. The most common paravirt ops, both statically and dynamically, are interrupt enable/disable/save/restore, so handle them first. This is particularly easy since their calls are handled specially anyway. XXX Deal with VMI. What's their calling convention? Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-01-30 14:51:45 -08:00
Jeremy Fitzhardinge	319f3ba52c	xen: move remaining mmu-related stuff into mmu.c Impact: Cleanup Move remaining mmu-related stuff into mmu.c. A general cleanup, and lay the groundwork for later patches. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-01-30 14:51:14 -08:00
H. Peter Anvin	9b7ed8faa0	Merge branch 'core/percpu' into x86/paravirt	2009-01-30 14:50:57 -08:00
Jeremy Fitzhardinge	ab897d2013	x86/pvops: remove pte_flags pvop pte_flags() was introduced as a new pvop in order to extract just the flags portion of a pte, which is a potentially cheaper operation than extracting the page number as well. It turns out this operation is not needed, because simply using a mask to extract the flags from a pte is sufficient for all current users. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-01-22 23:35:20 +01:00
Brian Gerst	8ce031972b	x86: remove pda_init() Impact: cleanup Copy the code to cpu_init() to satisfy the requirement that the cpu be reinitialized. Remove all other calls, since the segments are already initialized in head_64.S. Signed-off-by: Brian Gerst <brgerst@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2009-01-20 12:29:19 +09:00
Ingo Molnar	6dbde35308	percpu: add optimized generic percpu accessors It is an optimization and a cleanup, and adds the following new generic percpu methods:
percpu_read()
percpu_write()
percpu_add()
percpu_sub()
percpu_and()
percpu_or()
percpu_xor()
and implements support for them on x86. (other architectures will fall back to a default implementation) The advantage is that for example to read a local percpu variable, instead of this sequence:
return __get_cpu_var(var);
ffffffff8102ca2b: 48 8b 14 fd 80 09 74 mov -0x7e8bf680(,%rdi,8),%rdx
ffffffff8102ca32: 81
ffffffff8102ca33: 48 c7 c0 d8 59 00 00 mov $0x59d8,%rax
ffffffff8102ca3a: 48 8b 04 10 mov (%rax,%rdx,1),%rax
We can get a single instruction by using the optimized variants:
return percpu_read(var);
ffffffff8102ca3f: 65 48 8b 05 91 8f fd mov %gs:0x7efd8f91(%rip),%rax
I also cleaned up the x86-specific APIs and made the x86 code use these new generic percpu primitives.
tj:
* fixed generic percpu_sub() definition as Roel Kluin pointed out
* added percpu_and() for completeness's sake
* made generic percpu ops atomic against preemption
Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Tejun Heo <tj@kernel.org>	2009-01-16 14:20:31 +01:00
Tejun Heo	004aa322f8	x86: misc clean up after the percpu update Do the following cleanups: * kill x86_64_init_pda() which now is equivalent to pda_init() * use per_cpu_offset() instead of cpu_pda() when initializing initial_gs Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-01-16 14:20:26 +01:00
Rusty Russell	4595f9620c	x86: change flush_tlb_others to take a const struct cpumask Impact: reduce stack usage, use new cpumask API. This is made a little more tricky by uv_flush_tlb_others which actually alters its argument, for an IPI to be sent to the remaining cpus in the mask. I solve this by allocating a cpumask_var_t for this case and falling back to IPI should this fail. To eliminate temporaries in the caller, all flush_tlb_others implementations now do the this-cpu-elimination step themselves. Note also the curious "cpus_or(f->flush_cpumask, cpumask, f->flush_cpumask)" which has been there since pre-git and yet f->flush_cpumask is always zero at this point. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Mike Travis <travis@sgi.com>	2009-01-11 19:13:06 +01:00
Jeremy Fitzhardinge	ecbf29cdb3	xen: clean up asm/xen/hypervisor.h Impact: cleanup hypervisor.h had accumulated a lot of crud, including lots of spurious #includes. Clean it all up, and go around fixing up everything else accordingly. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-12-16 21:50:31 +01:00
Tej	f63c2f2489	xen: whitespace/checkpatch cleanup Impact: cleanup Signed-off-by: Tej <bewith.tej@gmail.com> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-12-16 21:05:01 +01:00
Jeremy Fitzhardinge	d05fdf3160	xen: make sure stray alias mappings are gone before pinning Xen requires that all mappings of pagetable pages are read-only, so that they can't be updated illegally. As a result, if a page is being turned into a pagetable page, we need to make sure all its mappings are RO. If the page had been used for ioremap or vmalloc, it may still have left over mappings as a result of not having been lazily unmapped. This change makes sure we explicitly mop them all up before pinning the page. Unlike aliases created by kmap, there can be vmalloc aliases even for non-high pages, so we must do the flush unconditionally. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Linux Memory Management List <linux-mm@kvack.org> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-07 10:05:59 +01:00
Nick Piggin	db64fe0225	mm: rewrite vmap layer Rewrite the vmap allocator to use rbtrees and lazy tlb flushing, and provide a fast, scalable percpu frontend for small vmaps (requires a slightly different API, though). The biggest problem with vmap is actually vunmap. Presently this requires a global kernel TLB flush, which on most architectures is a broadcast IPI to all CPUs to flush the cache. This is all done under a global lock. As the number of CPUs increases, so will the number of vunmaps a scaled workload will want to perform, and so will the cost of a global TLB flush. This gives terrible quadratic scalability characteristics. Another problem is that the entire vmap subsystem works under a single lock. It is a rwlock, but it is actually taken for write in all the fast paths, and the read locking would likely never be run concurrently anyway, so it's just pointless. This is a rewrite of vmap subsystem to solve those problems. The existing vmalloc API is implemented on top of the rewritten subsystem. The TLB flushing problem is solved by using lazy TLB unmapping. vmap addresses do not have to be flushed immediately when they are vunmapped, because the kernel will not reuse them again (would be a use-after-free) until they are reallocated. So the addresses aren't allocated again until a subsequent TLB flush. A single TLB flush then can flush multiple vunmaps from each CPU. XEN and PAT and such do not like deferred TLB flushing because they can't always handle multiple aliasing virtual addresses to a physical address. They now call vm_unmap_aliases() in order to flush any deferred mappings. That call is very expensive (well, actually not a lot more expensive than a single vunmap under the old scheme), however it should be OK if not called too often. The virtual memory extent information is stored in an rbtree rather than a linked list to improve the algorithmic scalability. There is a per-CPU allocator for small vmaps, which amortizes or avoids global locking. 
To use the per-CPU interface, the vm_map_ram / vm_unmap_ram interfaces must be used in place of vmap and vunmap. Vmalloc does not use these interfaces at the moment, so it will not be quite so scalable (although it will use lazy TLB flushing). As a quick test of performance, I ran a test that loops in the kernel, linearly mapping then touching then unmapping 4 pages. Different numbers of tests were run in parallel on an 4 core, 2 socket opteron. Results are in nanoseconds per map+touch+unmap.
threads  vanilla  vmap rewrite
1        14700    2900
2        33600    3000
4        49500    2800
8        70631    2900
So with a 8 cores, the rewritten version is already 25x faster. In a slightly more realistic test (although with an older and less scalable version of the patch), I ripped the not-very-good vunmap batching code out of XFS, and implemented the large buffer mapping with vm_map_ram and vm_unmap_ram... along with a couple of other tricks, I was able to speed up a large directory workload by 20x on a 64 CPU system. I believe vmap/vunmap is actually sped up a lot more than 20x on such a system, but I'm running into other locks now. vmap is pretty well blown off the profiles.
Before:
1352059 total 0.1401
798784 _write_lock 8320.6667 <- vmlist_lock
529313 default_idle 1181.5022
15242 smp_call_function 15.8771 <- vmap tlb flushing
2472 __get_vm_area_node 1.9312 <- vmap
1762 remove_vm_area 4.5885 <- vunmap
316 map_vm_area 0.2297 <- vmap
312 kfree 0.1950
300 _spin_lock 3.1250
252 sn_send_IPI_phys 0.4375 <- tlb flushing
238 vmap 0.8264 <- vmap
216 find_lock_page 0.5192
196 find_next_bit 0.3603
136 sn2_send_IPI 0.2024
130 pio_phys_write_mmr 2.0312
118 unmap_kernel_range 0.1229
After:
78406 total 0.0081
40053 default_idle 89.4040
33576 ia64_spinlock_contention 349.7500
1650 _spin_lock 17.1875
319 __reg_op 0.5538
281 _atomic_dec_and_lock 1.0977
153 mutex_unlock 1.5938
123 iget_locked 0.1671
117 xfs_dir_lookup 0.1662
117 dput 0.1406
114 xfs_iget_core 0.0268
92 xfs_da_hashname 0.1917
75 d_alloc 0.0670
68 vmap_page_range 0.0462 <- vmap
58 kmem_cache_alloc 0.0604
57 memset 0.0540
52 rb_next 0.1625
50 __copy_user 0.0208
49 bitmap_find_free_region 0.2188 <- vmap
46 ia64_sn_udelay 0.1106
45 find_inode_fast 0.1406
42 memcmp 0.2188
42 finish_task_switch 0.1094
42 __d_lookup 0.0410
40 radix_tree_lookup_slot 0.1250
37 _spin_unlock_irqrestore 0.3854
36 xfs_bmapi 0.0050
36 kmem_cache_free 0.0256
35 xfs_vn_getattr 0.0322
34 radix_tree_lookup 0.1062
33 __link_path_walk 0.0035
31 xfs_da_do_buf 0.0091
30 _xfs_buf_find 0.0204
28 find_get_page 0.0875
27 xfs_iread 0.0241
27 __strncpy_from_user 0.2812
26 _xfs_buf_initialize 0.0406
24 _xfs_buf_lookup_pages 0.0179
24 vunmap_page_range 0.0250 <- vunmap
23 find_lock_page 0.0799
22 vm_map_ram 0.0087 <- vmap
20 kfree 0.0125
19 put_page 0.0330
18 __kmalloc 0.0176
17 xfs_da_node_lookup_int 0.0086
17 _read_lock 0.0885
17 page_waitqueue 0.0664
vmap has gone from being the top 5 on the profiles and flushing the crap out of all TLBs, to using less than 1% of kernel time.
[akpm@linux-foundation.org: cleanups, section fix] [akpm@linux-foundation.org: fix build on alpha] Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Cc: Krzysztof Helt <krzysztof.h1@poczta.fm> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-20 08:52:32 -07:00
Ingo Molnar	365d46dc9b	Merge branch 'linus' into x86/xen Conflicts: arch/x86/kernel/cpu/common.c arch/x86/kernel/process_64.c arch/x86/xen/enlighten.c	2008-10-12 12:37:32 +02:00
Ingo Molnar	d84705969f	Merge branch 'x86/apic' into x86-v28-for-linus-phase4-B Conflicts: arch/x86/kernel/apic_32.c arch/x86/kernel/apic_64.c arch/x86/kernel/setup.c drivers/pci/intel-iommu.c include/asm-x86/cpufeature.h include/asm-x86/dma-mapping.h	2008-10-11 20:17:36 +02:00
Ian Campbell	5dc64a3442	xen: do not reserve 2 pages of padding between hypervisor and fixmap. When reserving space for the hypervisor the Xen paravirt backend adds an extra two pages (this was carried forward from the 2.6.18-xen tree which had them "for safety"). Depending on various CONFIG options this can cause the boot time fixmaps to span multiple PMDs which is not supported and triggers a WARN in early_ioremap_init(). This was exposed by `2216d199b1` which moved the dmi table parsing earlier. x86: fix CONFIG_X86_RESERVE_LOW_64K=y The bad_bios_dmi_table() quirk never triggered because we do DMI setup too late. Move it a bit earlier. There is no real reason to reserve these two extra pages and the fixmap already incorporates FIX_HOLE which serves the same purpose. None of the other callers of reserve_top_address do this. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-10-10 13:00:15 +02:00
Ingo Molnar	e496e3d645	Merge branches 'x86/alternatives', 'x86/cleanups', 'x86/commandline', 'x86/crashdump', 'x86/debug', 'x86/defconfig', 'x86/doc', 'x86/exports', 'x86/fpu', 'x86/gart', 'x86/idle', 'x86/mm', 'x86/mtrr', 'x86/nmi-watchdog', 'x86/oprofile', 'x86/paravirt', 'x86/reboot', 'x86/sparse-fixes', 'x86/tsc', 'x86/urgent' and 'x86/vmalloc' into x86-v28-for-linus-phase1	2008-10-06 18:17:07 +02:00
Jeremy Fitzhardinge	db053b86f4	xen: clean up x86-64 warnings There are a couple of Xen features which rely on directly accessing per-cpu data via a segment register, which is not yet available on x86-64. In the meantime, just disable direct access to the vcpu info structure; this leaves some of the code as dead, but it will come to life in time, and the warnings are suppressed. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-10-03 10:04:10 +02:00
Jeremy Fitzhardinge	6a9e91846b	xen: fix pinning when not using split pte locks We only pin PTE pages when using split PTE locks, so don't do the pin/unpin when attaching/detaching pte pages to a pinned pagetable. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-09-10 14:05:53 +02:00
Eduardo Habkost	e4a6be4d28	x86, xen: Use native_pte_flags instead of native_pte_val for .pte_flags Using native_pte_val triggers the BUG_ON() in the paravirt_ops version of pte_flags(). Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-09-06 20:13:58 +02:00
Eduardo Habkost	f86399396c	x86, paravirt_ops: use unsigned long instead of u32 for alloc_p() pfn args This patch changes the pfn args from 'u32' to 'unsigned long' on alloc_p() functions on paravirt_ops, and the corresponding implementations for Xen and VMI. The prototypes for CONFIG_PARAVIRT=n are already using unsigned long, so paravirt.h now matches the prototypes on asm-x86/pgalloc.h. It shouldn't result in any changes on generated code on 32-bit, with or without CONFIG_PARAVIRT. On both cases, 'codiff -f' didn't show any change after applying this patch. On 64-bit, there are (expected) binary changes only when CONFIG_PARAVIRT is enabled, as the patch is really supposed to change the size of the pfn args. [ v2: KVM_GUEST: use the right parameter type on kvm_release_pt() ] Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Acked-by: Jeremy Fitzhardinge <jeremy@goop.org> Acked-by: Zachary Amsden <zach@vmware.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-08-22 05:34:44 +02:00
Jeremy Fitzhardinge	6e833587e1	xen: clean up domain mode predicates There are four operating modes Xen code may find itself running in: - native - hvm domain - pv dom0 - pv domU Clean up predicates for testing for these states to make them more consistent. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Xen-devel <xen-devel@lists.xensource.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-08-20 12:40:07 +02:00
Eduardo Habkost	169ad16bb8	xen_alloc_ptpage: cast PFN_PHYS() argument to unsigned long Currently paravirt_ops alloc_p*() uses u32 for the pfn args. We should change that later, but while the pfn parameter is still u32, we need to cast the PFN_PHYS() argument at xen_alloc_ptpage() to unsigned long, otherwise it will lose bits on the shift. I think PFN_PHYS() should behave better when fed with smaller integers, but a cast to unsigned long won't be enough for all cases on 32-bit PAE, and a cast to u64 would be overkill for most users of PFN_PHYS(). We could have two different flavors of PFN_PHYS: one for low pages only (unsigned long) and another that works for any page (u64), but while we don't have it, we will need the cast to unsigned long on xen_alloc_ptpage(). Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-31 17:10:35 +02:00
Jeremy Fitzhardinge	cef43bf6b3	xen: fix allocation and use of large ldts, cleanup Add a proper comment for set_aliased_prot() and fix an unsigned long/void * warning. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-31 17:10:35 +02:00
Jeremy Fitzhardinge	0d1edf46ba	xen: compile irq functions without -pg for ftrace For some reason I managed to miss a bunch of irq-related functions which also need to be compiled without -pg when using ftrace. This patch moves them into their own file, and starts a cleanup process I've been meaning to do anyway. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Sam Ravnborg <sam@ravnborg.org> Cc: "Alex Nixon (Intern)" <Alex.Nixon@eu.citrix.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-31 12:39:39 +02:00
Jeremy Fitzhardinge	d89961e2dc	xen: suppress known wrmsrs In general, Xen doesn't support wrmsr from an unprivileged domain; it just ends up ignoring the instruction and printing a message on the console. Given that there are sets of MSRs we know the kernel will try to write to, but we don't care, just eat them in xen_write_msr to cut down on console noise. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-28 16:33:08 +02:00
Jeremy Fitzhardinge	a05d2ebab2	xen: fix allocation and use of large ldts When the ldt gets to more than 1 page in size, the kernel uses vmalloc to allocate it. This means that: - when making the ldt RO, we must update the pages in both the vmalloc mapping and the linear mapping to make sure there are no RW aliases. - we need to use arbitrary_virt_to_machine to compute the machine addr for each update Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-28 14:26:27 +02:00
Eduardo Habkost	b56afe1d41	x86, xen: Use native_pte_flags instead of native_pte_val for .pte_flags Using native_pte_val triggers the BUG_ON() in the paravirt_ops version of pte_flags(). Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-26 17:49:33 +02:00
Ingo Molnar	10a010f695	Merge branch 'linus' into x86/x2apic Conflicts: drivers/pci/dmar.c Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-25 13:08:16 +02:00
Jeremy Fitzhardinge	38ffbe66d5	x86/paravirt/xen: properly fill out the ldt ops LTP testing showed that Xen does not properly implement sys_modify_ldt(). This patch does the final little bits needed to make the ldt work properly. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-24 12:30:06 +02:00
Jeremy Fitzhardinge	59438c9fc4	x86: rename PTE_MASK to PTE_PFN_MASK Rusty, in his peevish way, complained that macros defining constants should have a name which somewhat accurately reflects the actual purpose of the constant. Aside from the fact that PTE_MASK gives no clue as to what's actually being masked, and is misleadingly similar to the functionally entirely different PMD_MASK, PUD_MASK and PGD_MASK, I don't really see what the problem is. But if this patch silences the incessant noise, then it will have achieved its goal (TODO: write test-case). Signed-off-by: Jeremy Fitzhardinge <jeremy@goop.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-22 10:43:44 +02:00
Ingo Molnar	76c3bb15d6	Merge branch 'linus' into x86/x2apic	2008-07-22 09:06:21 +02:00
Ingo Molnar	acee709cab	Merge branches 'x86/urgent', 'x86/amd-iommu', 'x86/apic', 'x86/cleanups', 'x86/core', 'x86/cpu', 'x86/fixmap', 'x86/gart', 'x86/kprobes', 'x86/memtest', 'x86/modules', 'x86/nmi', 'x86/pat', 'x86/reboot', 'x86/setup', 'x86/step', 'x86/unify-pci', 'x86/uv', 'x86/xen' and 'xen-64bit' into x86/for-linus	2008-07-21 16:37:17 +02:00
Ingo Molnar	caf43bf7c6	x86, xen: fix apic_ops build on UP fix: arch/x86/xen/enlighten.c:615: error: variable ‘xen_basic_apic_ops’ has initializer but incomplete type arch/x86/xen/enlighten.c:616: error: unknown field ‘read’ specified in initializer [...] Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-20 14:05:31 +02:00
Ingo Molnar	453c1404c5	Merge branch 'x86/apic' into x86/x2apic Conflicts: arch/x86/kernel/paravirt.c arch/x86/kernel/smpboot.c arch/x86/kernel/vmi_32.c arch/x86/lguest/boot.c arch/x86/xen/enlighten.c include/asm-x86/apic.h include/asm-x86/paravirt.h Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-18 23:00:05 +02:00
Ingo Molnar	a208f37a46	Merge branch 'linus' into x86/x2apic	2008-07-18 22:50:34 +02:00
Jeremy Fitzhardinge	95c7c23b06	xen: report hypervisor version Various versions of the hypervisor have differences in what ABIs and features they support. Print some details into the boot log to help with remote debugging. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-18 13:50:42 +02:00
Maciej W. Rozycki	593f4a788e	x86: APIC: remove apic_write_around(); use alternatives Use alternatives to select the workaround for the 11AP Pentium erratum for the affected steppings on the fly rather than build time. Remove the X86_GOOD_APIC configuration option and replace all the calls to apic_write_around() with plain apic_write(), protecting accesses to the ESR as appropriate due to the 3AP Pentium erratum. Remove apic_read_around() and all its invocations altogether as not needed. Remove apic_write_atomic() and all its implementing backends. The use of ASM_OUTPUT2() is not strictly needed for input constraints, but I have used it for readability's sake. I had the feeling no one else was brave enough to do it, so I went ahead and here it is. Verified by checking the generated assembly and tested with both a 32-bit and a 64-bit configuration, also with the 11AP "feature" forced on and verified with gdb on /proc/kcore to work as expected (as 11AP machines are quite hard to get hands on these days). Some script complained about the use of "volatile", but apic_write() needs it for the same reason and is effectively a replacement for writel(), so I have disregarded it. I am not sure what the policy wrt defconfig files is, they are generated and there is risk of a conflict resulting from an unrelated change, so I have left changes to them out. The option will get removed from them at the next run. Some testing with machines other than mine will be needed to avoid some stupid mistake, but despite its volume, the change is not really that intrusive, so I am fairly confident that because it works for me, it will everywhere. Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-18 12:51:21 +02:00
Ingo Molnar	b3fe124389	xen64: fix build error on 32-bit + !HIGHMEM fix: arch/x86/xen/enlighten.c: In function 'xen_set_fixmap': arch/x86/xen/enlighten.c:1127: error: 'FIX_KMAP_BEGIN' undeclared (first use in this function) arch/x86/xen/enlighten.c:1127: error: (Each undeclared identifier is reported only once arch/x86/xen/enlighten.c:1127: error: for each function it appears in.) arch/x86/xen/enlighten.c:1127: error: 'FIX_KMAP_END' undeclared (first use in this function) make[1]: *** [arch/x86/xen/enlighten.o] Error 1 make: *** [arch/x86/xen/enlighten.o] Error 2 FIX_KMAP_BEGIN is only available on HIGHMEM. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-16 11:07:02 +02:00
Jeremy Fitzhardinge	1153968a48	xen: implement Xen write_msr operation 64-bit uses MSRs for important things like the base for fs and gs-prefixed addresses. It's more efficient to use a hypercall to update these, rather than go via the trap and emulate path. Other MSR writes are just passed through; in an unprivileged domain they do nothing, but it might be useful later. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Stephen Tweedie <sct@redhat.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-16 11:06:20 +02:00
Jeremy Fitzhardinge	bf18bf94dc	xen64: set up userspace syscall patch 64-bit userspace expects the vdso to be mapped at a specific fixed address, which happens to be in the middle of the kernel address space. Because we have split user and kernel pagetables, we need to make special arrangements for the vsyscall mapping to appear in the kernel part of the user pagetable. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Stephen Tweedie <sct@redhat.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-16 11:06:06 +02:00
Jeremy Fitzhardinge	6fcac6d305	xen64: set up syscall and sysenter entrypoints for 64-bit We set up entrypoints for syscall and sysenter. sysenter is only used for 32-bit compat processes, whereas syscall can be used by both 32 and 64-bit processes. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Stephen Tweedie <sct@redhat.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-16 11:05:52 +02:00
Jeremy Fitzhardinge	d6182fbf04	xen64: allocate and manage user pagetables Because the x86_64 architecture does not enforce segment limits, Xen cannot protect itself with them as it does in 32-bit mode. Therefore, to protect itself, it runs the guest kernel in ring 3. Since it also runs the guest userspace in ring3, the guest kernel must maintain a second pagetable for its userspace, which does not map kernel space. Naturally, the guest kernel pagetables map both kernel and userspace. The userspace pagetable is attached to the corresponding kernel pagetable via the pgd's page->private field. It is allocated and freed at the same time as the kernel pgd via the paravirt_pgd_alloc/free hooks. Fortunately, the user pagetable is almost entirely shared with the kernel pagetable; the only difference is the pgd page itself. set_pgd will populate all entries in the kernel pagetable, and also set the corresponding user pgd entry if the address is less than STACK_TOP_MAX. The user pagetable must be pinned and unpinned with the kernel one, but because the pagetables are aliased, pgd_walk() only needs to be called on the kernel pagetable. The user pgd page is then pinned/unpinned along with the kernel pgd page. xen_write_cr3 must write both the kernel and user cr3s. The init_mm.pgd pagetable never has a user pagetable allocated for it, because it can never be used while running usermode. One awkward area is that early in boot the page structures are not available. No user pagetable can exist at that point, but it complicates the logic to avoid looking at the page structure. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Stephen Tweedie <sct@redhat.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-16 11:05:38 +02:00
Eduardo Habkost	8a95408e18	xen64: Clear %fs on xen_load_tls() We need to do this, otherwise we can get a GPF on hypercall return after TLS descriptor is cleared but %fs is still pointing to it. Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Stephen Tweedie <sct@redhat.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-16 11:04:55 +02:00
Jeremy Fitzhardinge	b7c3c5c159	xen: make sure the kernel command line is right Point the boot params cmd_line_ptr to the domain-builder-provided command line. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Stephen Tweedie <sct@redhat.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-16 11:04:13 +02:00
Eduardo Habkost	a8fc1089e4	xen64: implement xen_load_gs_index() xen-64: implement xen_load_gs_index() Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Stephen Tweedie <sct@redhat.com> Cc: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-16 11:03:45 +02:00
Jeremy Fitzhardinge	0725cbb977	xen64: add identity irq->vector map The x86_64 interrupt subsystem is oriented towards vectors, as opposed to a flat irq space as it is in x86-32. This patch adds a simple identity irq->vector mapping so that we can continue to feed irqs into do_IRQ() and get a good result. Ideally x86_32 will unify with the 64-bit code and use vectors too. At that point we can move to mapping event channels to vectors, which will allow us to economise on irqs (so per-cpu event channels can share irqs, rather than having to allocate one per cpu, for example). Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Stephen Tweedie <sct@redhat.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-16 11:03:16 +02:00
Jeremy Fitzhardinge	952d1d7055	xen64: add pvop for swapgs swapgs is a no-op under Xen, because the hypervisor makes sure the right version of %gs is current when switching between user and kernel modes. This means that the swapgs "implementation" can be inlined and used when the stack is unsafe (usermode). Unfortunately, it means that disabling patching will result in a non-booting kernel... Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Stephen Tweedie <sct@redhat.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-16 11:02:46 +02:00
Jeremy Fitzhardinge	997409d3d0	xen64: deal with extra words Xen pushes onto exception frames Xen pushes two extra words containing the values of rcx and r11. This pvop hook copies the words back into their appropriate registers, and cleans them off the stack. This leaves the stack in native form, so the normal handler can run unchanged. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Stephen Tweedie <sct@redhat.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-16 11:02:31 +02:00
Eduardo Habkost	e176d367d0	xen64: xen_write_idt_entry() and cvt_gate_to_trap() Changed to use the (to-be-)unified descriptor structs. Signed-off-by: Eduardo Habkost <ehabkost@Rawhide-64.localdomain> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Stephen Tweedie <sct@redhat.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-16 11:02:15 +02:00
Jeremy Fitzhardinge	8745f8b0b9	xen64: defer setting pagetable alloc/release ops We need to wait until the page structure is available to use the proper pagetable page alloc/release operations, since they use struct page to determine if a pagetable is pinned. This happened to work in 32bit because nobody allocated new pagetable pages in the interim between xen_pagetable_setup_done and xen_post_allocator_init, but the 64-bit kernel needs to allocate more pagetable levels. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Stephen Tweedie <sct@redhat.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-16 11:01:45 +02:00
Jeremy Fitzhardinge	39dbc5bd34	xen32: create initial mappings like 64-bit Rearrange the pagetable initialization to share code with the 64-bit kernel. Rather than deferring anything to pagetable_setup_start, just set up an initial pagetable in swapper_pg_dir early at startup, and create an additional 8MB of physical memory mappings. This matches the native head_32.S mappings to a large degree, and allows the rest of the pagetable setup to continue without much Xen vs. native difference. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Stephen Tweedie <sct@redhat.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-16 11:00:49 +02:00
Jeremy Fitzhardinge	d114e1981c	xen64: map an initial chunk of physical memory Early in boot, map a chunk of extra physical memory for use later on. We need a pool of mapped pages to allocate further pages to construct pagetables mapping all physical memory. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Stephen Tweedie <sct@redhat.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-16 11:00:35 +02:00
Jeremy Fitzhardinge	22911b3f1c	xen64: 64-bit starts using set_pte from very early It also doesn't need the 32-bit hack version of set_pte for initial pagetable construction, so just make it use the real thing. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Stephen Tweedie <sct@redhat.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-16 11:00:21 +02:00
Jeremy Fitzhardinge	084a2a4e76	xen64: early mapping setup Set up the initial pagetables to map the kernel mapping into the physical mapping space. This makes __va() usable, since it requires physical mappings. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Stephen Tweedie <sct@redhat.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-16 11:00:07 +02:00
Jeremy Fitzhardinge	7d087b68d6	xen: cpu_detect is 32-bit only Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Stephen Tweedie <sct@redhat.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-16 10:59:38 +02:00
Jeremy Fitzhardinge	15664f968a	xen64: use set_fixmap for shared_info structure Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Stephen Tweedie <sct@redhat.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-16 10:59:24 +02:00
Jeremy Fitzhardinge	5b09b2876e	x86_64: add workaround for no %gs-based percpu As a stopgap until Mike Travis's x86-64 gs-based percpu patches are ready, provide workaround functions for x86_read/write_percpu for Xen's use. Specifically, this means that we can't really make use of vcpu placement, because we can't use a single gs-based memory access to get to vcpu fields. So disable all that for now. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Stephen Tweedie <sct@redhat.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-16 10:58:13 +02:00
Jeremy Fitzhardinge	a9e7062d73	xen: move smp setup into smp.c Move all the smp_ops setup into smp.c, allowing a lot of things to become static. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Stephen Tweedie <sct@redhat.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-16 10:57:59 +02:00
Jeremy Fitzhardinge	f5d36de069	xen64: random ifdefs to mask out 32-bit only code Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Stephen Tweedie <sct@redhat.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-16 10:57:30 +02:00
Jeremy Fitzhardinge	f6e587325b	xen64: add extra pv_mmu_ops We need extra pv_mmu_ops for 64-bit, to deal with the extra level of pagetable. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Stephen Tweedie <sct@redhat.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-16 10:57:16 +02:00
Jeremy Fitzhardinge	851fa3c4e7	xen: define set_pte from the outset We need set_pte to work from a relatively early point, so enable it from the start. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Stephen Tweedie <sct@redhat.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-16 10:56:04 +02:00
Eduardo Habkost	a312b37b2a	x86/paravirt: call paravirt_pagetable_setup_{start, done} Call paravirt_pagetable_setup_{start,done} These paravirt_ops functions were not being called on x86_64. Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Stephen Tweedie <sct@redhat.com> Cc: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-16 10:53:43 +02:00
Ingo Molnar	1a781a777b	Merge branch 'generic-ipi' into generic-ipi-for-linus Conflicts: arch/powerpc/Kconfig arch/s390/kernel/time.c arch/x86/kernel/apic_32.c arch/x86/kernel/cpu/perfctr-watchdog.c arch/x86/kernel/i8259_64.c arch/x86/kernel/ldt.c arch/x86/kernel/nmi_64.c arch/x86/kernel/smpboot.c arch/x86/xen/smp.c include/asm-x86/hw_irq_32.h include/asm-x86/hw_irq_64.h include/asm-x86/mach-default/irq_vectors.h include/asm-x86/mach-voyager/irq_vectors.h include/asm-x86/smp.h kernel/Makefile Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-15 21:55:59 +02:00
Yinghai Lu	94a8c3c243	x86: let 32bit use apic_ops too - fix fix for pv - clean up the namespace there too. Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-14 09:02:54 +02:00
Suresh Siddha	ad66dd340f	x2apic: xen64 paravirt basic apic ops Define the Xen specific basic apic ops, in addition to paravirt apic ops, with some misc warning fixes. Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Cc: akpm@linux-foundation.org Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-12 08:45:10 +02:00
Alok Kataria	e93ef949fd	x86: rename paravirtualized TSC functions Rename the paravirtualized calculate_cpu_khz to calibrate_tsc. In all cases, we actually calibrate_tsc and use that as the cpu_khz value. Signed-off-by: Alok N Kataria <akataria@vmware.com> Signed-off-by: Dan Hecht <dhecht@vmware.com> Cc: Dan Hecht <dhecht@vmware.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-09 07:43:28 +02:00
Jeremy Fitzhardinge	fab58420ac	x86/paravirt, 64-bit: add adjust_exception_frame 64-bit Xen pushes a couple of extra words onto an exception frame. Add a hook to deal with them. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: xen-devel <xen-devel@lists.xensource.com> Cc: Stephen Tweedie <sct@redhat.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-08 13:15:57 +02:00
Jeremy Fitzhardinge	d75cd22fdd	x86/paravirt: split sysret and sysexit Don't conflate sysret and sysexit; they're different instructions with different semantics, and may be in use at the same time (at least within the same kernel, depending on whether it's an Intel or AMD system). sysexit - just return to userspace, does no register restoration of any kind; must explicitly atomically enable interrupts. sysret - reloads flags from r11, so no need to explicitly enable interrupts on 64-bit, responsible for restoring usermode %gs Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citirx.com> Cc: xen-devel <xen-devel@lists.xensource.com> Cc: Stephen Tweedie <sct@redhat.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-08 13:13:15 +02:00
Jeremy Fitzhardinge	eba0045ff8	x86/paravirt: add a pgd_alloc/free hooks Add hooks which are called at pgd_alloc/free time. The pgd_alloc hook may return an error code, which if non-zero, causes the pgd allocation to be failed. The hooks may be used to allocate/free auxiliary per-pgd information. also fix: > * Ingo Molnar <mingo@elte.hu> wrote: > > include/asm/pgalloc.h: In function ‘paravirt_pgd_free': > include/asm/pgalloc.h:14: error: parameter name omitted > arch/x86/kernel/entry_64.S: In file included from > arch/x86/kernel/traps_64.c:51:include/asm/pgalloc.h: In function ‘paravirt_pgd_free': > include/asm/pgalloc.h:14: error: parameter name omitted Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: xen-devel <xen-devel@lists.xensource.com> Cc: Stephen Tweedie <sct@redhat.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-08 13:11:01 +02:00
Jeremy Fitzhardinge	88a6846c70	xen: set max_pfn_mapped Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Yinghai Lu <yhlu.kernel@gmail.com> Cc: the arch/x86 maintainers <x86@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: Yinghai Lu <yhlu.kernel@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-08 12:48:30 +02:00
Ingo Molnar	6236af82d8	Merge branch 'x86/fixmap' into x86/devel Conflicts: arch/x86/mm/init_64.c Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-08 12:24:29 +02:00
Ingo Molnar	3de352bbd8	Merge branch 'x86/mpparse' into x86/devel Conflicts: arch/x86/Kconfig arch/x86/kernel/io_apic_32.c arch/x86/kernel/setup_64.c arch/x86/mm/init_32.c Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-08 11:14:58 +02:00
Ingo Molnar	896395c290	Merge branch 'linus' into tmp.x86.mpparse.new	2008-07-08 10:32:56 +02:00
Jens Axboe	3b16cf8748	x86: convert to generic helpers for IPI function calls This converts x86, x86-64, and xen to use the new helpers for smp_call_function() and friends, and adds support for smp_call_function_single(). Acked-by: Ingo Molnar <mingo@elte.hu> Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-06-26 11:21:54 +02:00
Jeremy Fitzhardinge	e57778a1e3	xen: implement ptep_modify_prot_start/commit Xen has a pte update function which will update a pte while preserving its accessed and dirty bits. This means that ptep_modify_prot_start() can be implemented as a simple read of the pte value. The hardware may update the pte in the meantime, but ptep_modify_prot_commit() updates it while preserving any changes that may have happened in the meantime. The updates in ptep_modify_prot_commit() are batched if we're currently in lazy mmu mode. The mmu_update hypercall can take a batch of updates to perform, but this code doesn't make particular use of that feature, in favour of using generic multicall batching to get them all into the hypervisor. The net effect of this is that each mprotect pte update turns from two expensive trap-and-emulate faults into the hypervisor into a single hypercall whose cost is amortized in a batched multicall. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Acked-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-06-25 15:17:23 +02:00
Jeremy Fitzhardinge	08b882c627	paravirt: add hooks for ptep_modify_prot_start/commit This patch adds paravirt-ops hooks in pv_mmu_ops for ptep_modify_prot_start and ptep_modify_prot_commit. This allows the hypervisor-specific backends to implement these in some more efficient way. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Acked-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-06-25 15:16:00 +02:00
Ingo Molnar	8b7ef4ec5b	Merge branch 'linus' into x86/fixmap	2008-06-25 12:30:21 +02:00
Jeremy Fitzhardinge	2849914393	xen: remove support for non-PAE 32-bit Non-PAE operation has been deprecated in Xen for a while, and is rarely tested or used. xen-unstable has now officially dropped non-PAE support. Since Xen/pvops' non-PAE support has also been broken for a while, we may as well completely drop it altogether. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-06-24 17:00:55 +02:00
Jeremy Fitzhardinge	aeaaa59c7e	x86/paravirt/xen: add set_fixmap pv_mmu_ops Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-06-20 15:09:56 +02:00
Jeremy Fitzhardinge	05345b0f00	xen: mask unwanted pte bits in __supported_pte_mask [ Stable: this isn't a bugfix in itself, but it's a prerequisite for "xen: don't drop NX bit" ] Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Stable Kernel <stable@kernel.org> Cc: the arch/x86 maintainers <x86@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-06-20 14:56:36 +02:00
Jeremy Fitzhardinge	eb179e443d	xen: mask unwanted pte bits in __supported_pte_mask [ Stable: this isn't a bugfix in itself, but it's a prerequisite for "xen: don't drop NX bit" ] Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Stable Kernel <stable@kernel.org> Cc: the arch/x86 maintainers <x86@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-06-20 14:55:11 +02:00
Jeremy Fitzhardinge	9c7a794209	xen: restore vcpu_info mapping If we're using vcpu_info mapping, then make sure it's restored on all processors before releasing them from stop_machine. The only complication is that if this fails, we can't continue because we've already made assumptions that the mapping is available (baked in calls to the _direct versions of the functions, for example). Fortunately this can only happen with a 32-bit hypervisor, which may possibly run out of mapping space. On a 64-bit hypervisor, this is a non-issue. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-06-02 13:25:34 +02:00
Jeremy Fitzhardinge	e2426cf85f	xen: avoid hypercalls when updating unpinned pud/pmd When operating on an unpinned pagetable (ie, one under construction or destruction), it isn't necessary to use a hypercall to update a pud/pmd entry. Jan Beulich observed that a similar optimisation avoided many thousands of hypercalls while doing a kernel build. One tricky part is that early in the kernel boot there's no page structure, so we can't check to see if the page is pinned. In that case, we just always use the hypercall. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Jan Beulich <jbeulich@novell.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-06-02 13:24:40 +02:00
Yinghai Lu	f0d43100f1	x86: extend e820 early_res support 32bit -fix #3 introduce init_pg_table_start, so xen PV could specify the value. Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-05-31 09:55:47 +02:00
Jeremy Fitzhardinge	0e91398f2a	xen: implement save/restore This patch implements Xen save/restore and migration. Saving is triggered via xenbus, which is polled in drivers/xen/manage.c. When a suspend request comes in, the kernel prepares itself for saving by: 1 - Freeze all processes. This is primarily to prevent any partially-completed pagetable updates from confusing the suspend process. If CONFIG_PREEMPT isn't defined, then this isn't necessary. 2 - Suspend xenbus and other devices 3 - Stop_machine, to make sure all the other vcpus are quiescent. The Xen tools require the domain to run its save off vcpu0. 4 - Within the stop_machine state, it pins any unpinned pgds (under construction or destruction), canonicalizes various other pieces of state (mostly converting mfns to pfns), and finally 5 - Suspend the domain Restore reverses the steps used to save the domain, ending when all the frozen processes are thawed. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-05-27 10:11:38 +02:00
Jeremy Fitzhardinge	d5edbc1f75	xen: add p2m mfn_list_list When saving a domain, the Xen tools need to remap all our mfns to portable pfns. In order to remap our p2m table, it needs to know where all its pages are, so maintain the references to the p2m table for it to use. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-05-27 10:11:37 +02:00
Jeremy Fitzhardinge	a0d695c821	xen: make dummy_shared_info non-static Rename dummy_shared_info to xen_dummy_shared_info and make it non-static, in anticipation of users outside of enlighten.c Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-05-27 10:11:37 +02:00
Jeremy Fitzhardinge	d451bb7aa8	xen: make phys_to_machine structure dynamic We now support the use of memory hotplug, so the physical to machine page mapping structure must be dynamic. This is implemented as a two-level radix tree structure, which allows us to efficiently incrementally allocate memory for the p2m table as new pages are added. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-05-27 10:11:37 +02:00
Jeremy Fitzhardinge	83abc70a4c	xen: make earlyprintk=xen work again For some perverse reason, if you call add_preferred_console() it prevents setup_early_printk() from successfully enabling the boot console - unless you make it a preferred console too... Also, make xenboot console output distinct from normal console output, since it gets repeated when the console handover happens, and the duplicated output is confusing without disambiguation. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Markus Armbruster <armbru@redhat.com> Cc: Gerd Hoffmann <kraxel@redhat.com>	2008-05-27 10:11:36 +02:00
Markus Armbruster	9e124fe16f	xen: Enable console tty by default in domU if it's not a dummy Without console= arguments on the kernel command line, the first console to register becomes enabled and the preferred console (the one behind /dev/console). This is normally tty (assuming CONFIG_VT_CONSOLE is enabled, which it commonly is). This is okay as long as tty is a useful console. But unless we have the PV framebuffer, and it is enabled for this domain, tty0 in domU is merely a dummy. In that case, we want the preferred console to be the Xen console hvc0, and we want it without having to fiddle with the kernel command line. Commit `b8c2d3dfbc` did that for us. Since we now have the PV framebuffer, we want to enable and prefer tty again, but only when PVFB is enabled. But even then we still want to enable the Xen console as well. Problem: when tty registers, we can't yet know whether the PVFB is enabled. By the time we can know (xenstore is up), the console setup game is over. Solution: enable console tty by default, but keep hvc as the preferred console. Change the preferred console to tty when PVFB probes successfully, unless we've been given console kernel parameters. Signed-off-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-05-27 10:11:36 +02:00
Jeremy Fitzhardinge	a15af1c9ea	x86/paravirt: add pte_flags to just get pte flags Add pte_flags() to extract the flags from a pte. This is a special case of pte_val() which is only guaranteed to return the pte's flags correctly; the page number may be corrupted or missing. The intent is to allow paravirt implementations to return pte flags without having to do any translation of the page number (most notably, Xen). Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-05-27 10:11:36 +02:00
Jeremy Fitzhardinge	239d1fc04e	xen: don't worry about preempt during xen_irq_enable() When enabling interrupts, we don't need to worry about preemption, because we either enter with interrupts disabled - so no preemption - or the caller is confused and is re-enabling interrupts on some indeterminate processor. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-05-27 10:11:35 +02:00
Jeremy Fitzhardinge	2956a3511c	xen: allow some cr4 updates The guest can legitimately change things like cr4.OSFXSR and OSXMMEXCPT, so let it. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-05-27 10:11:35 +02:00
Jeremy Fitzhardinge	349c709f42	xen: use new sched_op Use the new sched_op hypercall, mainly because xenner doesn't support the old one. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-05-27 10:11:35 +02:00
Jeremy Fitzhardinge	7b1333aa4c	xen: use hypercall rather than clts Xen will trap and emulate clts, but it's better to use a hypercall. Also, xenner doesn't handle clts. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-05-27 10:11:35 +02:00
Jeremy Fitzhardinge	3843fc2575	xen: remove support for non-PAE 32-bit Non-PAE operation has been deprecated in Xen for a while, and is rarely tested or used. xen-unstable has now officially dropped non-PAE support. Since Xen/pvops' non-PAE support has also been broken for a while, we may as well completely drop it altogether. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-05-22 18:42:49 +02:00
Jeremy Fitzhardinge	af7ae3b9c4	xen: allow compilation with non-flat memory There's no real reason we can't support sparsemem/discontigmem, so do so. This is mostly useful to support hotplug memory. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-04-24 23:57:33 +02:00
Jeremy Fitzhardinge	41e332b2a2	xen: disable preemption during tlb flush Various places in the kernel flush the tlb even though preemption doesn't guarantee the tlb flush is happening on any particular CPU. In many cases this doesn't seem to matter, so don't make a fuss about it. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-04-24 23:57:33 +02:00
Jeremy Fitzhardinge	dbe9e994c9	xen: no need for domU to worry about MCE/MCA Mask MCE/MCA out of cpu caps. It's harmless to leave them there, but it does prevent the kernel from starting an unnecessary thread. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-04-24 23:57:32 +02:00
Jeremy Fitzhardinge	e2a81baf66	xen: support sysenter/sysexit if hypervisor does 64-bit Xen supports sysenter for 32-bit guests, so support its use. (sysenter is faster than int $0x80 in 32-on-64.) sysexit is still not supported, so we fake it up using iret. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-04-24 23:57:31 +02:00
Jeremy Fitzhardinge	6944a9c894	x86: rename paravirt_alloc_pt etc after the pagetable structure Rename (alloc\|release)_(pt\|pd) to pte/pmd to explicitly match the name of the appropriate pagetable level structure. [ x86.git merge work by Mark McLoughlin <markmc@redhat.com> ] Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-04-24 23:57:31 +02:00
Jeremy Fitzhardinge	81e103f1f1	xen: use iret instruction all the time Change iret implementation to not be dependent on direct-access vcpu structure. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-04-17 17:40:51 +02:00
Markus Armbruster	b8c2d3dfbc	xen: make hvc0 the preferred console in domU This makes the Xen console just work. Before, you had to ask for it on the kernel command line with console=hvc0 Signed-off-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-04-17 17:40:51 +02:00
Mark McLoughlin	c946c7de49	xen: Clear PG_pinned in release_{pt,pd}() Signed-off-by: Mark McLoughlin <markmc@redhat.com> Cc: xen-devel@lists.xensource.com Cc: Mark McLoughlin <markmc@redhat.com> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-04-04 18:36:48 +02:00
Mark McLoughlin	a684d69d15	xen: Do not pin/unpin PMD pages i.e. with this simple test case: int fd = open("/dev/zero", O_RDONLY); munmap(mmap((void *)0x40000000, 0x1000_LEN, PROT_READ, MAP_PRIVATE, fd, 0), 0x1000); close(fd); we currently get: kernel BUG at arch/x86/xen/enlighten.c:678! ... EIP is at xen_release_pt+0x79/0xa9 ... Call Trace: [<c041da25>] ? __pmd_free_tlb+0x1a/0x75 [<c047a192>] ? free_pgd_range+0x1d2/0x2b5 [<c047a2f3>] ? free_pgtables+0x7e/0x93 [<c047b272>] ? unmap_region+0xb9/0xf5 [<c047c1bd>] ? do_munmap+0x193/0x1f5 [<c047c24f>] ? sys_munmap+0x30/0x3f [<c0408cce>] ? syscall_call+0x7/0xb ======================= and xen complains: (XEN) mm.c:2241:d4 Mfn 1cc37 not pinned Further details at: https://bugzilla.redhat.com/436453 Signed-off-by: Mark McLoughlin <markmc@redhat.com> Cc: xen-devel@lists.xensource.com Cc: Mark McLoughlin <markmc@redhat.com> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-04-04 18:36:48 +02:00
Mark McLoughlin	f64337062c	xen: refactor xen_{alloc,release}_{pt,pd}() Signed-off-by: Mark McLoughlin <markmc@redhat.com> Cc: xen-devel@lists.xensource.com Cc: Mark McLoughlin <markmc@redhat.com> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-04-04 18:36:48 +02:00
Jeremy Fitzhardinge	2e8fe719b5	xen: fix UP setup of shared_info We need to set up the shared_info pointer once we've mapped the real shared_info into its fixmap slot. That needs to happen once the general pagetable setup has been done. Previously, the UP shared_info was set up one in xen_start_kernel, but that was left pointing to the dummy shared info. Unfortunately there's no really good place to do a later setup of the shared_info in UP, so just do it once the pagetable setup has been done. [ Stable: needed in 2.6.24.x ] Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Stable Kernel <stable@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-03-27 16:08:45 +01:00
Jeremy Fitzhardinge	04c44a080d	xen: fix RMW when unmasking events xen_irq_enable_direct and xen_sysexit were using "andw $0x00ff, XEN_vcpu_info_pending(vcpu)" to unmask events and test for pending ones in one instruction. Unfortunately, the pending flag must be modified with a locked operation since it can be set by another CPU, and the unlocked form of this operation was causing the pending flag to get lost, allowing the processor to return to usermode with pending events and ultimately deadlock. The simple fix would be to make it a locked operation, but that's rather costly and unnecessary. The fix here is to split the mask-clearing and pending-testing into two instructions; the interrupt window between them is of no concern because either way pending or new events will be processed. This should fix lingering bugs in using direct vcpu structure access too. [ Stable: needed in 2.6.24.x ] Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Stable <stable@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-03-27 16:08:45 +01:00

274 Commits