linux

Commit Graph

Author	SHA1	Message	Date
Rusty Russell	87c7d57c17	virtio: hand virtio ring alignment as argument to vring_new_virtqueue This allows each virtio user to hand in the alignment appropriate to their virtio_ring structures. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>	2008-12-30 09:26:03 +10:30
Rusty Russell	2966af73e7	virtio: use LGUEST_VRING_ALIGN instead of relying on pagesize This doesn't really matter, since lguest is i386 only at the moment, but we could actually choose a different value. (lguest doesn't have a guarenteed ABI). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2008-12-30 09:26:02 +10:30
Yinghai Lu	b77b881f21	x86: fix lguest used_vectors breakage, -v2 Impact: fix lguest, clean up 32-bit lguest used used_vectors to record vectors, but that model of allocating vectors changed and got broken, after we changed vector allocation to a per_cpu array. Try enable that for 64bit, and the array is used for all vectors that are not managed by vector_irq per_cpu array. Also kill system_vectors[], that is now a duplication of the used_vectors bitmap. [ merged in cpus4096 due to io_apic.c cpumask changes. ] [ -v2, fix build failure ] Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-12-23 22:37:28 +01:00
Rusty Russell	1dc3e3bcbf	lguest: update commentry Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2008-08-26 00:19:28 +10:00
Rusty Russell	71a3f4edc1	lguest: use get_user_pages_fast() instead of get_user_pages() Using a simple page table thrashing program I measure a slight improvement. The program creates five processes. Each touches 1000 pages then schedules the next process. We repeat this 1000 times. As lguest only caches 4 cr3 values, this rebuilds a lot of shadow page tables requiring virt->phys mappings. Before: 5.93 seconds After: 5.40 seconds (Counts of slow vs fastpath in this usage are 6092 and 2852462 respectively.) And more importantly for lguest, the code is simpler. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2008-08-12 17:52:53 +10:00
Andrew Morton	cf485e566b	lguest: use cpu capability accessors To support my little make-x86-bitops-use-proper-typechecking projectlet. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Andrea Arcangeli <andrea@qumranet.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2008-07-29 09:58:34 +10:00
Johannes Weiner	0a707210aa	lguest: fix switcher_page leak on unload map_switcher allocates the array, unmap_switcher has to free it accordingly. Signed-off-by: Johannes Weiner <hannes@saeurebad.de> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2008-07-29 09:58:32 +10:00
Rusty Russell	0c12091d82	lguest: Guest int3 fix Ron Minnich noticed that guest userspace gets a GPF when it tries to int3: we need to copy the privilege level from the guest-supplied IDT to the real IDT. int3 is the only common case where guest userspace expects to invoke an interrupt, so that's the symptom of failing to do this. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2008-07-29 09:58:31 +10:00
Rusty Russell	e34f872567	virtio: Add transport feature handling stub for virtio_ring. To prepare for virtio_ring transport feature bits, hook in a call in all the users to manipulate them. This currently just clears all the bits, since it doesn't understand any features. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2008-07-25 12:06:14 +10:00
Rusty Russell	c624896e48	virtio: Rename set_features to finalize_features Rather than explicitly handing the features to the lower-level, we just hand the virtio_device and have it set the features. This make it clear that it has the chance to manipulate the features of the device at this point (and that all feature negotiation is already done). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2008-07-25 12:06:12 +10:00
Ingo Molnar	1a781a777b	Merge branch 'generic-ipi' into generic-ipi-for-linus Conflicts: arch/powerpc/Kconfig arch/s390/kernel/time.c arch/x86/kernel/apic_32.c arch/x86/kernel/cpu/perfctr-watchdog.c arch/x86/kernel/i8259_64.c arch/x86/kernel/ldt.c arch/x86/kernel/nmi_64.c arch/x86/kernel/smpboot.c arch/x86/xen/smp.c include/asm-x86/hw_irq_32.h include/asm-x86/hw_irq_64.h include/asm-x86/mach-default/irq_vectors.h include/asm-x86/mach-voyager/irq_vectors.h include/asm-x86/smp.h kernel/Makefile Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-15 21:55:59 +02:00
Ingo Molnar	15e551d25e	x86, VisWS: turn into generic arch, eliminate Kconfig specials remove leftover traces of various VISWS related Kconfig specials. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-10 18:55:47 +02:00
Jens Axboe	15c8b6c1aa	on_each_cpu(): kill unused 'retry' parameter It's not even passed on to smp_call_function() anymore, since that was removed. So kill it. Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-06-26 11:24:38 +02:00
Ingo Molnar	d02859ecb3	Merge commit 'v2.6.26-rc8' into x86/xen Conflicts: arch/x86/xen/enlighten.c arch/x86/xen/mmu.c Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-06-25 12:16:51 +02:00
Suresh Siddha	54481cf88b	x86: fix NULL pointer deref in __switch_to I am able to reproduce the oops reported by Simon in __switch_to() with lguest. My debug showed that there is at least one lguest specific issue (which should be present in 2.6.25 and before aswell) and it got exposed with a kernel oops with the recent fpu dynamic allocation patches. In addition to the previous possible scenario (with fpu_counter), in the presence of lguest, it is possible that the cpu's TS bit it still set and the lguest launcher task's thread_info has TS_USEDFPU still set. This is because of the way the lguest launcher handling the guest's TS bit. (look at lguest_set_ts() in lguest_arch_run_guest()). This can result in a DNA fault while doing unlazy_fpu() in __switch_to(). This will end up causing a DNA fault in the context of new process thats getting context switched in (as opossed to handling DNA fault in the context of lguest launcher/helper process). This is wrong in both pre and post 2.6.25 kernels. In the recent 2.6.26-rc series, this is showing up as NULL pointer dereferences or sleeping function called from atomic context(__switch_to()), as we free and dynamically allocate the FPU context for the newly created threads. Older kernels might show some FPU corruption for processes running inside of lguest. With the appended patch, my test system is running for more than 50 mins now. So atleast some of your oops (hopefully all!) should get fixed. Please give it a try. I will spend more time with this fix tomorrow. Reported-by: Simon Holm Thøgersen <odie@cs.aau.dk> Reported-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-06-20 13:26:18 +02:00
Ingo Molnar	688d22e23a	Merge branch 'linus' into x86/xen	2008-06-16 11:21:27 +02:00
Rusty Russell	b769f57908	virtio: set device index in common code. Anthony Liguori points out that three different transports use the virtio code, but each one keeps its own counter to set the virtio_device's index field. In theory (though not in current practice) this means that names could be duplicated, and that risk grows as more transports are created. So we move the selection of the unique virtio_device.index into the common code in virtio.c, which has the side-benefit of removing duplicate code. The only complexity is that lguest and S/390 use the index to uniquely identify the device in case of catastrophic failure before register_virtio_device() is called: now we use the offset within the descriptor page as a unique identifier for the printks. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Carsten Otte <cotte@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Chris Lalancette <clalance@redhat.com> Cc: Anthony Liguori <anthony@codemonkey.ws>	2008-05-30 15:09:42 +10:00
Rusty Russell	e27810f113	lguest: use ioremap_cache, not ioremap Thanks to Jon Corbet & LWN. Only took me a day to join the dots. Host->Guest netcat before (with unnecessily large receive buffers): 1073741824 bytes (1.1 GB) copied, 24.7528 seconds, 43.4 MB/s After: 1073741824 bytes (1.1 GB) copied, 17.6369 seconds, 60.9 MB/s Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2008-05-30 15:09:41 +10:00
Jeremy Fitzhardinge	a15af1c9ea	x86/paravirt: add pte_flags to just get pte flags Add pte_flags() to extract the flags from a pte. This is a special case of pte_val() which is only guaranteed to return the pte's flags correctly; the page number may be corrupted or missing. The intent is to allow paravirt implementations to return pte flags without having to do any translation of the page number (most notably, Xen). Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-05-27 10:11:36 +02:00
Rusty Russell	a007a751d9	lguest: make Launcher see device status updates This brings us closer to Real Life, where we'd examine the device features once it's set the DRIVER_OK status bit. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2008-05-02 21:50:54 +10:00
Rusty Russell	9f3f746741	lguest: remove bogus NULL cpu check If lg isn't NULL, and cpu_id is sane, &lg->cpus[cpu_id] can't be NULL. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2008-05-02 21:50:52 +10:00
Rusty Russell	24adf12722	lguest: avoid using NR_CPUS as a bounds check. NR_CPUS (being a host number) is an arbitrary limit for the Guest. Using the array size directly (which currently happes to be NR_CPUS) is more futureproof. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2008-05-02 21:50:51 +10:00
Rusty Russell	c45a6816c1	virtio: explicit advertisement of driver features A recent proposed feature addition to the virtio block driver revealed some flaws in the API: in particular, we assume that feature negotiation is complete once a driver's probe function returns. There is nothing in the API to require this, however, and even I didn't notice when it was violated. So instead, we require the driver to specify what features it supports in a table, we can then move the feature negotiation into the virtio core. The intersection of device and driver features are presented in a new 'features' bitmap in the struct virtio_device. Note that this highlights the difference between Linux unsigned-long bitmaps where each unsigned long is in native endian, and a straight-forward little-endian array of bytes. Drivers can still remove feature bits in their probe routine if they really have to. API changes: - dev->config->feature() no longer gets and acks a feature. - drivers should advertise their features in the 'feature_table' field - use virtio_has_feature() for extra sanity when checking feature bits Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2008-05-02 21:50:50 +10:00
Matthew Wilcox	d3135846f6	drivers: Remove unnecessary inclusions of asm/semaphore.h None of these files use any of the functionality promised by asm/semaphore.h. It's possible that they rely on it dragging in some unrelated header file, but I can't build all these files, so we'll have fix any build failures as they come up. Signed-off-by: Matthew Wilcox <willy@linux.intel.com>	2008-04-18 22:16:32 -04:00
Al Viro	74dbf719ed	misc __user misannotations (pointless casts to long) Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-03-30 14:20:23 -07:00
Rusty Russell	a6bd8e1303	lguest: comment documentation update. Took some cycles to re-read the Lguest Journey end-to-end, fix some rot and tighten some phrases. Only comments change. No new jokes, but a couple of recycled old jokes. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2008-03-28 11:05:54 +11:00
Tim Ansell	b488f22d70	lguest: Add puppies which where previously missing. lguest doesn't have features, it has puppies! Signed-off-by: Timothy R Ansell <mithro@mithis.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2008-03-28 11:05:52 +11:00
Rusty Russell	4357bd9453	lguest: Revert `1ce70c4fac`, fix real problem. Ahmed managed to crash the Host in release_pgd(), which cannot be a Guest bug, and indeed it wasn't. The bug was that handing a 0 as the address of the toplevel page table being manipulated can cause the lookup code in find_pgdir() to return an uninitialized cache entry (we shadow up to 4 top level page tables for each Guest). Commit `37cc8d7f96` introduced this behaviour in the Guest, uncovering the bug. The patch which he submitted (which removed the /4 from the index calculation) simply ensured that these high-indexed entries hit the early exit path of guest_set_pmd(). But you get lots of segfaults in guest userspace as the PMDs aren't being updated. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2008-03-11 09:35:58 +11:00
Rusty Russell	f14ae652ba	lguest: fix __get_vm_area usage. Robert Bragg's `5dc3318528` tightened (ie. fixed) the checking in __get_vm_area, and it broke lguest. lguest should pass the exact "end" it wants, not some random constant (it was possible previously that it would actually get an address different from SWITCHER_ADDR). Also, Fabio Checconi pointed out that we should make sure we're not hitting the fixmap area. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Robert Bragg <robert@sixbynine.org>	2008-03-11 09:35:56 +11:00
Eugene Teo	f73d1e6ca6	lguest: make sure cpu is initialized before accessing it If req is LHREQ_INITIALIZE, and the guest has been initialized before (unlikely), it will attempt to access cpu->tsk even though cpu is not yet initialized. Signed-off-by: Eugene Teo <eugeneteo@kernel.sg> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2008-03-11 09:35:56 +11:00
Ahmed S. Darwish	31f4b46ec6	lguest: accept guest _PAGE_PWT page table entries Beginning from commit `4138cc3418`, ioremap_nocache() sets the _PAGE_PWT flag. Lguest doesn't accept a guest pte with a _PWT flag and reports a "bad page table entry" in that case. Accept guest _PAGE_PWT page table entries. Signed-off-by: Ahmed S. Darwish <darwish.07@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-02-09 23:24:09 +01:00
Alexey Dobriyan	25478445c4	Fix container_of() usage Using "attr" twice is not OK, because it effectively prohibits such container_of() on variables not named "attr". Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru> Acked-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-08 09:22:32 -08:00
Rusty Russell	6e5aa7efb2	virtio: reset function A reset function solves three problems: 1) It allows us to renegotiate features, eg. if we want to upgrade a guest driver without rebooting the guest. 2) It gives us a clean way of shutting down virtqueues: after a reset, we know that the buffers won't be used by the host, and 3) It helps the guest recover from messed-up drivers. So we remove the ->shutdown hook, and the only way we now remove feature bits is via reset. We leave it to the driver to do the reset before it deletes queues: the balloon driver, for example, needs to chat to the host in its remove function. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2008-02-04 23:50:03 +11:00
Rusty Russell	18445c4d50	virtio: explicit enable_cb/disable_cb rather than callback return. It seems that virtio_net wants to disable callbacks (interrupts) before calling netif_rx_schedule(), so we can't use the return value to do so. Rename "restart" to "cb_enable" and introduce "cb_disable" hook: callback now returns void, rather than a boolean. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2008-02-04 23:49:58 +11:00
Rusty Russell	a586d4f601	virtio: simplify config mechanism. Previously we used a type/len pair within the config space, but this seems overkill. We now simply define a structure which represents the layout in the config space: the config space can now only be extended at the end. The main driver-visible changes: 1) We indicate what fields are present with an explicit feature bit. 2) Virtqueues are explicitly numbered, and not in the config space. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2008-02-04 23:49:57 +11:00
Rusty Russell	e95035c61a	lguest: fix mis-merge against hpa's TSS renaming drivers/lguest/x86/core.c: In function ‘copy_in_guest_info’: drivers/lguest/x86/core.c:97: error: ‘struct x86_hw_tss’ has no member named ‘esp1’ Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-01-31 19:59:44 +11:00
Linus Torvalds	d145c7253c	Merge git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus * git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus: (27 commits) lguest: use __PAGE_KERNEL instead of _PAGE_KERNEL lguest: Use explicit includes rateher than indirect lguest: get rid of lg variable assignments lguest: change gpte_addr header lguest: move changed bitmap to lg_cpu lguest: move last_pages to lg_cpu lguest: change last_guest to last_cpu lguest: change spte_addr header lguest: per-vcpu lguest pgdir management lguest: make pending notifications per-vcpu lguest: makes special fields be per-vcpu lguest: per-vcpu lguest task management lguest: replace lguest_arch with lg_cpu_arch. lguest: make registers per-vcpu lguest: make emulate_insn receive a vcpu struct. lguest: map_switcher_in_guest() per-vcpu lguest: per-vcpu interrupt processing. lguest: per-vcpu lguest timers lguest: make hypercalls use the vcpu struct lguest: make write() operation smp aware ... Manual conflict resolved (maybe even correctly, who knows) in drivers/lguest/x86/core.c	2008-01-31 09:35:32 +11:00
H. Peter Anvin	faca62273b	x86: use generic register name in the thread and tss structures This changes size-specific register names (eip/rip, esp/rsp, etc.) to generic names in the thread and tss structures. Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-01-30 13:31:02 +01:00
Glauber de Oliveira Costa	84f12e39c8	lguest: use __PAGE_KERNEL instead of _PAGE_KERNEL x86_64 don't expose the intermediate representation with one underline, _PAGE_KERNEL, just the double-underlined one. Use it, to get a common ground between 32 and 64-bit Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2008-01-30 22:50:19 +11:00
Glauber de Oliveira Costa	ca94f2bdd1	lguest: Use explicit includes rateher than indirect explicitly use ktime.h include explicitly use hrtimer.h include explicitly use sched.h include This patch adds headers explicitly to lguest sources file, to avoid depending on them being included somewhere else. Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2008-01-30 22:50:19 +11:00
Glauber de Oliveira Costa	382ac6b3fb	lguest: get rid of lg variable assignments We can save some lines of code by getting rid of *lg = cpu... lines of code spread everywhere by now. Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2008-01-30 22:50:18 +11:00
Glauber de Oliveira Costa	934faab464	lguest: change gpte_addr header gpte_addr() does not depend on any guest information. So we wipe out the lg parameter from it completely. Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2008-01-30 22:50:18 +11:00
Glauber de Oliveira Costa	ae3749dcd8	lguest: move changed bitmap to lg_cpu events represented in the 'changed' bitmap are per-cpu, not per-guest. move it to the lg_cpu structure Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2008-01-30 22:50:17 +11:00
Glauber de Oliveira Costa	f34f8c5fea	lguest: move last_pages to lg_cpu in our new model, pages are assigned to a virtual cpu, not to a guest. We move it to the lg_cpu structure. Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2008-01-30 22:50:16 +11:00
Glauber de Oliveira Costa	c40a9f4719	lguest: change last_guest to last_cpu in our model, a guest does not run in a cpu anymore: a virtual cpu does. So we change last_guest to last_cpu Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2008-01-30 22:50:15 +11:00
Glauber de Oliveira Costa	2092aa277b	lguest: change spte_addr header spte_addr does not depend on any guest information, so we wipe out the lg parameter completely. Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2008-01-30 22:50:15 +11:00
Glauber de Oliveira Costa	1713608f28	lguest: per-vcpu lguest pgdir management this patch makes the pgdir management per-vcpu. The pgdirs pool is still guest-wide (although it'll probably need to grow when we are really executing more vcpus), but the pgdidx index is gone, since it makes no sense anymore. Instead, we use a per-vcpu index. Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2008-01-30 22:50:14 +11:00
Glauber de Oliveira Costa	5e232f4f42	lguest: make pending notifications per-vcpu this patch makes the pending_notify field, used to control pending notifications, per-vcpu, instead of per-guest Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2008-01-30 22:50:13 +11:00
Glauber de Oliveira Costa	4665ac8e28	lguest: makes special fields be per-vcpu lguest struct have room for some fields, namely, cr2, ts, esp1 and ss1, that are not really guest-wide, but rather, vcpu-wide. This patch puts it in the vcpu struct Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2008-01-30 22:50:13 +11:00
Glauber de Oliveira Costa	66686c2ab0	lguest: per-vcpu lguest task management lguest uses tasks to control its running behaviour (like sending breaks, controlling halted state, etc). In a per-vcpu environment, each vcpu will have its own underlying task. So this patch makes the infrastructure for that possible Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2008-01-30 22:50:12 +11:00
Glauber de Oliveira Costa	fc708b3e40	lguest: replace lguest_arch with lg_cpu_arch. The fields found in lguest_arch are not really per-guest, but per-cpu (gdt, idt, etc). So this patch turns lguest_arch into lg_cpu_arch. It makes sense to have a per-guest per-arch struct, but this can be addressed later, when the need arrives. Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2008-01-30 22:50:11 +11:00
Glauber de Oliveira Costa	a53a35a8b4	lguest: make registers per-vcpu This is the most obvious per-vcpu field: registers. So this patch moves it from struct lguest to struct vcpu, and patch the places in which they are used, accordingly Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2008-01-30 22:50:11 +11:00
Glauber de Oliveira Costa	a3863f68b0	lguest: make emulate_insn receive a vcpu struct. emulate_insn() needs to know about current eip, which will be, in the future, a per-vcpu thing. So in this patch, the function prototype is modified to receive a vcpu struct Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2008-01-30 22:50:10 +11:00
Glauber de Oliveira Costa	0c78441cf4	lguest: map_switcher_in_guest() per-vcpu The switcher needs to be mapped per-vcpu, because different vcpus will potentially have different page tables (they don't have to, because threads will share the same). So our first step is the make the function receive a vcpu struct Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2008-01-30 22:50:09 +11:00
Glauber de Oliveira Costa	177e449dc5	lguest: per-vcpu interrupt processing. This patch adapts interrupt processing for using the vcpu struct. Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2008-01-30 22:50:09 +11:00
Glauber de Oliveira Costa	ad8d8f3bc6	lguest: per-vcpu lguest timers Here, I introduce per-vcpu timers. With this, we can have local expiries, needed for accounting time in smp guests Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2008-01-30 22:50:08 +11:00
Glauber de Oliveira Costa	73044f05a4	lguest: make hypercalls use the vcpu struct this patch changes do_hcall() and do_async_hcall() interfaces (and obviously their callers) to get a vcpu struct. Again, a vcpu services the hypercall, not the whole guest Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2008-01-30 22:50:08 +11:00
Glauber de Oliveira Costa	7ea07a1500	lguest: make write() operation smp aware This patch makes the write() file operation smp aware. Which means, receiving the vcpu_id value through the offset parameter, and being well aware to which vcpu we're talking to. Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2008-01-30 22:50:07 +11:00
Glauber de Oliveira Costa	d0953d42c3	lguest: per-cpu run guest This patch makes the run_guest() routine use the lg_cpu struct. This is required since in a smp guest environment, there's no more the notion of "running the guest", but rather, it is "running the vcpu" Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2008-01-30 22:50:06 +11:00
Glauber de Oliveira Costa	4dcc53da49	lguest: initialize vcpu this patch initializes the first vcpu in the initialize() routing, which is responsible for starting the process of putting the guest up. right now, as much of the fields are still not per-vcpu, it does not do much. Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2008-01-30 22:50:06 +11:00
Glauber de Oliveira Costa	badb1e0402	lguest: introduce vcpu struct this patch introduces a vcpu struct for lguest. In upcoming patches, more and more fields will be moved from the lguest struct to the vcpu Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2008-01-30 22:50:04 +11:00
Balaji Rao	ec04b13f67	lguest: Reboot support Reboot Implemented (Prevent fd leak, fix style and fix documentation --RR) Signed-off-by: Balaji Rao <balajirrao@gmail.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2008-01-30 22:50:04 +11:00
Glauber de Oliveira Costa	5c55841d16	lguest: remove pv_info dependency Currently, lguest module can't be compiled without the PARAVIRT flag being on. This is a fake dependency, since the module itself shouldn't need any paravirt override. Reason for that is the reference to pv_info structure in initial loading tests. This patch removes it in favour of a more generic error message. Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2008-01-30 22:50:03 +11:00
Gautham R Shenoy	86ef5c9a8e	cpu-hotplug: replace lock_cpu_hotplug() with get_online_cpus() Replace all lock_cpu_hotplug/unlock_cpu_hotplug from the kernel and use get_online_cpus and put_online_cpus instead as it highlights the refcount semantics in these operations. The new API guarantees protection against the cpu-hotplug operation, but it doesn't guarantee serialized access to any of the local data structures. Hence the changes needs to be reviewed. In case of pseries_add_processor/pseries_remove_processor, use cpu_maps_update_begin()/cpu_maps_update_done() as we're modifying the cpu_present_map there. Signed-off-by: Gautham R Shenoy <ego@in.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-01-25 21:08:02 +01:00
Rusty Russell	a7da60f415	Remove bogus duplicate CONFIG_LGUEST_GUEST entry. It was moved to arch/x86/lguest/Kconfig, but I lost the deletion part in a patch suffle. My confused one-liner "fix" to turn it on is also reverted: `84f7466ee2` Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-01-19 21:29:39 -08:00
Rusty Russell	84f7466ee2	Selecting LGUEST should turn on Guest support, as in 2.6.23. There's currently no way to turn on Lguest guest support; the planned Kconfig virtualization reorg didn't get into 2.6.25. This was unnoticed because if you already had CONFIG_LGUEST_GUEST=y in your config, it worked. Too bad about new users... Also, the Kconfig help was wrong now the virtio drivers are merged. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-01-18 14:05:48 -08:00
Rusty Russell	74b2553f1d	virtio: fix module/device unloading The virtio code never hooked through the ->remove callback. Although noone supports device removal at the moment, this code is already needed for module unloading. This of course also revealed bugs in virtio_blk, virtio_net and lguest unloading paths. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2007-11-19 11:20:42 +11:00
Adrian Bunk	43054412db	lguest_user.c: fix memory leak This patch fixes a memory leak spotted by the Coverity checker. Signed-off-by: Adrian Bunk <bunk@kernel.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-11-14 18:45:38 -08:00
Rusty Russell	42b36cc0ce	virtio: Force use of power-of-two for descriptor ring sizes The virtio descriptor rings of size N-1 were nicely set up to be aligned to an N-byte boundary. But as Anthony Liguori points out, the free-running indices used by virtio require that the sizes be a power of 2, otherwise we get problems on wrap (demonstrated with lguest). So we replace the clever "2^n-1" scheme with a simple "align to page boundary" scheme: this means that all virtio rings take at least two pages, but it's safer than guessing cache alignment. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2007-11-12 13:59:40 +11:00
Rusty Russell	e1e72965ec	lguest: documentation update Went through the documentation doing typo and content fixes. This patch contains only comment and whitespace changes. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2007-10-25 15:02:50 +10:00
Rusty Russell	197bff630a	lguest: remove unused "wake" element from struct lguest Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2007-10-25 14:10:30 +10:00
Rusty Russell	25c47bb353	lguest: use defines from x86 headers instead of magic numbers Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2007-10-25 14:09:53 +10:00
Rusty Russell	2d37f94a28	generalize lgread_u32/lgwrite_u32. Jes complains that page table code still uses lgread_u32 even though it now uses general kernel pte types. The best thing to do is to generalize lgread_u32 and lgwrite_u32. This means we lose the efficiency of getuser(). We could potentially regain it if we used __copy_from_user instead of copy_from_user, but I'm not certain that our range check is equivalent to access_ok() on all platforms. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Acked-by: Jes Sorensen <jes@sgi.com>	2007-10-23 15:49:56 +10:00
Rusty Russell	19f1537b7b	Lguest support for Virtio This makes lguest able to use the virtio devices. We change the device descriptor page from a simple array to a variable length "type, config_len, status, config data..." format, and implement virtio_config_ops to read from that config data. We use the virtio ring implementation for an efficient Guest <-> Host virtqueue mechanism, and the new LHCALL_NOTIFY hypercall to kick the host when it changes. We also use LHCALL_NOTIFY on kernel addresses for very very early console output. We could have another hypercall, but this hack works quite well. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2007-10-23 15:49:56 +10:00
Rusty Russell	15045275c3	Remove old lguest I/O infrrasructure. This patch gets rid of the old lguest host I/O infrastructure and replaces it with a single hypercall "LHCALL_NOTIFY" which takes an address. The main change is the removal of io.c: that mainly did inter-guest I/O, which virtio doesn't yet support. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2007-10-23 15:49:55 +10:00
Rusty Russell	0ca49ca946	Remove old lguest bus and drivers. This gets rid of the lguest bus, drivers and DMA mechanism, to make way for a generic virtio mechanism. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2007-10-23 15:49:55 +10:00
Rusty Russell	47436aa4ad	Boot with virtual == physical to get closer to native Linux. 1) This allows us to get alot closer to booting bzImages. 2) It means we don't have to know page_offset. 3) The Guest needs to modify the boot pagetables to create the PAGE_OFFSET mapping before jumping to C code. 4) guest_pa() walks the page tables rather than using page_offset. 5) We don't use page_offset to figure out whether to emulate: it was always kinda quesationable, and won't work for instructions done before remapping (bzImage unpacking in particular). 6) We still want the kernel address for tlb flushing: have the initial hypercall give us that, too. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2007-10-23 15:49:54 +10:00
Rusty Russell	c18acd73ff	Allow guest to specify syscall vector to use. (Based on Ron Minnich's LGUEST_PLAN9_SYSCALL patch). This patch allows Guests to specify what system call vector they want, and we try to reserve it. We only allow one non-Linux system call vector, to try to avoid DoS on the Host. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2007-10-23 15:49:53 +10:00
Rusty Russell	ee3db0f2b6	Rename "cr3" to "gpgdir" to avoid x86-specific naming. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2007-10-23 15:49:53 +10:00
Matias Zabaljauregui	df29f43e65	Pagetables to use normal kernel types This is my first step in the migration of page_tables.c to the kernel types and functions/macros (2.6.23-rc3). Seems to be working OK. Signed-off-by: Matias Zabaljauregui <matias.zabaljauregui@cern.ch> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2007-10-23 15:49:53 +10:00
Jes Sorensen	d612cde060	Move register setup into i386_core.c Move setup_regs() to lguest_arch_setup_regs() in i386_core.c given that this is very architecture specific. Signed-off-by: Jes Sorensen <jes@sgi.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2007-10-23 15:49:52 +10:00
Jes Sorensen	511801dc31	Change example launcher to use unsigned long not u32 Apply Clue 2x4 to lguest userland<->kernel handling code and the lguest launcher. Pointers are not to be passed in u32's! Basic rule of thumb: Anything passing u32's back and forth should be passing unsigned longs to be portable to 64 bit archs. For those who forgotten already, I repeat: NO POINTERS IN u32! Signed-off-by: Jes Sorensen <jes@sgi.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2007-10-23 15:49:52 +10:00
Jes Sorensen	b410e7b149	Make hypercalls arch-independent. Clean up the hypercall code to make the code in hypercalls.c architecture independent. First process the common hypercalls and then call lguest_arch_do_hcall() if the call hasn't been handled. Rename struct hcall_ring to hcall_args. This patch requires the previous patch which reorganize the layout of struct lguest_regs on i386 so they match the layout of struct hcall_args. Signed-off-by: Jes Sorensen <jes@sgi.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2007-10-23 15:49:52 +10:00
Rusty Russell	cc6d4fbcef	Introduce "hcall" pointer to indicate pending hypercall. Currently we look at the "trapnum" to see if the Guest wants a hypercall. But once the hypercall is done we have to reset trapnum to a bogus value, otherwise if we exit to userspace and return, we'd run the same hypercall twice (that was a nasty bug to find!). This has two main effects: 1) When Jes's patch changes the hypercall args to be a generic "struct hcall_args" we simply change the type of "lg->hcall". It's set by arch code, so if it has to copy args or something it can do so, and point "hcall" into lg->arch somewhere. 2) Async hypercalls only get run when an actual hypercall is pending. This simplfies the code a little and is a more logical semantic. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2007-10-23 15:49:52 +10:00
Jes Sorensen	4614a3a3b6	Reorder guest saved regs to match hyperall order Move eax next to ebx/ecx/edx in struct lguest_regs on i386, so they will be located together and allow it to map directly to a struct hcall_ring entry (which will be renamed struct hcall_args as in a subsequent patch). This is in preparation for making the code hcall code architecture independent. Signed-off-by: Jes Sorensen <jes@sgi.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2007-10-23 15:49:51 +10:00
Jes Sorensen	625efab1cd	Move i386 part of core.c to x86/core.c. Separate i386 architecture specific from core.c and move it to x86/core.c and add x86/lguest.h header file to match. Signed-off-by: Jes Sorensen <jes@sgi.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2007-10-23 15:49:51 +10:00
Rusty Russell	56adbe9ddc	Make shadow IDT a complete IDT with 256 entries. This simplifies the code a little, in preparation for allowing alternate system call vectors in guests (Plan 9 uses 0x40). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2007-10-23 15:49:51 +10:00
Rusty Russell	48245cc070	Remove fixed limit on number of guests, and lguests array. Back when we had all the Guest state in the switcher, we had a fixed array of them. This is no longer necessary. If we switch the network code to using random_ether_addr (46 bits is enough to avoid clashes), we can get rid of the concept of "guest id" altogether. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2007-10-23 15:49:51 +10:00
Rusty Russell	3c6b5bfa3c	Introduce guest mem offset, static link example launcher In order to avoid problematic special linking of the Launcher, we give the Host an offset: this means we can use any memory region in the Launcher as Guest memory rather than insisting on mmap() at 0. The result is quite pleasing: a number of casts are replaced with simple additions. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2007-10-23 15:49:50 +10:00
Rusty Russell	1f4e1de4f2	Rename switcher.S to x86/switcher_32.S lguest uses a "switcher" shim mapped high to bounce between host and guest. As lguest becomes less i386-centric, we separate this code into a subdir. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2007-10-23 15:49:50 +10:00
Rusty Russell	34b8867a03	Move lguest guest support to arch/x86. Lguest has two sides: host support (to launch guests) and guest support (replacement boot path and paravirt_ops). This moves the guest side to arch/x86/lguest where it's closer to related code. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Andi Kleen <ak@suse.de>	2007-10-23 15:49:50 +10:00
Tony Breeds	05aa026a62	Clocksource is continuous regardless of the state of the host's TSC. Currently lguest will spend a lot of of time waking up the host, as it cannot go tickless (if the [host] TSC has been marked unstable). On my laptop I was getting ~40% of wakeups from lguest. With this patch applied, my laptop is much happier! Signed-off-by: Tony Breeds <tony@bakeyournoodle.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2007-10-23 15:49:49 +10:00
Rusty Russell	ebac52524d	lguest_devices belongs in lguest_bus.c: it's not i386-specific. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2007-10-23 15:49:49 +10:00
Rusty Russell	141341cdae	Lguest currently depends on 32-bit x86, not just x86. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2007-10-23 15:49:48 +10:00
Jes Sorensen	891ff65ff5	Use copy_to_user() not put_user for struct timespec Use copy_to_user() when copying a struct timespec to the guest - put_user() cannot handle two long's in one go on a 64bit arch. Signed-off-by: Jes Sorensen <jes@sgi.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Jes Sorensen <jes@sgi.com> Cc: Al Viro <viro@ftp.linux.org.uk>	2007-10-23 15:49:48 +10:00
Rusty Russell	25e82eba3a	Remove binfmts.h include from lg.h It wasn't needed since a very early prototype of lguest. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2007-10-23 15:49:47 +10:00
Rusty Russell	d3d1c4bdf1	Normalize config options for guest support 1) Group all the "guest OS" support options together, under a PARAVIRT_GUEST menu. 2) Make those options select CONFIG_PARAVIRT, as suggested by Andi. 3) Make kconfig help titles consistent. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Andi Kleen <ak@suse.de> Cc: Zach Amsden <zach@vmware.com> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Cc: Chris Wright <chrisw@sous-sol.org>	2007-10-23 15:49:47 +10:00
Linus Torvalds	fb9fc39517	Merge branch 'xen-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen * 'xen-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen: xfs: eagerly remove vmap mappings to avoid upsetting Xen xen: add some debug output for failed multicalls xen: fix incorrect vcpu_register_vcpu_info hypercall argument xen: ask the hypervisor how much space it needs reserved xen: lock pte pages while pinning/unpinning xen: deal with stale cr3 values when unpinning pagetables xen: add batch completion callbacks xen: yield to IPI target if necessary Clean up duplicate includes in arch/i386/xen/ remove dead code in pgtable_cache_init paravirt: clean up lazy mode handling paravirt: refactor struct paravirt_ops into smaller pv_*_ops	2007-10-17 11:10:11 -07:00
H. Peter Anvin	30c826451d	[x86] remove uses of magic macros for boot_params access Instead of using magic macros for boot_params access, simply use the boot_params structure. Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2007-10-16 17:38:31 -07:00
Jeremy Fitzhardinge	8965c1c095	paravirt: clean up lazy mode handling Currently, the set_lazy_mode pv_op is overloaded with 5 functions: 1. enter lazy cpu mode 2. leave lazy cpu mode 3. enter lazy mmu mode 4. leave lazy mmu mode 5. flush pending batched operations This complicates each paravirt backend, since it needs to deal with all the possible state transitions, handling flushing, etc. In particular, flushing is quite distinct from the other 4 functions, and seems to just cause complication. This patch removes the set_lazy_mode operation, and adds "enter" and "leave" lazy mode operations on mmu_ops and cpu_ops. All the logic associated with enter and leaving lazy states is now in common code (basically BUG_ONs to make sure that no mode is current when entering a lazy mode, and make sure that the mode is current when leaving). Also, flush is handled in a common way, by simply leaving and re-entering the lazy mode. The result is that the Xen, lguest and VMI lazy mode implementations are much simpler. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Cc: Andi Kleen <ak@suse.de> Cc: Zach Amsden <zach@vmware.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Avi Kivity <avi@qumranet.com> Cc: Anthony Liguory <aliguori@us.ibm.com> Cc: "Glauber de Oliveira Costa" <glommer@gmail.com> Cc: Jun Nakajima <jun.nakajima@intel.com>	2007-10-16 11:51:29 -07:00
Jeremy Fitzhardinge	93b1eab3d2	paravirt: refactor struct paravirt_ops into smaller pv_*_ops This patch refactors the paravirt_ops structure into groups of functionally related ops: pv_info - random info, rather than function entrypoints pv_init_ops - functions used at boot time (some for module_init too) pv_misc_ops - lazy mode, which didn't fit well anywhere else pv_time_ops - time-related functions pv_cpu_ops - various privileged instruction ops pv_irq_ops - operations for managing interrupt state pv_apic_ops - APIC operations pv_mmu_ops - operations for managing pagetables There are several motivations for this: 1. Some of these ops will be general to all x86, and some will be i386/x86-64 specific. This makes it easier to share common stuff while allowing separate implementations where needed. 2. At the moment we must export all of paravirt_ops, but modules only need selected parts of it. This allows us to export on a case by case basis (and also choose which export license we want to apply). 3. Functional groupings make things a bit more readable. Struct paravirt_ops is now only used as a template to generate patch-site identifiers, and to extract function pointers for inserting into jmp/calls when patching. It is only instantiated when needed. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Andi Kleen <ak@suse.de> Cc: Zach Amsden <zach@vmware.com> Cc: Avi Kivity <avi@qumranet.com> Cc: Anthony Liguory <aliguori@us.ibm.com> Cc: "Glauber de Oliveira Costa" <glommer@gmail.com> Cc: Jun Nakajima <jun.nakajima@intel.com>	2007-10-16 11:51:29 -07:00
Rusty Russell	bbbd2bf00b	fix modules oopsing in lguest guests The assembly templates for lguest guest patching are in the .init.text section. This means that modules get patched with "cc cc cc cc" or similar junk. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-09-25 08:51:04 -07:00
Rusty Russell	c413fecc76	lguest: Fix guest crash when CONFIG_X86_USE_3DNOW=y One of the very first things lguest_init() does is a memcpy. On Athlon/Duron/K7 or CyrixIII/VIA-C3 or Geode GX/LX, this tries to use MMX. memcpy -> _mmx_memcpy -> kernel_fpu_begin -> clts -> paravirt_ops.clts But we haven't set paravirt_ops.clts yet, so we do the native version and crash. The simplest solution is to use __memcpy. Thanks to Michael Rasenberger for the bug report. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-09-12 12:19:46 -07:00
Rusty Russell	8057d763ed	Fix lguest page-pinning logic ("lguest: bad stack page 0xc057a000") If the stack pointer is 0xc057a000, then the first stack page is at 0xc0579000 (the stack pointer is decremented before use). Not calculating this correctly caused guests with CONFIG_DEBUG_PAGEALLOC=y to be killed with a "bad stack page" message: the initial kernel stack was just proceeding the .smp_locks section which CONFIG_DEBUG_PAGEALLOC marks read-only when freeing. Thanks to Frederik Deweerdt for the bug report! Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-08-30 09:58:22 -07:00
Alexey Dobriyan	deec595047	lguest should depend on CONFIG_FUTEX It uses get_futex_key(). Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-08-23 21:23:33 -07:00
Andi Kleen	ab144f5ec6	i386: Make patching more robust, fix paravirt issue Commit `19d36ccdc3` "x86: Fix alternatives and kprobes to remap write-protected kernel text" uses code which is being patched for patching. In particular, paravirt_ops does patching in two stages: first it calls paravirt_ops.patch, then it fills any remaining instructions with nop_out(). nop_out calls text_poke() which calls lookup_address() which calls pgd_val() (aka paravirt_ops.pgd_val): that call site is one of the places we patch. If we always do patching as one single call to text_poke(), we only need make sure we're not patching the memcpy in text_poke itself. This means the prototype to paravirt_ops.patch needs to change, to marshal the new code into a buffer rather than patching in place as it does now. It also means all patching goes through text_poke(), which is known to be safe (apply_alternatives is also changed to make a single patch). AK: fix compilation on x86-64 (bad rusty!) AK: fix boot on x86-64 (sigh) AK: merged with other patches Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-08-11 15:58:13 -07:00
Jes Sorensen	b1a47190a6	lguest files should explicitly include asm/paravirt.h Files using bits from paravirt.h should explicitly include it rather than relying on it being pulled in by something else. Signed-off-by: Jes Sorensen <jes@sgi.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-08-11 15:47:42 -07:00
Rusty Russell	0d027c01cd	lguest: Fix Malicious Guest GDT Host Crash If a Guest makes hypercall which sets a GDT entry to not present, we currently set any segment registers using that GDT entry to 0. Unfortunately, this is not sufficient: there are other ways of altering GDT entries which will cause a fault. The correct solution to do what Linux does: let them set any GDT value they want and handle the #GP when popping causes a fault. This has the added benefit of making our Switcher slightly more robust in the case of any other bugs which cause it to fault. We kill the Guest if it causes a fault in the Switcher: it's the Guest's responsibility to make sure it's not using segments when it changes them. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-08-09 08:14:56 -07:00
Rusty Russell	37250097e1	Fix non-TSC guest clocksource lockup lguest uses a host-supplied wallclock-based clocksource when the TSC is not reliable. As this is already in nanoseconds, I naively used a multiplier of 1 and a shift of 0. But update_wall_time() in its infinite wisdom decides to adjust the clock a little (where does it think it's getting a more accurate time from?) It will happily tweak the multiplier... to 0, then -1. So the "fix" is to use a shift of 22 like everyone else, and a multiplier of 1 << 22. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-08-09 08:14:56 -07:00
Rusty Russell	cc1ff43b70	Enable lguest drivers in Kconfig Lguest drivers need to default to "Y" otherwise they're never selected for new builds. (We don't bother prompting, because they're less than 4k combined, and implied by selecting lguest support). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-08-06 18:21:15 -07:00
Rusty Russell	05ff09706b	Make lguest compile with CONFIG_BLOCK=n and CONFIG_NET=n Gabriel C reports lguest doesn't compile with CONFIG_BLOCK=n. Fix this by introducing a config var for the block device, which depends on LGUEST && BLOCK. Do the same for the net driver, rather then depending gratuitously on CONFIG_NET. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Gabriel C <nix.or.die@googlemail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-29 17:37:45 -07:00
Rusty Russell	6c8dca5d53	Provide timespec to guests rather than jiffies clock. A non-periodic clock_event_device and the "jiffies" clock don't mix well: tick_handle_periodic() can go into an infinite loop. Currently lguest guests use the jiffies clock when the TSC is unusable. Instead, make the Host write the current time into the lguest page on every interrupt. This doesn't cost much but is more precise and at least as accurate as the jiffies clock. It also gets rid of the GET_WALLCLOCK hypercall. Also, delay setting sched_clock until our clock is set up, otherwise the early printk timestamps can go backwards (not harmful, just ugly). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-28 19:54:33 -07:00
Rusty Russell	a8a11f0697	Fix lguest bzImage loading with CONFIG_RELOCATABLE=y Jason Yeh sent his crashing .config: bzImages made with CONFIG_RELOCATABLE=y put the relocs where the BSS is expected, and we crash with unusual results such as: lguest: unhandled trap 14 at 0xc0122ae1 (0xa9) Relying on BSS being zero was merely laziness on my part, and unfortunately, lguest doesn't go through the normal startup path (which does this in asm). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-28 19:53:36 -07:00
Rusty Russell	f56a384e98	lguest: documentation VII: FIXMEs Documentation: The FIXMEs Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-26 11:35:17 -07:00
Rusty Russell	f8f0fdcd40	lguest: documentation VI: Switcher Documentation: The Switcher Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-26 11:35:17 -07:00
Rusty Russell	bff672e630	lguest: documentation V: Host Documentation: The Host Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-26 11:35:17 -07:00
Rusty Russell	dde797899a	lguest: documentation IV: Launcher Documentation: The Launcher Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-26 11:35:17 -07:00
Rusty Russell	e2c9784325	lguest: documentation III: Drivers Documentation: The Drivers Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-26 11:35:17 -07:00
Rusty Russell	b2b47c214f	lguest: documentation II: Guest Documentation: The Guest Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-26 11:35:17 -07:00
Rusty Russell	f938d2c892	lguest: documentation I: Preparation The netfilter code had very good documentation: the Netfilter Hacking HOWTO. Noone ever read it. So this time I'm trying something different, using a bit of Knuthiness. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-26 11:35:16 -07:00
Thomas Gleixner	18de5bc4c1	clockevents: fix resume logic We need to make sure, that the clockevent devices are resumed, before the tick is resumed. The current resume logic does not guarantee this. Add CLOCK_EVT_MODE_RESUME and call the set mode functions of the clock event devices before resuming the tick / oneshot functionality. Fixup the existing users. Thanks to Nigel Cunningham for tracking down a long standing thinko, which affected the jinxed VAIO. [akpm@linux-foundation.org: xen build fix] Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: john stultz <johnstul@us.ibm.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-21 17:49:15 -07:00
Rusty Russell	9d1ca6f13c	lguest: override sched_clock Guests currently use the default scheduler clock: this means they always use jiffies even if TSC is actually available. It doesn't make any noticeable difference here, but it's a better thing to do. Also remove commented-out asm/sched-clock.h from -mm tree. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-20 09:05:16 -07:00
Rusty Russell	876be9d89e	lguest: trivial: We now have asm/processor-flags.h, so use it. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-20 09:05:16 -07:00
Rusty Russell	e5faff45b3	lguest: fix sense if IF flag on interrupt injection The sense of the IF bit is backwards in the host interrupt handling. This means we always save "IF=1" on the stack when injecting an interrupt. It turns out this is almost always correct (unless the guest is taking a page fault in an interrupt due to an unpopulated vmalloc mapping), so went unnoticed. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-20 09:05:16 -07:00
Al Viro	6d14bfe77b	Fix lguest misannotation It's void __user , not void __user... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-20 08:24:49 -07:00
Rusty Russell	709e89266b	lguest: the Makefile and Kconfig This is the Kconfig and Makefile to allow lguest to actually be compiled. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:52 -07:00
Rusty Russell	d7e28ffe6c	lguest: the host code This is the code for the "lg.ko" module, which allows lguest guests to be launched. [akpm@linux-foundation.org: update for futex-new-private-futexes] [akpm@linux-foundation.org: build fix] [jmorris@namei.org: lguest: use hrtimers] [akpm@linux-foundation.org: x86_64 build fix] Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Andi Kleen <ak@suse.de> Cc: Eric Dumazet <dada1@cosmosbay.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:52 -07:00
Rusty Russell	07ad157f6e	lguest: the guest code lguest is a simple hypervisor for Linux on Linux. Unlike kvm it doesn't need VT/SVM hardware. Unlike Xen it's simply "modprobe and go". Unlike both, it's 5000 lines and self-contained. Performance is ok, but not great (-30% on kernel compile). But given its hackability, I expect this to improve, along with the paravirt_ops code which it supplies a complete example for. There's also a 64-bit version being worked on and other craziness. But most of all, lguest is awesome fun! Too much of the kernel is a big ball of hair. lguest is simple enough to dive into and hack, plus has some warts which scream "fork me!". This patch: This is the code and headers required to make an i386 kernel an lguest guest. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Andi Kleen <ak@suse.de> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:52 -07:00

1 2 3 4 5

228 Commits