linux_old1

Commit Graph

Author	SHA1	Message	Date
Lee Schermerhorn	846a16bf0f	mempolicy: rename mpol_copy to mpol_dup This patch renames mpol_copy() to mpol_dup() because, well, that's what it does. Like, e.g., strdup() for strings, mpol_dup() takes a pointer to an existing mempolicy, allocates a new one and copies the contents. In a later patch, I want to use the name mpol_copy() to copy the contents from one mempolicy to another like, e.g., strcpy() does for strings. Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: Christoph Lameter <clameter@sgi.com> Cc: David Rientjes <rientjes@google.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-28 08:58:23 -07:00
Lee Schermerhorn	f0be3d32b0	mempolicy: rename mpol_free to mpol_put This is a change that was requested some time ago by Mel Gorman. Makes sense to me, so here it is. Note: I retain the name "mpol_free_shared_policy()" because it actually does free the shared_policy, which is NOT a reference counted object. However, ... The mempolicy object[s] referenced by the shared_policy are reference counted, so mpol_put() is used to release the reference held by the shared_policy. The mempolicy might not be freed at this time, because some task attached to the shared object associated with the shared policy may be in the process of allocating a page based on the mempolicy. In that case, the task performing the allocation will hold a reference on the mempolicy, obtained via mpol_shared_policy_lookup(). The mempolicy will be freed when all tasks holding such a reference have called mpol_put() for the mempolicy. Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: Christoph Lameter <clameter@sgi.com> Cc: David Rientjes <rientjes@google.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-28 08:58:23 -07:00
Adam Litke	3b11630063	Subject: [PATCH] hugetlb: vmstat events for huge page allocations Allocating huge pages directly from the buddy allocator is not guaranteed to succeed. Success depends on several factors (such as the amount of physical memory available and the level of fragmentation). With the addition of dynamic hugetlb pool resizing, allocations can occur much more frequently. For these reasons it is desirable to keep track of huge page allocation successes and failures. Add two new vmstat entries to track huge page allocations that succeed and fail. The presence of the two entries is contingent upon CONFIG_HUGETLB_PAGE being enabled. [akpm@linux-foundation.org: reduced ifdeffery] Signed-off-by: Adam Litke <agl@us.ibm.com> Signed-off-by: Eric Munson <ebmunson@us.ibm.com> Tested-by: Mel Gorman <mel@csn.ul.ie> Reviewed-by: Andy Whitcroft <apw@shadowen.org> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-28 08:58:23 -07:00
Nick Piggin	a08cb629f5	s390: implement pte special bit Convert XIP to support non-struct page backed memory, using VM_MIXEDMAP for the user mappings. This requires the get_xip_page API to be changed to an address based one. Improve the API layering a little bit too, while we're here. This is required in order to support XIP filesystems on memory that isn't backed with struct page (but memory with struct page is still supported too). Signed-off-by: Nick Piggin <npiggin@suse.de> Acked-by: Carsten Otte <cotte@de.ibm.com> Cc: Jared Hulbert <jaredeh@gmail.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-28 08:58:23 -07:00
Nick Piggin	70688e4dd1	xip: support non-struct page backed memory Convert XIP to support non-struct page backed memory, using VM_MIXEDMAP for the user mappings. This requires the get_xip_page API to be changed to an address based one. Improve the API layering a little bit too, while we're here. This is required in order to support XIP filesystems on memory that isn't backed with struct page (but memory with struct page is still supported too). Signed-off-by: Nick Piggin <npiggin@suse.de> Acked-by: Carsten Otte <cotte@de.ibm.com> Cc: Jared Hulbert <jaredeh@gmail.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-28 08:58:23 -07:00
Jared Hulbert	30afcb4bd2	return pfn from direct_access, for XIP Alter the block device ->direct_access() API to work with the new get_xip_mem() API (that requires both kaddr and pfn are returned). Some architectures will not do the right thing in their virt_to_page() for use by XIP (to translate from the kernel virtual address returned by direct_access(), to a user mappable pfn in XIP's page fault handler. However, we can't switch it to just return the pfn and not the kaddr, because we have no good way to get a kva from a pfn, and XIP requires the kva for its read(2) and write(2) handlers. So we have to return both. Signed-off-by: Jared Hulbert <jaredeh@gmail.com> Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: Carsten Otte <cotte@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: linux-mm@kvack.org Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-28 08:58:23 -07:00
Nick Piggin	423bad6004	mm: add vm_insert_mixed vm_insert_mixed will insert either a raw pfn or a refcounted struct page into the page tables, depending on whether vm_normal_page() will return the page or not. With the introduction of the new pte bit, this is now a too tricky for drivers to be doing themselves. filemap_xip uses this in a subsequent patch. Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: Jared Hulbert <jaredeh@gmail.com> Cc: Carsten Otte <cotte@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-28 08:58:23 -07:00
Nick Piggin	7e675137a8	mm: introduce pte_special pte bit s390 for one, cannot implement VM_MIXEDMAP with pfn_valid, due to their memory model (which is more dynamic than most). Instead, they had proposed to implement it with an additional path through vm_normal_page(), using a bit in the pte to determine whether or not the page should be refcounted: vm_normal_page() { ... if (unlikely(vma->vm_flags & (VM_PFNMAP\|VM_MIXEDMAP))) { if (vma->vm_flags & VM_MIXEDMAP) { #ifdef s390 if (!mixedmap_refcount_pte(pte)) return NULL; #else if (!pfn_valid(pfn)) return NULL; #endif goto out; } ... } This is fine, however if we are allowed to use a bit in the pte to determine refcountedness, we can use that to _completely_ replace all the vma based schemes. So instead of adding more cases to the already complex vma-based scheme, we can have a clearly seperate and simple pte-based scheme (and get slightly better code generation in the process): vm_normal_page() { #ifdef s390 if (!mixedmap_refcount_pte(pte)) return NULL; return pte_page(pte); #else ... #endif } And finally, we may rather make this concept usable by any architecture rather than making it s390 only, so implement a new type of pte state for this. Unfortunately the old vma based code must stay, because some architectures may not be able to spare pte bits. This makes vm_normal_page a little bit more ugly than we would like, but the 2 cases are clearly seperate. So introduce a pte_special pte state, and use it in mm/memory.c. It is currently a noop for all architectures, so this doesn't actually result in any compiled code changes to mm/memory.o. BTW: I haven't put vm_normal_page() into arch code as-per an earlier suggestion. The reason is that, regardless of where vm_normal_page is actually implemented, the abstraction is still exactly the same. Also, while it depends on whether the architecture has pte_special or not, that is the only two possible cases, and it really isn't an arch specific function -- the role of the arch code should be to provide primitive functions and accessors with which to build the core code; pte_special does that. We do not want architectures to know or care about vm_normal_page itself, and we definitely don't want them being able to invent something new there out of sight of mm/ code. If we made vm_normal_page an arch function, then we have to make vm_insert_mixed (next patch) an arch function too. So I don't think moving it to arch code fundamentally improves any abstractions, while it does practically make the code more difficult to follow, for both mm and arch developers, and easier to misuse. [akpm@linux-foundation.org: build fix] Signed-off-by: Nick Piggin <npiggin@suse.de> Acked-by: Carsten Otte <cotte@de.ibm.com> Cc: Jared Hulbert <jaredeh@gmail.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-28 08:58:23 -07:00
Jared Hulbert	b379d79019	mm: introduce VM_MIXEDMAP This series introduces some important infrastructure work. The overall result is that: 1. We now support XIP backed filesystems using memory that have no struct page allocated to them. And patches 6 and 7 actually implement this for s390. This is pretty important in a number of cases. As far as I understand, in the case of virtualisation (eg. s390), each guest may mount a readonly copy of the same filesystem (eg. the distro). Currently, guests need to allocate struct pages for this image. So if you have 100 guests, you already need to allocate more memory for the struct pages than the size of the image. I think. (Carsten?) For other (eg. embedded) systems, you may have a very large non- volatile filesystem. If you have to have struct pages for this, then your RAM consumption will go up proportionally to fs size. Even though it is just a small proportion, the RAM can be much more costly eg in terms of power, so every KB less that Linux uses makes it more attractive to a lot of these guys. 2. VM_MIXEDMAP allows us to support mappings where you actually do want to refcount _some_ pages in the mapping, but not others, and support COW on arbitrary (non-linear) mappings. Jared needs this for his NVRAM filesystem in progress. Future iterations of this filesystem will most likely want to migrate pages between pagecache and XIP backing, which is where the requirement for mixed (some refcounted, some not) comes from. 3. pte_special also has a peripheral usage that I need for my lockless get_user_pages patch. That was shown to speed up "oltp" on db2 by 10% on a 2 socket system, which is kind of significant because they scrounge for months to try to find 0.1% improvement on these workloads. I'm hoping we might finally be faster than AIX on pSeries with this :). My reference to lockless get_user_pages is not meant to justify this patchset (which doesn't include lockless gup), but just to show that pte_special is not some s390 specific thing that should be hidden in arch code or xip code: I definitely want to use it on at least x86 and powerpc as well. This patch: Introduce a new type of mapping, VM_MIXEDMAP. This is unlike VM_PFNMAP in that it can support COW mappings of arbitrary ranges including ranges without struct page and ranges with a struct page that we actually want to refcount (PFNMAP can only support COW in those cases where the un-COW-ed translations are mapped linearly in the virtual address, and can only support non refcounted ranges). VM_MIXEDMAP achieves this by refcounting all pfn_valid pages, and not refcounting !pfn_valid pages (which is not an option for VM_PFNMAP, because it needs to avoid refcounting pfn_valid pages eg. for /dev/mem mappings). Signed-off-by: Jared Hulbert <jaredeh@gmail.com> Signed-off-by: Nick Piggin <npiggin@suse.de> Acked-by: Carsten Otte <cotte@de.ibm.com> Cc: Jared Hulbert <jaredeh@gmail.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-28 08:58:22 -07:00
Christoph Lameter	e20b8cca76	PAGEFLAGS_EXTENDED and separate page flags for Head and Tail Having separate page flags for the head and the tail of a compound page allows the compiler to use bitops instead of operations on a word to check for a tail page. That is f.e. important for virt_to_head_page() which is used in various critical code paths (kfree for example): Code for PageTail(page) Before: mov (%rdi),%rdx page->flags mov %rdx,%rax 3 bytes and $0x12000,%eax 5 bytes cmp $0x12000,%rax 6 bytes je 897 <kfree+0xa7> After: mov (%rdi),%rax test $0x40,%ah (3 bytes) jne 887 <kfree+0x97> So we go from 14 bytes to 3 bytes and from 3 instructions to one. From the use of 2 registers we go to none. We can only use page flags for this if we have page flags available. This patch introduces CONFIG_PAGEFLAGS_EXTENDED that is set if pageflags are not scarce due to SPARSEMEM using page flags for its sectionid on 32 bit NUMA platforms. Additional page flag definitions can be added to the CONFIG_PAGEFLAGS_EXTENDED section in page-flags.h if the functionality depends on PAGEFLAGS_EXTENDED or if more page flag overlapping tricks are used for the !PAGEFLAGS_EXTENDED fallback (the upcoming virtual compound patch may hook in here and Rik's/Lee's additional page flags to solve the reclaim issues could also be added there [hint... hint... where are these patchsets?]). Avoiding the overlaying of Pg_reclaim also clears the way for possible use of compound pages for the pagecache or on the LRU. Signed-off-by: Christoph Lameter <clameter@sgi.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-28 08:58:22 -07:00
Christoph Lameter	97965478a6	mm: Get rid of __ZONE_COUNT It was used to compensate because MAX_NR_ZONES was not available to the #ifdefs. Export MAX_NR_ZONES via the new mechanism and get rid of __ZONE_COUNT. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-28 08:58:22 -07:00
Christoph Lameter	ec7cade8c1	page flags: add PAGEFLAGS_FALSE for flags that are always false Turns out that there are a number of times that a flag is simply always returning 0. Define a macro for that. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-28 08:58:22 -07:00
Christoph Lameter	602c4d112f	page flags: handle PG_uncached like all other flags Remove the special setup for PG_uncached and simply make it part of the enum. The page flag will only be allocated when the kernel build includes the uncached allocator. Acked-by: Dean Nelson <dcn@sgi.com> Cc: Jes Sorensen <jes@trained-monkey.org> Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-28 08:58:22 -07:00
Christoph Lameter	0a128b2b1a	pageflags: eliminate PG_xxx aliases Remove aliases of PG_xxx. We can easily drop those now and alias by specifying the PG_xxx flag in the macro that generates the functions. Signed-off-by: Christoph Lameter <clameter@sgi.com> Cc: Andy Whitcroft <apw@shadowen.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Rik van Riel <riel@redhat.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-28 08:58:22 -07:00
Christoph Lameter	d60cd46bbd	pageflags: use proper page flag functions in Xen Xen uses bitops to manipulate page flags. Make it use proper page flag functions. Signed-off-by: Christoph Lameter <clameter@sgi.com> Cc: Andy Whitcroft <apw@shadowen.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Rik van Riel <riel@redhat.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-28 08:58:22 -07:00
Christoph Lameter	6a1e7f777f	pageflags: convert to the use of new macros Replace explicit definitions of page flags through the use of macros. Significantly reduces the size of the definitions and removes a lot of opportunity for errors. Additonal page flags can typically be generated with a single line. Signed-off-by: Christoph Lameter <clameter@sgi.com> Cc: Andy Whitcroft <apw@shadowen.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Rik van Riel <riel@redhat.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-28 08:58:22 -07:00
Christoph Lameter	f94a62e910	pageflags: introduce macros to generate page flag functions Introduce a set of macros that generate functions to handle page flags. A page flag function group typically starts with either SETPAGEFLAG(<part of function name>,<part of PG_ flagname>) to create a set of page flag operations that are atomic. Or __SETPAGEFLAG(<part of function name>,<part of PG_ flagname) to create a set of page flag operations that are not atomic. Then additional operations can be added using the following macros TESTSCFLAG Create additional atomic test-and-set and test-and-clear functions TESTSETFLAG Create additional test and set function TESTCLEARFLAG Create additional test and clear function SETPAGEFLAG Create additional atomic set function CLEARPAGEFLAG Create additional atomic clear function __TESTPAGEFLAG Create additional non atomic set function __SETPAGEFLAG Create additional non atomic clear function Signed-off-by: Christoph Lameter <clameter@sgi.com> Cc: Andy Whitcroft <apw@shadowen.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Rik van Riel <riel@redhat.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-28 08:58:22 -07:00
Christoph Lameter	9223b4190f	pageflags: get rid of FLAGS_RESERVED NR_PAGEFLAGS specifies the number of page flags we are using. From that we can calculate the number of bits leftover that can be used for zone, node (and maybe the sections id). There is no need anymore for FLAGS_RESERVED if we use NR_PAGEFLAGS. Use the new methods to make NR_PAGEFLAGS available via the preprocessor. NR_PAGEFLAGS is used to calculate field boundaries in the page flags fields. These field widths have to be available to the preprocessor. Signed-off-by: Christoph Lameter <clameter@sgi.com> Cc: David Miller <davem@davemloft.net> Cc: Andy Whitcroft <apw@shadowen.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Rik van Riel <riel@redhat.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-28 08:58:21 -07:00
Christoph Lameter	e268318149	pageflags: use an enum for the flags Use an enum to ease the maintenance of page flags. This is going to change the numbering from 0 to 18. Signed-off-by: Christoph Lameter <clameter@sgi.com> Cc: Andy Whitcroft <apw@shadowen.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Rik van Riel <riel@redhat.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-28 08:58:21 -07:00
Andrew Morton	726b801272	page_mapping(): add ifdef around reference to swapper_space This fixes the superh build when the pageflags patches are applied. But it shouldn't unless it's a gcc bug. Cc: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-28 08:58:21 -07:00
Christoph Lameter	308c05e35e	sparsemem: vmemmap does not need section bits A set of patches that attempts to improve page flag handling. First of all a method is introduced to generate the page flag functions using macros. Then the number of page flags used by sparsemem is reduced. All page flag operations will no longer be macros. All flags will use inline function. Then we add a way to export enum constants to the preprocessor which allows us to get rid of __ZONE_COUNT and use the NR_PAGEFLAGS for the dynamic calculation of actually available page flags for fields. This patch: Sparsemem vmemmap does not need any section bits. This patch has the effect of reducing the number of bits used in page->flags by at least 6. Signed-off-by: Christoph Lameter <clameter@sgi.com> Cc: Andy Whitcroft <apw@shadowen.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Rik van Riel <riel@redhat.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-28 08:58:21 -07:00
Christoph Lameter	2301696932	vmallocinfo: add caller information Add caller information so that /proc/vmallocinfo shows where the allocation request for a slice of vmalloc memory originated. Results in output like this: 0xffffc20000000000-0xffffc20000801000 8392704 alloc_large_system_hash+0x127/0x246 pages=2048 vmalloc vpages 0xffffc20000801000-0xffffc20000806000 20480 alloc_large_system_hash+0x127/0x246 pages=4 vmalloc 0xffffc20000806000-0xffffc20000c07000 4198400 alloc_large_system_hash+0x127/0x246 pages=1024 vmalloc vpages 0xffffc20000c07000-0xffffc20000c0a000 12288 alloc_large_system_hash+0x127/0x246 pages=2 vmalloc 0xffffc20000c0a000-0xffffc20000c0c000 8192 acpi_os_map_memory+0x13/0x1c phys=cff68000 ioremap 0xffffc20000c0c000-0xffffc20000c0f000 12288 acpi_os_map_memory+0x13/0x1c phys=cff64000 ioremap 0xffffc20000c10000-0xffffc20000c15000 20480 acpi_os_map_memory+0x13/0x1c phys=cff65000 ioremap 0xffffc20000c16000-0xffffc20000c18000 8192 acpi_os_map_memory+0x13/0x1c phys=cff69000 ioremap 0xffffc20000c18000-0xffffc20000c1a000 8192 acpi_os_map_memory+0x13/0x1c phys=fed1f000 ioremap 0xffffc20000c1a000-0xffffc20000c1c000 8192 acpi_os_map_memory+0x13/0x1c phys=cff68000 ioremap 0xffffc20000c1c000-0xffffc20000c1e000 8192 acpi_os_map_memory+0x13/0x1c phys=cff68000 ioremap 0xffffc20000c1e000-0xffffc20000c20000 8192 acpi_os_map_memory+0x13/0x1c phys=cff68000 ioremap 0xffffc20000c20000-0xffffc20000c22000 8192 acpi_os_map_memory+0x13/0x1c phys=cff68000 ioremap 0xffffc20000c22000-0xffffc20000c24000 8192 acpi_os_map_memory+0x13/0x1c phys=cff68000 ioremap 0xffffc20000c24000-0xffffc20000c26000 8192 acpi_os_map_memory+0x13/0x1c phys=e0081000 ioremap 0xffffc20000c26000-0xffffc20000c28000 8192 acpi_os_map_memory+0x13/0x1c phys=e0080000 ioremap 0xffffc20000c28000-0xffffc20000c2d000 20480 alloc_large_system_hash+0x127/0x246 pages=4 vmalloc 0xffffc20000c2d000-0xffffc20000c31000 16384 tcp_init+0xd5/0x31c pages=3 vmalloc 0xffffc20000c31000-0xffffc20000c34000 12288 alloc_large_system_hash+0x127/0x246 pages=2 vmalloc 0xffffc20000c34000-0xffffc20000c36000 8192 init_vdso_vars+0xde/0x1f1 0xffffc20000c36000-0xffffc20000c38000 8192 pci_iomap+0x8a/0xb4 phys=d8e00000 ioremap 0xffffc20000c38000-0xffffc20000c3a000 8192 usb_hcd_pci_probe+0x139/0x295 [usbcore] phys=d8e00000 ioremap 0xffffc20000c3a000-0xffffc20000c3e000 16384 sys_swapon+0x509/0xa15 pages=3 vmalloc 0xffffc20000c40000-0xffffc20000c61000 135168 e1000_probe+0x1c4/0xa32 phys=d8a20000 ioremap 0xffffc20000c61000-0xffffc20000c6a000 36864 _xfs_buf_map_pages+0x8e/0xc0 vmap 0xffffc20000c6a000-0xffffc20000c73000 36864 _xfs_buf_map_pages+0x8e/0xc0 vmap 0xffffc20000c73000-0xffffc20000c7c000 36864 _xfs_buf_map_pages+0x8e/0xc0 vmap 0xffffc20000c7c000-0xffffc20000c7f000 12288 e1000e_setup_tx_resources+0x29/0xbe pages=2 vmalloc 0xffffc20000c80000-0xffffc20001481000 8392704 pci_mmcfg_arch_init+0x90/0x118 phys=e0000000 ioremap 0xffffc20001481000-0xffffc20001682000 2101248 alloc_large_system_hash+0x127/0x246 pages=512 vmalloc 0xffffc20001682000-0xffffc20001e83000 8392704 alloc_large_system_hash+0x127/0x246 pages=2048 vmalloc vpages 0xffffc20001e83000-0xffffc20002204000 3674112 alloc_large_system_hash+0x127/0x246 pages=896 vmalloc vpages 0xffffc20002204000-0xffffc2000220d000 36864 _xfs_buf_map_pages+0x8e/0xc0 vmap 0xffffc2000220d000-0xffffc20002216000 36864 _xfs_buf_map_pages+0x8e/0xc0 vmap 0xffffc20002216000-0xffffc2000221f000 36864 _xfs_buf_map_pages+0x8e/0xc0 vmap 0xffffc2000221f000-0xffffc20002228000 36864 _xfs_buf_map_pages+0x8e/0xc0 vmap 0xffffc20002228000-0xffffc20002231000 36864 _xfs_buf_map_pages+0x8e/0xc0 vmap 0xffffc20002231000-0xffffc20002234000 12288 e1000e_setup_rx_resources+0x35/0x122 pages=2 vmalloc 0xffffc20002240000-0xffffc20002261000 135168 e1000_probe+0x1c4/0xa32 phys=d8a60000 ioremap 0xffffc20002261000-0xffffc2000270c000 4894720 sys_swapon+0x509/0xa15 pages=1194 vmalloc vpages 0xffffffffa0000000-0xffffffffa0022000 139264 module_alloc+0x4f/0x55 pages=33 vmalloc 0xffffffffa0022000-0xffffffffa0029000 28672 module_alloc+0x4f/0x55 pages=6 vmalloc 0xffffffffa002b000-0xffffffffa0034000 36864 module_alloc+0x4f/0x55 pages=8 vmalloc 0xffffffffa0034000-0xffffffffa003d000 36864 module_alloc+0x4f/0x55 pages=8 vmalloc 0xffffffffa003d000-0xffffffffa0049000 49152 module_alloc+0x4f/0x55 pages=11 vmalloc 0xffffffffa0049000-0xffffffffa0050000 28672 module_alloc+0x4f/0x55 pages=6 vmalloc [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Christoph Lameter <clameter@sgi.com> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Hugh Dickins <hugh@veritas.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-28 08:58:21 -07:00
Christoph Lameter	a10aa57987	vmalloc: show vmalloced areas via /proc/vmallocinfo Implement a new proc file that allows the display of the currently allocated vmalloc memory. It allows to see the users of vmalloc. That is important if vmalloc space is scarce (i386 for example). And it's going to be important for the compound page fallback to vmalloc. Many of the current users can be switched to use compound pages with fallback. This means that the number of users of vmalloc is reduced and page tables no longer necessary to access the memory. /proc/vmallocinfo allows to review how that reduction occurs. If memory becomes fragmented and larger order allocations are no longer possible then /proc/vmallocinfo allows to see which compound page allocations fell back to virtual compound pages. That is important for new users of virtual compound pages. Such as order 1 stack allocation etc that may fallback to virtual compound pages in the future. /proc/vmallocinfo permissions are made readable-only-by-root to avoid possible information leakage. [akpm@linux-foundation.org: coding-style fixes] [akpm@linux-foundation.org: CONFIG_MMU=n build fix] Signed-off-by: Christoph Lameter <clameter@sgi.com> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Hugh Dickins <hugh@veritas.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Arjan van de Ven <arjan@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-28 08:58:21 -07:00
Andrew Morton	b454456841	mm: make early_pfn_to_nid() a C function Fix this (sparc64) mm/sparse-vmemmap.c: In function `vmemmap_verify': mm/sparse-vmemmap.c:64: warning: unused variable `pfn' by switching to a C function which touches its arg. (reason 3,555 why macros are bad) Also, the `nid' arg was misnamed. Reviewed-by: Christoph Lameter <clameter@sgi.com> Acked-by: Andy Whitcroft <apw@shadowen.org> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Andi Kleen <ak@suse.de> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-28 08:58:20 -07:00
Miklos Szeredi	ac6aadb24b	mm: rotate_reclaimable_page() cleanup Clean up messy conditional calling of test_clear_page_writeback() from both rotate_reclaimable_page() and end_page_writeback(). The only user of rotate_reclaimable_page() is end_page_writeback() so this is OK. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-28 08:58:20 -07:00
Andi Kleen	7edf85aa3c	mm: save some bytes in mm_struct by filling holes on 64bit Save some bytes in mm_struct by filling holes Putting int values together for better packing on 64bit shrinks sizeof(struct mm_struct) from 776 bytes to 764 bytes. Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-28 08:58:20 -07:00
David Rientjes	3842b46de6	mempolicy: small header file cleanup Removes forward definition of vm_area_struct in linux/mempolicy.h. We already get it from the linux/slab.h -> linux/gfp.h include. Removes the unused mpol_set_vma_default() macro from linux/mempolicy.h. Removes the extern definition of default_policy since it is only referenced, as it should be, in mm/mempolicy.c. Cc: Paul Jackson <pj@sgi.com> Cc: Christoph Lameter <clameter@sgi.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com> Cc: Andi Kleen <ak@suse.de> Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-28 08:58:20 -07:00
David Rientjes	4c50bc0116	mempolicy: add MPOL_F_RELATIVE_NODES flag Adds another optional mode flag, MPOL_F_RELATIVE_NODES, that specifies nodemasks passed via set_mempolicy() or mbind() should be considered relative to the current task's mems_allowed. When the mempolicy is created, the passed nodemask is folded and mapped onto the current task's mems_allowed. For example, consider a task using set_mempolicy() to pass MPOL_INTERLEAVE \| MPOL_F_RELATIVE_NODES with a nodemask of 1-3. If current's mems_allowed is 4-7, the effected nodemask is 5-7 (the second, third, and fourth node of mems_allowed). If the same task is attached to a cpuset, the mempolicy nodemask is rebound each time the mems are changed. Some possible rebinds and results are: mems result 1-3 1-3 1-7 2-4 1,5-6 1,5-6 1,5-7 5-7 Likewise, the zonelist built for MPOL_BIND acts on the set of zones assigned to the resultant nodemask from the relative remap. In the MPOL_PREFERRED case, the preferred node is remapped from the currently effected nodemask to the relative nodemask. This mempolicy mode flag was conceived of by Paul Jackson <pj@sgi.com>. Cc: Paul Jackson <pj@sgi.com> Cc: Christoph Lameter <clameter@sgi.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com> Cc: Andi Kleen <ak@suse.de> Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-28 08:58:19 -07:00
Paul Jackson	7ea931c9fc	mempolicy: add bitmap_onto() and bitmap_fold() operations The following adds two more bitmap operators, bitmap_onto() and bitmap_fold(), with the usual cpumask and nodemask wrappers. The bitmap_onto() operator computes one bitmap relative to another. If the n-th bit in the origin mask is set, then the m-th bit of the destination mask will be set, where m is the position of the n-th set bit in the relative mask. The bitmap_fold() operator folds a bitmap into a second that has bit m set iff the input bitmap has some bit n set, where m == n mod sz, for the specified sz value. There are two substantive changes between this patch and its predecessor bitmap_relative: 1) Renamed bitmap_relative() to be bitmap_onto(). 2) Added bitmap_fold(). The essential motivation for bitmap_onto() is to provide a mechanism for converting a cpuset-relative CPU or Node mask to an absolute mask. Cpuset relative masks are written as if the current task were in a cpuset whose CPUs or Nodes were just the consecutive ones numbered 0..N-1, for some N. The bitmap_onto() operator is provided in anticipation of adding support for the first such cpuset relative mask, by the mbind() and set_mempolicy() system calls, using a planned flag of MPOL_F_RELATIVE_NODES. These bitmap operators (and their nodemask wrappers, in particular) will be used in code that converts the user specified cpuset relative memory policy to a specific system node numbered policy, given the current mems_allowed of the tasks cpuset. Such cpuset relative mempolicies will address two deficiencies of the existing interface between cpusets and mempolicies: 1) A task cannot at present reliably establish a cpuset relative mempolicy because there is an essential race condition, in that the tasks cpuset may be changed in between the time the task can query its cpuset placement, and the time the task can issue the applicable mbind or set_memplicy system call. 2) A task cannot at present establish what cpuset relative mempolicy it would like to have, if it is in a smaller cpuset than it might have mempolicy preferences for, because the existing interface only allows specifying mempolicies for nodes currently allowed by the cpuset. Cpuset relative mempolicies are useful for tasks that don't distinguish particularly between one CPU or Node and another, but only between how many of each are allowed, and the proper placement of threads and memory pages on the various CPUs and Nodes available. The motivation for the added bitmap_fold() can be seen in the following example. Let's say an application has specified some mempolicies that presume 16 memory nodes, including say a mempolicy that specified MPOL_F_RELATIVE_NODES (cpuset relative) nodes 12-15. Then lets say that application is crammed into a cpuset that only has 8 memory nodes, 0-7. If one just uses bitmap_onto(), this mempolicy, mapped to that cpuset, would ignore the requested relative nodes above 7, leaving it empty of nodes. That's not good; better to fold the higher nodes down, so that some nodes are included in the resulting mapped mempolicy. In this case, the mempolicy nodes 12-15 are taken modulo 8 (the weight of the mems_allowed of the confining cpuset), resulting in a mempolicy specifying nodes 4-7. Signed-off-by: Paul Jackson <pj@sgi.com> Signed-off-by: David Rientjes <rientjes@google.com> Cc: Christoph Lameter <clameter@sgi.com> Cc: Andi Kleen <ak@suse.de> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: <kosaki.motohiro@jp.fujitsu.com> Cc: <ray-lk@madrabbit.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-28 08:58:19 -07:00
David Rientjes	f5b087b52f	mempolicy: add MPOL_F_STATIC_NODES flag Add an optional mempolicy mode flag, MPOL_F_STATIC_NODES, that suppresses the node remap when the policy is rebound. Adds another member to struct mempolicy, nodemask_t user_nodemask, as part of a union with cpuset_mems_allowed: struct mempolicy { ... union { nodemask_t cpuset_mems_allowed; nodemask_t user_nodemask; } w; } that stores the the nodemask that the user passed when he or she created the mempolicy via set_mempolicy() or mbind(). When using MPOL_F_STATIC_NODES, which is passed with any mempolicy mode, the user's passed nodemask intersected with the VMA or task's allowed nodes is always used when determining the preferred node, setting the MPOL_BIND zonelist, or creating the interleave nodemask. This happens whenever the policy is rebound, including when a task's cpuset assignment changes or the cpuset's mems are changed. This creates an interesting side-effect in that it allows the mempolicy "intent" to lie dormant and uneffected until it has access to the node(s) that it desires. For example, if you currently ask for an interleaved policy over a set of nodes that you do not have access to, the mempolicy is not created and the task continues to use the previous policy. With this change, however, it is possible to create the same mempolicy; it is only effected when access to nodes in the nodemask is acquired. It is also possible to mount tmpfs with the static nodemask behavior when specifying a node or nodemask. To do this, simply add "=static" immediately following the mempolicy mode at mount time: mount -o remount mpol=interleave=static:1-3 Also removes mpol_check_policy() and folds its logic into mpol_new() since it is now obsoleted. The unused vma_mpol_equal() is also removed. Cc: Paul Jackson <pj@sgi.com> Cc: Christoph Lameter <clameter@sgi.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com> Cc: Andi Kleen <ak@suse.de> Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-28 08:58:19 -07:00
David Rientjes	028fec414d	mempolicy: support optional mode flags With the evolution of mempolicies, it is necessary to support mempolicy mode flags that specify how the policy shall behave in certain circumstances. The most immediate need for mode flag support is to suppress remapping the nodemask of a policy at the time of rebind. Both the mempolicy mode and flags are passed by the user in the 'int policy' formal of either the set_mempolicy() or mbind() syscall. A new constant, MPOL_MODE_FLAGS, represents the union of legal optional flags that may be passed as part of this int. Mempolicies that include illegal flags as part of their policy are rejected as invalid. An additional member to struct mempolicy is added to support the mode flags: struct mempolicy { ... unsigned short policy; unsigned short flags; } The splitting of the 'int' actual passed by the user is done in sys_set_mempolicy() and sys_mbind() for their respective syscalls. This is done by intersecting the actual with MPOL_MODE_FLAGS, rejecting the syscall of there are additional flags, and storing it in the new 'flags' member of struct mempolicy. The intersection of the actual with ~MPOL_MODE_FLAGS is stored in the 'policy' member of the struct and all current users of pol->policy remain unchanged. The union of the policy mode and optional mode flags is passed back to the user in get_mempolicy(). This combination of mode and flags within the same actual does not break userspace code that relies on get_mempolicy(&policy, ...) and either switch (policy) { case MPOL_BIND: ... case MPOL_INTERLEAVE: ... }; statements or if (policy == MPOL_INTERLEAVE) { ... } statements. Such applications would need to use optional mode flags when calling set_mempolicy() or mbind() for these previously implemented statements to stop working. If an application does start using optional mode flags, it will need to mask the optional flags off the policy in switch and conditional statements that only test mode. An additional member is also added to struct shmem_sb_info to store the optional mode flags. [hugh@veritas.com: shmem mpol: fix build warning] Cc: Paul Jackson <pj@sgi.com> Cc: Christoph Lameter <clameter@sgi.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com> Cc: Andi Kleen <ak@suse.de> Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-28 08:58:19 -07:00
David Rientjes	a3b51e0142	mempolicy: convert MPOL constants to enum The mempolicy mode constants, MPOL_DEFAULT, MPOL_PREFERRED, MPOL_BIND, and MPOL_INTERLEAVE, are better declared as part of an enum since they are sequentially numbered and cannot be combined. The policy member of struct mempolicy is also converted from type short to type unsigned short. A negative policy does not have any legitimate meaning, so it is possible to change its type in preparation for adding optional mode flags later. The equivalent member of struct shmem_sb_info is also changed from int to unsigned short. For compatibility, the policy formal to get_mempolicy() remains as a pointer to an int: int get_mempolicy(int policy, unsigned long nmask, unsigned long maxnode, unsigned long addr, unsigned long flags); although the only possible values is the range of type unsigned short. Cc: Paul Jackson <pj@sgi.com> Cc: Christoph Lameter <clameter@sgi.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com> Cc: Andi Kleen <ak@suse.de> Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-28 08:58:19 -07:00
Pekka Enberg	1b27d05b6e	mm: move cache_line_size() to <linux/cache.h> Not all architectures define cache_line_size() so as suggested by Andrew move the private implementations in mm/slab.c and mm/slob.c to <linux/cache.h>. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Reviewed-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-28 08:58:19 -07:00
Mel Gorman	19770b3260	mm: filter based on a nodemask as well as a gfp_mask The MPOL_BIND policy creates a zonelist that is used for allocations controlled by that mempolicy. As the per-node zonelist is already being filtered based on a zone id, this patch adds a version of __alloc_pages() that takes a nodemask for further filtering. This eliminates the need for MPOL_BIND to create a custom zonelist. A positive benefit of this is that allocations using MPOL_BIND now use the local node's distance-ordered zonelist instead of a custom node-id-ordered zonelist. I.e., pages will be allocated from the closest allowed node with available memory. [Lee.Schermerhorn@hp.com: Mempolicy: update stale documentation and comments] [Lee.Schermerhorn@hp.com: Mempolicy: make dequeue_huge_page_vma() obey MPOL_BIND nodemask] [Lee.Schermerhorn@hp.com: Mempolicy: make dequeue_huge_page_vma() obey MPOL_BIND nodemask rework] Signed-off-by: Mel Gorman <mel@csn.ul.ie> Acked-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Hugh Dickins <hugh@veritas.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-28 08:58:19 -07:00
Mel Gorman	dd1a239f6f	mm: have zonelist contains structs with both a zone pointer and zone_idx Filtering zonelists requires very frequent use of zone_idx(). This is costly as it involves a lookup of another structure and a substraction operation. As the zone_idx is often required, it should be quickly accessible. The node idx could also be stored here if it was found that accessing zone->node is significant which may be the case on workloads where nodemasks are heavily used. This patch introduces a struct zoneref to store a zone pointer and a zone index. The zonelist then consists of an array of these struct zonerefs which are looked up as necessary. Helpers are given for accessing the zone index as well as the node index. [kamezawa.hiroyu@jp.fujitsu.com: Suggested struct zoneref instead of embedding information in pointers] [hugh@veritas.com: mm-have-zonelist: fix memcg ooms] [hugh@veritas.com: just return do_try_to_free_pages] [hugh@veritas.com: do_try_to_free_pages gfp_mask redundant] Signed-off-by: Mel Gorman <mel@csn.ul.ie> Acked-by: Christoph Lameter <clameter@sgi.com> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Christoph Lameter <clameter@sgi.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-28 08:58:18 -07:00
Mel Gorman	54a6eb5c47	mm: use two zonelist that are filtered by GFP mask Currently a node has two sets of zonelists, one for each zone type in the system and a second set for GFP_THISNODE allocations. Based on the zones allowed by a gfp mask, one of these zonelists is selected. All of these zonelists consume memory and occupy cache lines. This patch replaces the multiple zonelists per-node with two zonelists. The first contains all populated zones in the system, ordered by distance, for fallback allocations when the target/preferred node has no free pages. The second contains all populated zones in the node suitable for GFP_THISNODE allocations. An iterator macro is introduced called for_each_zone_zonelist() that interates through each zone allowed by the GFP flags in the selected zonelist. Signed-off-by: Mel Gorman <mel@csn.ul.ie> Acked-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Christoph Lameter <clameter@sgi.com> Cc: Hugh Dickins <hugh@veritas.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-28 08:58:18 -07:00
Mel Gorman	18ea7e710d	mm: remember what the preferred zone is for zone_statistics On NUMA, zone_statistics() is used to record events like numa hit, miss and foreign. It assumes that the first zone in a zonelist is the preferred zone. When multiple zonelists are replaced by one that is filtered, this is no longer the case. This patch records what the preferred zone is rather than assuming the first zone in the zonelist is it. This simplifies the reading of later patches in this set. Signed-off-by: Mel Gorman <mel@csn.ul.ie> Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Mel Gorman <mel@csn.ul.ie> Reviewed-by: Christoph Lameter <clameter@sgi.com> Cc: Hugh Dickins <hugh@veritas.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-28 08:58:18 -07:00
Mel Gorman	0e88460da6	mm: introduce node_zonelist() for accessing the zonelist for a GFP mask Introduce a node_zonelist() helper function. It is used to lookup the appropriate zonelist given a node and a GFP mask. The patch on its own is a cleanup but it helps clarify parts of the two-zonelist-per-node patchset. If necessary, it can be merged with the next patch in this set without problems. Reviewed-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Mel Gorman <mel@csn.ul.ie> Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Christoph Lameter <clameter@sgi.com> Cc: Hugh Dickins <hugh@veritas.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-28 08:58:18 -07:00
Mel Gorman	dac1d27bc8	mm: use zonelists instead of zones when direct reclaiming pages The following patches replace multiple zonelists per node with two zonelists that are filtered based on the GFP flags. The patches as a set fix a bug with regard to the use of MPOL_BIND and ZONE_MOVABLE. With this patchset, the MPOL_BIND will apply to the two highest zones when the highest zone is ZONE_MOVABLE. This should be considered as an alternative fix for the MPOL_BIND+ZONE_MOVABLE in 2.6.23 to the previously discussed hack that filters only custom zonelists. The first patch cleans up an inconsistency where direct reclaim uses zonelist->zones where other places use zonelist. The second patch introduces a helper function node_zonelist() for looking up the appropriate zonelist for a GFP mask which simplifies patches later in the set. The third patch defines/remembers the "preferred zone" for numa statistics, as it is no longer always the first zone in a zonelist. The forth patch replaces multiple zonelists with two zonelists that are filtered. The two zonelists are due to the fact that the memoryless patchset introduces a second set of zonelists for __GFP_THISNODE. The fifth patch introduces helper macros for retrieving the zone and node indices of entries in a zonelist. The final patch introduces filtering of the zonelists based on a nodemask. Two zonelists exist per node, one for normal allocations and one for __GFP_THISNODE. Performance results varied depending on the machine configuration. In real workloads the gain/loss will depend on how much the userspace portion of the benchmark benefits from having more cache available due to reduced referencing of zonelists. These are the range of performance losses/gains when running against 2.6.24-rc4-mm1. The set and these machines are a mix of i386, x86_64 and ppc64 both NUMA and non-NUMA. loss to gain Total CPU time on Kernbench: -0.86% to 1.13% Elapsed time on Kernbench: -0.79% to 0.76% page_test from aim9: -4.37% to 0.79% brk_test from aim9: -0.71% to 4.07% fork_test from aim9: -1.84% to 4.60% exec_test from aim9: -0.71% to 1.08% This patch: The allocator deals with zonelists which indicate the order in which zones should be targeted for an allocation. Similarly, direct reclaim of pages iterates over an array of zones. For consistency, this patch converts direct reclaim to use a zonelist. No functionality is changed by this patch. This simplifies zonelist iterators in the next patch. Signed-off-by: Mel Gorman <mel@csn.ul.ie> Acked-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Christoph Lameter <clameter@sgi.com> Cc: Hugh Dickins <hugh@veritas.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-28 08:58:18 -07:00
Nick Piggin	3c18ddd160	mm: remove nopage Nothing in the tree uses nopage any more. Remove support for it in the core mm code and documentation (and a few stray references to it in comments). Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-28 08:58:18 -07:00
Harvey Harrison	ddc81ed2c5	remove sparse warning for mmzone.h include/linux/mmzone.h:640:22: warning: potentially expensive pointer subtraction Calculate the offset into the node_zones array rather than the index using casts to (char ) and comparing against the index sizeof(struct zone). On X86_32 this saves a sar, but code size increases by one byte per is_highmem() use due to 32-bit cmps rather than 16 bit cmps. Before: 207: 2b 80 8c 07 00 00 sub 0x78c(%eax),%eax 20d: c1 f8 0b sar $0xb,%eax 210: 83 f8 02 cmp $0x2,%eax 213: 74 16 je 22b <kmap_atomic_prot+0x144> 215: 83 f8 03 cmp $0x3,%eax 218: 0f 85 8f 00 00 00 jne 2ad <kmap_atomic_prot+0x1c6> 21e: 83 3d 00 00 00 00 02 cmpl $0x2,0x0 225: 0f 85 82 00 00 00 jne 2ad <kmap_atomic_prot+0x1c6> 22b: 64 a1 00 00 00 00 mov %fs:0x0,%eax After: 207: 2b 80 8c 07 00 00 sub 0x78c(%eax),%eax 20d: 3d 00 10 00 00 cmp $0x1000,%eax 212: 74 18 je 22c <kmap_atomic_prot+0x145> 214: 3d 00 18 00 00 cmp $0x1800,%eax 219: 0f 85 8f 00 00 00 jne 2ae <kmap_atomic_prot+0x1c7> 21f: 83 3d 00 00 00 00 02 cmpl $0x2,0x0 226: 0f 85 82 00 00 00 jne 2ae <kmap_atomic_prot+0x1c7> 22c: 64 a1 00 00 00 00 mov %fs:0x0,%eax [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-28 08:58:17 -07:00
Christoph Lameter	488514d179	Remove set_migrateflags() Migrate flags must be set on slab creation as agreed upon when the antifrag logic was reviewed. Otherwise some slabs of a slabcache will end up in the unmovable and others in the reclaimable section depending on which flag was active when a new slab page was allocated. This likely slid in somehow when antifrag was merged. Remove it. The buffer_heads are always allocated with __GFP_RECLAIMABLE because the SLAB_RECLAIM_ACCOUNT option is set. The set_migrateflags() never had any effect there. Radix tree allocations are not directly reclaimable but they are allocated with __GFP_RECLAIMABLE set on each allocation. We now set SLAB_RECLAIM_ACCOUNT on radix tree slab creation making sure that radix tree slabs are consistently placed in the reclaimable section. Radix tree slabs will also be accounted as such. There is then no user left of set_migratepages. So remove it. Signed-off-by: Christoph Lameter <clameter@sgi.com> Cc: Mel Gorman <mel@csn.ul.ie> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-28 08:58:17 -07:00
Badari Pulavarty	ea01ea937d	hotplug memory remove: generic __remove_pages() support Generic helper function to remove section mappings and sysfs entries for the section of the memory we are removing. offline_pages() correctly adjusted zone and marked the pages reserved. TODO: Yasunori Goto is working on patches to free up allocations from bootmem. Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com> Acked-by: Yasunori Goto <y-goto@jp.fujitsu.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-28 08:58:17 -07:00
David S. Miller	ceb4e8e44b	sparc64: Kill PIL_RESERVED, unused. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-28 04:03:06 -07:00
Eric Paris	7b41b1733c	SELinux: include/security.h whitespace, syntax, and other cleanups This patch changes include/security.h to fix whitespace and syntax issues. Things that are fixed may include (does not not have to include) whitespace at end of lines spaces followed by tabs spaces used instead of tabs spacing around parenthesis location of { around structs and else clauses location of * in pointer declarations removal of initialization of static data to keep it in the right section useless {} in if statemetns useless checking for NULL before kfree fixing of the indentation depth of switch statements no assignments in if statements include spaces around , in function calls and any number of other things I forgot to mention Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: James Morris <jmorris@namei.org>	2008-04-28 09:29:08 +10:00
David S. Miller	90888816ba	sparc64: Clean up handling of pt_regs trap type encoding. If we use this from more than one place, it's better to have helpers instead of twiddling magic constants all over. Add pt_regs_trap_type(), pt_regs_clear_trap_type(), and pt_regs_is_syscall(). Use them in do_signal(). Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-27 14:52:51 -07:00
David L Stevens	dae5029548	ipv4/ipv6 compat: Fix SSM applications on 64bit kernels. Add support on 64-bit kernels for seting 32-bit compatible MCAST* socket options. Signed-off-by: David L Stevens <dlstevens@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-27 14:26:53 -07:00
Linus Torvalds	064922a805	Merge git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (40 commits) [SCSI] jazz_esp, sgiwd93, sni_53c710, sun3x_esp: fix platform driver hotplug/coldplug [SCSI] aic7xxx: add const [SCSI] aic7xxx: add static [SCSI] aic7xxx: Update _shipped files [SCSI] aic7xxx: teach aicasm to not emit unused debug code/data [SCSI] qla2xxx: Update version number to 8.02.01-k2. [SCSI] qla2xxx: Correct regression in relogin code. [SCSI] qla2xxx: Correct misc. endian and byte-ordering issues. [SCSI] qla2xxx: make qla2x00_issue_iocb_timeout() static [SCSI] qla2xxx: qla_os.c, make 2 functions static [SCSI] qla2xxx: Re-register FDMI information after a LIP. [SCSI] qla2xxx: Correct SRB usage-after-completion/free issues. [SCSI] qla2xxx: Correct ISP84XX verify-chip response handling. [SCSI] qla2xxx: Wakeup DPC thread to process any deferred-work requests. [SCSI] qla2xxx: Collapse RISC-RAM retrieval code during a firmware-dump. [SCSI] m68k: new mac_esp scsi driver [SCSI] zfcp: Add some statistics provided by the FCP adapter to the sysfs [SCSI] zfcp: Print some messages only during ERP [SCSI] zfcp: Wait for free SBAL during exchange config [SCSI] scsi_transport_fc: fc_user_scan correction ...	2008-04-27 11:25:00 -07:00
Linus Torvalds	42cadc8600	Merge branch 'kvm-updates-2.6.26' of git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm * 'kvm-updates-2.6.26' of git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm: (147 commits) KVM: kill file->f_count abuse in kvm KVM: MMU: kvm_pv_mmu_op should not take mmap_sem KVM: SVM: remove selective CR0 comment KVM: SVM: remove now obsolete FIXME comment KVM: SVM: disable CR8 intercept when tpr is not masking interrupts KVM: SVM: sync V_TPR with LAPIC.TPR if CR8 write intercept is disabled KVM: export kvm_lapic_set_tpr() to modules KVM: SVM: sync TPR value to V_TPR field in the VMCB KVM: ppc: PowerPC 440 KVM implementation KVM: Add MAINTAINERS entry for PowerPC KVM KVM: ppc: Add DCR access information to struct kvm_run ppc: Export tlb_44x_hwater for KVM KVM: Rename debugfs_dir to kvm_debugfs_dir KVM: x86 emulator: fix lea to really get the effective address KVM: x86 emulator: fix smsw and lmsw with a memory operand KVM: x86 emulator: initialize src.val and dst.val for register operands KVM: SVM: force a new asid when initializing the vmcb KVM: fix kvm_vcpu_kick vs __vcpu_run race KVM: add ioctls to save/store mpstate KVM: Rename VCPU_MP_STATE_* to KVM_MP_STATE_* ...	2008-04-27 10:13:52 -07:00
Linus Torvalds	fba5c1af5c	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6: (49 commits) ide-tape: remove tape->merge_stage ide-tape: mv tape->merge_stage_size tape->merge_bh_size ide-tape: mv idetape_empty_write_pipeline ide_tape_flush_merge_buffer ide-tape: mv idetape_discard_read_pipeline ide_tape_discard_merge_buffer ide-tape: make __idetape_discard_read_pipeline() of type void ide: remove now unused ide_pci_create_host_proc() ide: remove /proc/ide/ali ide-tape: improve buffer pages freeing strategy ide-tape: mv tape->pages_per_stage tape->pages_per_buffer ide-tape: mv tape->stage_size tape->buffer_size ide-tape: improve buffer allocation strategy ide: add struct ide_io_ports (take 3) ide: make ide_unregister() take 'ide_hwif_t *' as an argument (take 2) ide: sanitize ide_unregister() usage mpc8xx-ide: use ide_find_port() ide: add "noacpi" / "acpigtf" / "acpionboot" parameters gayle: add "doubler" parameter ide: add "cdrom=" and "chs=" parameters ide: add "nodma\|noflush\|noprobe\|nowerr=" parameters ide: remove obsoleted "hdx=autotune" kernel parameter ...	2008-04-27 10:13:06 -07:00
Linus Torvalds	f222eba0f9	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-idle-fix * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-idle-fix: fix idle (arch, acpi and apm) and lockdep	2008-04-27 10:10:54 -07:00
Linus Torvalds	2d630d1a68	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: mlx4_core: Add helper to move QP to ready-to-send mlx4_core: Add HW queues allocation helpers RDMA/nes: Remove volatile qualifier from struct nes_hw_cq.cq_vbase mlx4_core: CQ resizing should pass a 0 opcode modifier to MODIFY_CQ mlx4_core: Move kernel doorbell management into core IB/ehca: Bump version number to 0026 IB/ehca: Make some module parameters bool, update descriptions IB/ehca: Remove mr_largepage parameter IB/ehca: Move high-volume debug output to higher debug levels IB/ehca: Prevent posting of SQ WQEs if QP not in RTS IPoIB: Handle 4K IB MTU for UD (datagram) mode RDMA/nes: Fix adapter reset after PXE boot RDMA/nes: Print IPv4 addresses in a readable format RDMA/nes: Use print_mac() to format ethernet addresses for printing	2008-04-27 10:10:14 -07:00
Christoph Lameter	65c3376aac	slub: Fallback to minimal order during slab page allocation If any higher order allocation fails then fall back the smallest order necessary to contain at least one object. This enables fallback for all allocations to order 0 pages. The fallback will waste more memory (objects will not fit neatly) and the fallback slabs will be not as efficient as larger slabs since they contain less objects. Note that SLAB also depends on order 1 allocations for some slabs that waste too much memory if forced into PAGE_SIZE'd page. SLUB now can now deal with failing order 1 allocs which SLAB cannot do. Add a new field min that will contain the objects for the smallest possible order for a slab cache. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>	2008-04-27 18:28:18 +03:00
Christoph Lameter	205ab99dd1	slub: Update statistics handling for variable order slabs Change the statistics to consider that slabs of the same slabcache can have different number of objects in them since they may be of different order. Provide a new sysfs field total_objects which shows the total objects that the allocated slabs of a slabcache could hold. Add a max field that holds the largest slab order that was ever used for a slab cache. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>	2008-04-27 18:28:17 +03:00
Christoph Lameter	834f3d1192	slub: Add kmem_cache_order_objects struct Pack the order and the number of objects into a single word. This saves some memory in the kmem_cache_structure and more importantly allows us to fetch both values atomically. Later the slab orders become runtime configurable and we need to fetch these two items together in order to properly allocate a slab and initialize its objects. Fix the race by fetching the order and the number of objects in one word. [penberg@cs.helsinki.fi: fix memset() page order in new_slab()] Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>	2008-04-27 18:28:17 +03:00
Christoph Lameter	39b264641a	slub: Store max number of objects in the page struct. Split the inuse field up to be able to store the number of objects in this page in the page struct as well. Necessary if we want to have pages of various orders for a slab. Also avoids touching struct kmem_cache cachelines in __slab_alloc(). Update diagnostic code to check the number of objects and make sure that the number of objects always stays within the bounds of a 16 bit unsigned integer. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>	2008-04-27 18:28:16 +03:00
Al Viro	66c0b394f0	KVM: kill file->f_count abuse in kvm Use kvm own refcounting instead of playing with ->filp->f_count. That will allow to get rid of a lot of crap in anon_inode_getfd() and kill a race in kvm_dev_ioctl_create_vm() (file might have been closed immediately by another thread, so ->filp might point to already freed struct file when we get around to setting it). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 18:21:46 +03:00
Hollis Blanchard	bbf45ba57e	KVM: ppc: PowerPC 440 KVM implementation This functionality is definitely experimental, but is capable of running unmodified PowerPC 440 Linux kernels as guests on a PowerPC 440 host. (Only tested with 440EP "Bamboo" guests so far, but with appropriate userspace support other SoC/board combinations should work.) See Documentation/powerpc/kvm_440.txt for technical details. [stephen: build fix] Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com> Acked-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 18:21:39 +03:00
Hollis Blanchard	b2312f059c	KVM: ppc: Add DCR access information to struct kvm_run Device Control Registers are essentially another address space found on PowerPC 4xx processors, analogous to PIO on x86. DCRs are always 32 bits, and can be identified by a 32-bit number. We forward most DCR accesses to userspace for emulation (with the exception of CPR0 registers, which can be read directly for simplicity in timebase frequency determination). Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 18:21:37 +03:00
Hollis Blanchard	4baacfb0de	ppc: Export tlb_44x_hwater for KVM PowerPC 440 KVM needs to know how many TLB entries are used for the host kernel linear mapping (it does not modify these mappings when switching between guest and host execution). Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com> Acked-by: Josh Boyer <jwboyer@linux.vnet.ibm.com> Acked-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 18:21:37 +03:00
Hollis Blanchard	76f7c87902	KVM: Rename debugfs_dir to kvm_debugfs_dir It's a globally exported symbol now. Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 18:21:36 +03:00
Marcelo Tosatti	62d9f0dbc9	KVM: add ioctls to save/store mpstate So userspace can save/restore the mpstate during migration. [avi: export the #define constants describing the value] [christian: add s390 stubs] [avi: ditto for ia64] Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Carsten Otte <cotte@de.ibm.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 18:21:16 +03:00
Alexey Dobriyan	fd0949e6e8	ide: remove now unused ide_pci_create_host_proc() It creates files in proc with obsoleted ->get_info interface. Signed-off-by: Alexey Dobriyan <adobriyan@openvz.org> Cc: Andrew Morton <akpm@osdl.org> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>	2008-04-27 15:38:34 +02:00
Bartlomiej Zolnierkiewicz	4c3032d8a4	ide: add struct ide_io_ports (take 3) * Add struct ide_io_ports and use it instead of `unsigned long io_ports[]` in ide_hwif_t. * Rename io_ports[] in hw_regs_t to io_ports_array[]. * Use un-named union for 'unsigned long io_ports_array[]' and 'struct ide_io_ports io_ports' in hw_regs_t. * Remove IDE__OFFSET defines. v2: scc_pata.c build fix from Stephen Rothwell. v3: * Fix ctl_adrr typo in Sparc-specific part of ns87415.c. (Noticed by Andrew Morton) Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>	2008-04-27 15:38:32 +02:00
Bartlomiej Zolnierkiewicz	387750c3bf	ide: make ide_unregister() take 'ide_hwif_t ' as an argument (take 2) Make ide_unregister() take 'ide_hwif_t hwif' instead of 'unsigned int index' (hwif->index) as an argument and update all users accordingly. While at it: Remove unnecessary checks for hwif != NULL from ide-pnp.c::idepnp_remove() and delkin_cb.c::delkin_cb_remove(). * Remove needless hwif->chipset assignment from scc_pata.c::scc_remove(). v2: * Fixup ide_unregister() documentation. Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>	2008-04-27 15:38:31 +02:00
Bartlomiej Zolnierkiewicz	1dbfeb4bc8	ide: add "noacpi" / "acpigtf" / "acpionboot" parameters * Rename ide_noacpi{tfs,onboot} to ide_acpi{gtf,onboot} (+ reverse logic). * Move ide_acpi variables to ide-acpi.c and remove unnecessary initializers. * Add "noacpi" / "acpigtf" / "acpionboot" parameters. * Obsolete "ide=noacpi" / "ide=acpigtf" / "ide=acpionboot" kernel parameters. Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>	2008-04-27 15:38:30 +02:00
Bartlomiej Zolnierkiewicz	207daeaabb	ide: remove obsoleted "hdx=autotune" kernel parameter * Remove obsoleted "hdx=autotune" kernel parameter (we always auto-tune PIO if possible nowadays). * Remove no longer needed ide_drive_t.autotune flag. Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>	2008-04-27 15:38:29 +02:00
Bartlomiej Zolnierkiewicz	e160124ff6	ide: remove IDE_HFLAG_NO_AUTOTUNE host flag * Don't set IDE_HFLAG_NO_AUTOTUNE host flag in sgiioc4 and icside host drivers - there is no need for it as they don't implement ->set_pio_mode method. * Remove no longer needed IDE_HFLAG_NO_AUTOTUNE host flag. There should be no functional changes caused by this patch. Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>	2008-04-27 15:38:29 +02:00
Bartlomiej Zolnierkiewicz	ebae41a5a0	ide: add "vlb\|pci_clock=" parameter * Add "vlb_clock=" parameter for specifying VLB clock frequency (in MHz). * Add "pci_clock=" parameter for specifying PCI bus clock frequency (in MHz). While at it: * qd65xx.c: rename {active,recovery}_cycle variables to {act,rec}_cyc. Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>	2008-04-27 15:38:29 +02:00
Bartlomiej Zolnierkiewicz	cc12175ff2	ide: remove obsoleted "hdx=noautotune" kernel parameter Remove obsoleted "hdx=noautotune" kernel parameter (it has been obsoleted since 1 Nov 2004). Then make ide_hwif_t.autotune a single bit flag and remove no longer needed IDE_TUNE_* defines. Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>	2008-04-27 15:38:24 +02:00
Bartlomiej Zolnierkiewicz	e460a59751	ide: remove obsoleted "idex=reset" kernel parameter Remove obsoleted "idex=reset" kernel parameter (it has been obsoleted since 1 Nov 2004). Then remove corresponding code from ide_probe_port() and no longer used ->reset field from ide_hwif_t. Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>	2008-04-27 15:38:24 +02:00
Bartlomiej Zolnierkiewicz	9fd91d959f	ide: add "ignore_cable" parameter (take 2) Add "ignore_cable" parameter: * "ide_core.ignore_cable=[interface_number]" boot option if IDE is built-in (i.e. "ide_core.ignore_cable=1" to force ignoring cable for "ide1") * "ignore_cable=[interface_number]" module parameter (for ide_core module) if IDE is compiled as module v2: * Add ide_port_apply_params() helper - use it in ide_device_add_all() and ide_scan_port(). * Make it possible to later disable ignoring cable detection by passing "[interface_number]:0" to /sys/module/ide_core/parameters/ignore_cable (however sysfs interface is not enabled yet since it needs some other IDE changes to make it work reliable). Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>	2008-04-27 15:38:23 +02:00
David S. Miller	5526b7e451	sparc: Remove old style signal frame support. Back around the same time we were bootstrapping the first 32-bit sparc Linux kernel with a SunOS userland, we made the signal frame match that of SunOS. By the time we even started putting together a native Linux userland for 32-bit Sparc we realized this layout wasn't sufficient for Linux's needs. Therefore we changed the layout, yet kept support for the old style signal frame layout in there. The detection mechanism is that we had sys_sigaction() start passing in a negative signal number to indicate "new style signal frames please". Anyways, no binaries exist in the world that use the old stuff. In fact, I bet Jakub Jelinek and myself are the only two people who ever had such binaries to be honest. So let's get rid of this stuff. I added an assertion using WARN_ON_ONCE() that makes sure 32-bit applications are passing in that negative signal number still. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-27 02:26:36 -07:00
Avi Kivity	a45352908b	KVM: Rename VCPU_MP_STATE_* to KVM_MP_STATE_* We wish to export it to userspace, so move it into the kvm namespace. Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 12:04:13 +03:00
Marcelo Tosatti	3d80840d96	KVM: hlt emulation should take in-kernel APIC/PIT timers into account Timers that fire between guest hlt and vcpu_block's add_wait_queue() are ignored, possibly resulting in hangs. Also make sure that atomic_inc and waitqueue_active tests happen in the specified order, otherwise the following race is open: CPU0 CPU1 if (waitqueue_active(wq)) add_wait_queue() if (!atomic_read(pit_timer->pending)) schedule() atomic_inc(pit_timer->pending) Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 12:04:11 +03:00
Feng(Eric) Liu	d4c9ff2d1b	KVM: Add kvm trace userspace interface This interface allows user a space application to read the trace of kvm related events through relayfs. Signed-off-by: Feng (Eric) Liu <eric.e.liu@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 12:01:22 +03:00
Feng (Eric) Liu	2714d1d3d6	KVM: Add trace markers Trace markers allow userspace to trace execution of a virtual machine in order to monitor its performance. Signed-off-by: Feng (Eric) Liu <eric.e.liu@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 12:01:19 +03:00
Joerg Roedel	53371b5098	KVM: SVM: add intercept for machine check exception To properly forward a MCE occured while the guest is running to the host, we have to intercept this exception and call the host handler by hand. This is implemented by this patch. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 12:01:18 +03:00
Anthony Liguori	35149e2129	KVM: MMU: Don't assume struct page for x86 This patch introduces a gfn_to_pfn() function and corresponding functions like kvm_release_pfn_dirty(). Using these new functions, we can modify the x86 MMU to no longer assume that it can always get a struct page for any given gfn. We don't want to eliminate gfn_to_page() entirely because a number of places assume they can do gfn_to_page() and then kmap() the results. When we support IO memory, gfn_to_page() will fail for IO pages although gfn_to_pfn() will succeed. This does not implement support for avoiding reference counting for reserved RAM or for IO memory. However, it should make those things pretty straight forward. Since we're only introducing new common symbols, I don't think it will break the non-x86 architectures but I haven't tested those. I've tested Intel, AMD, NPT, and hugetlbfs with Windows and Linux guests. [avi: fix overflow when shifting left pfns by adding casts] Signed-off-by: Anthony Liguori <aliguori@us.ibm.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 12:01:15 +03:00
Xiantao Zhang	1a9c1ac469	KVM: ia64: Add header files for kvm/ia64 Three header files are added: asm-ia64/kvm.h asm-ia64/kvm_host.h asm-ia64/kvm_para.h Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 12:01:02 +03:00
Xiantao Zhang	e235f3450f	KVM: ia64: Prepare some structure and routines for kvm use Register structures are defined per SDM. Add three small routines for kernel: ia64_ttag, ia64_loadrs, ia64_flushrs Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 12:01:01 +03:00
Heiko Carstens	c71799c1f4	KVM: s390: Improve pgste accesses There is no need to use interlocked updates when the rcp lock is held. Therefore the simple bitops variants can be used. This should improve performance. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Carsten Otte <cotte@de.ibm.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 12:01:00 +03:00
Izik Eidus	d39f13b0da	KVM: add vm refcounting the main purpose of adding this functions is the abilaty to release the spinlock that protect the kvm list while still be able to do operations on a specific kvm in a safe way. Signed-off-by: Izik Eidus <izike@qumranet.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 12:00:56 +03:00
Joerg Roedel	9c20456a32	KVM: function declaration parameter name cleanup The kvm_host.h file for x86 declares the functions kvm_set_cr[0348]. In the header file their second parameter is named cr0 in all cases. This patch renames the parameters so that they match the function name. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 12:00:55 +03:00
Marcelo Tosatti	3200f405a1	KVM: MMU: unify slots_lock usage Unify slots_lock acquision around vcpu_run(). This is simpler and less error-prone. Also fix some callsites that were not grabbing the lock properly. [avi: drop slots_lock while in guest mode to avoid holding the lock for indefinite periods] Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 12:00:52 +03:00
Christian Borntraeger	e976a2b997	s390: KVM guest: virtio device support, and kvm hypercalls This patch implements kvm guest kernel support for paravirtualized devices and contains two parts: o a basic virtio stub using virtio_ring and external interrupts and hypercalls o full hypercall implementation in kvm_para.h Currently we dont have PCI on s390. Making virtio_pci usable for s390 seems more complicated that providing an own stub. This virtio stub is similar to the lguest one, the memory for the descriptors and the device detection is made via additional mapped memory on top of the guest storage. We use an external interrupt with extint code 0x2603 for host->guest notification. The hypercall definition uses the diag instruction for issuing a hypercall. The parameters are written in R2-R7, the hypercall number is written in R1. This is similar to the system call ABI (svc) which can use R1 for the number and R2-R6 for the parameters. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Carsten Otte <cotte@de.ibm.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 12:00:51 +03:00
Carsten Otte	fa5877439d	s390: KVM guest: detect when running on kvm This patch adds functionality to detect if the kernel runs under the KVM hypervisor. A macro MACHINE_IS_KVM is exported for device drivers. This allows drivers to skip device detection if the systems runs non-virtualized. We also define a preferred console to avoid having the ttyS0, which is a line mode only console. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Carsten Otte <cotte@de.ibm.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 12:00:50 +03:00
Christian Borntraeger	e28acfea5d	KVM: s390: intercepts for diagnose instructions This patch introduces interpretation of some diagnose instruction intercepts. Diagnose is our classic architected way of doing a hypercall. This patch features the following diagnose codes: - vm storage size, that tells the guest about its memory layout - time slice end, which is used by the guest to indicate that it waits for a lock and thus cannot use up its time slice in a useful way - ipl functions, which a guest can use to reset and reboot itself In order to implement ipl functions, we also introduce an exit reason that causes userspace to perform various resets on the virtual machine. All resets are described in the principles of operation book, except KVM_S390_RESET_IPL which causes a reboot of the machine. Acked-by: Martin Schwidefsky <martin.schwidefsky@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Carsten Otte <cotte@de.ibm.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 12:00:46 +03:00
Christian Borntraeger	5288fbf0ef	KVM: s390: interprocessor communication via sigp This patch introduces in-kernel handling of _some_ sigp interprocessor signals (similar to ipi). kvm_s390_handle_sigp() decodes the sigp instruction and calls individual handlers depending on the operation requested: - sigp sense tries to retrieve information such as existence or running state of the remote cpu - sigp emergency sends an external interrupt to the remove cpu - sigp stop stops a remove cpu - sigp stop store status stops a remote cpu, and stores its entire internal state to the cpus lowcore - sigp set arch sets the architecture mode of the remote cpu. setting to ESAME (s390x 64bit) is accepted, setting to ESA/S390 (s390, 31 or 24 bit) is denied, all others are passed to userland - sigp set prefix sets the prefix register of a remote cpu For implementation of this, the stop intercept indication starts to get reused on purpose: a set of action bits defines what to do once a cpu gets stopped: ACTION_STOP_ON_STOP really stops the cpu when a stop intercept is recognized ACTION_STORE_ON_STOP stores the cpu status to lowcore when a stop intercept is recognized Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Carsten Otte <cotte@de.ibm.com> Signed-off-by: Carsten Otte <cotte@de.ibm.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 12:00:46 +03:00
Christian Borntraeger	453423dce2	KVM: s390: intercepts for privileged instructions This patch introduces in-kernel handling of some intercepts for privileged instructions: handle_set_prefix() sets the prefix register of the local cpu handle_store_prefix() stores the content of the prefix register to memory handle_store_cpu_address() stores the cpu number of the current cpu to memory handle_skey() just decrements the instruction address and retries handle_stsch() delivers condition code 3 "operation not supported" handle_chsc() same here handle_stfl() stores the facility list which contains the capabilities of the cpu handle_stidp() stores cpu type/model/revision and such handle_stsi() stores information about the system topology Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Carsten Otte <cotte@de.ibm.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 12:00:45 +03:00
Carsten Otte	ba5c1e9b6c	KVM: s390: interrupt subsystem, cpu timer, waitpsw This patch contains the s390 interrupt subsystem (similar to in kernel apic) including timer interrupts (similar to in-kernel-pit) and enabled wait (similar to in kernel hlt). In order to achieve that, this patch also introduces intercept handling for instruction intercepts, and it implements load control instructions. This patch introduces an ioctl KVM_S390_INTERRUPT which is valid for both the vm file descriptors and the vcpu file descriptors. In case this ioctl is issued against a vm file descriptor, the interrupt is considered floating. Floating interrupts may be delivered to any virtual cpu in the configuration. The following interrupts are supported: SIGP STOP - interprocessor signal that stops a remote cpu SIGP SET PREFIX - interprocessor signal that sets the prefix register of a (stopped) remote cpu INT EMERGENCY - interprocessor interrupt, usually used to signal need_reshed and for smp_call_function() in the guest. PROGRAM INT - exception during program execution such as page fault, illegal instruction and friends RESTART - interprocessor signal that starts a stopped cpu INT VIRTIO - floating interrupt for virtio signalisation INT SERVICE - floating interrupt for signalisations from the system service processor struct kvm_s390_interrupt, which is submitted as ioctl parameter when injecting an interrupt, also carrys parameter data for interrupts along with the interrupt type. Interrupts on s390 usually have a state that represents the current operation, or identifies which device has caused the interruption on s390. kvm_s390_handle_wait() does handle waitpsw in two flavors: in case of a disabled wait (that is, disabled for interrupts), we exit to userspace. In case of an enabled wait we set up a timer that equals the cpu clock comparator value and sleep on a wait queue. [christian: change virtio interrupt to 0x2603] Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Carsten Otte <cotte@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 12:00:44 +03:00
Christian Borntraeger	8f2abe6a1e	KVM: s390: sie intercept handling This path introduces handling of sie intercepts in three flavors: Intercepts are either handled completely in-kernel by kvm_handle_sie_intercept(), or passed to userspace with corresponding data in struct kvm_run in case kvm_handle_sie_intercept() returns -ENOTSUPP. In case of partial execution in kernel with the need of userspace support, kvm_handle_sie_intercept() may choose to set up struct kvm_run and return -EREMOTE. The trivial intercept reasons are handled in this patch: handle_noop() just does nothing for intercepts that don't require our support at all handle_stop() is called when a cpu enters stopped state, and it drops out to userland after updating our vcpu state handle_validity() faults in the cpu lowcore if needed, or passes the request to userland Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Carsten Otte <cotte@de.ibm.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 12:00:43 +03:00
Heiko Carstens	b0c632db63	KVM: s390: arch backend for the kvm kernel module This patch contains the port of Qumranet's kvm kernel module to IBM zSeries (aka s390x, mainframe) architecture. It uses the mainframe's virtualization instruction SIE to run virtual machines with up to 64 virtual CPUs each. This port is only usable on 64bit host kernels, and can only run 64bit guest kernels. However, running 31bit applications in guest userspace is possible. The following source files are introduced by this patch arch/s390/kvm/kvm-s390.c similar to arch/x86/kvm/x86.c, this implements all arch callbacks for kvm. __vcpu_run calls back into sie64a to enter the guest machine context arch/s390/kvm/sie64a.S assembler function sie64a, which enters guest context via SIE, and switches world before and after that include/asm-s390/kvm_host.h contains all vital data structures needed to run virtual machines on the mainframe include/asm-s390/kvm.h defines kvm_regs and friends for user access to guest register content arch/s390/kvm/gaccess.h functions similar to uaccess to access guest memory arch/s390/kvm/kvm-s390.h header file for kvm-s390 internals, extended by later patches Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Carsten Otte <cotte@de.ibm.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 12:00:42 +03:00
Christian Borntraeger	8a88ac6183	s390: KVM preparation: address of the 64bit extint parm in lowcore The address 0x11b8 is used by z/VM for pfault and diag 250 I/O to provide a 64 bit extint parameter. virtio uses the same address, so its time to update the lowcore structure. Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Carsten Otte <cotte@de.ibm.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 12:00:41 +03:00
Christian Borntraeger	5b7baf0578	s390: KVM preparation: host memory management changes for s390 kvm This patch changes the s390 memory management defintions to use the pgste field for dirty and reference bit tracking of host and guest code. Usually on s390, dirty and referenced are tracked in storage keys, which belong to the physical page. This changes with virtualization: The guest and host dirty/reference bits are defined to be the logical OR of the values for the mapping and the physical page. This patch implements the necessary changes in pgtable.h for s390. There is a common code change in mm/rmap.c, the call to page_test_and_clear_young must be moved. This is a no-op for all architecture but s390. page_referenced checks the referenced bits for the physiscal page and for all mappings: o The physical page is checked with page_test_and_clear_young. o The mappings are checked with ptep_test_and_clear_young and friends. Without pgstes (the current implementation on Linux s390) the physical page check is implemented but the mapping callbacks are no-ops because dirty and referenced are not tracked in the s390 page tables. The pgstes introduces guest and host dirty and reference bits for s390 in the host mapping. These mapping must be checked before page_test_and_clear_young resets the reference bit. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Acked-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Carsten Otte <cotte@de.ibm.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 12:00:40 +03:00
Carsten Otte	402b08622d	s390: KVM preparation: provide hook to enable pgstes in user pagetable The SIE instruction on s390 uses the 2nd half of the page table page to virtualize the storage keys of a guest. This patch offers the s390_enable_sie function, which reorganizes the page tables of a single-threaded process to reserve space in the page table: s390_enable_sie makes sure that the process is single threaded and then uses dup_mm to create a new mm with reorganized page tables. The old mm is freed and the process has now a page status extended field after every page table. Code that wants to exploit pgstes should SELECT CONFIG_PGSTE. This patch has a small common code hit, namely making dup_mm non-static. Edit (Carsten): I've modified Martin's patch, following Jeremy Fitzhardinge's review feedback. Now we do have the prototype for dup_mm in include/linux/sched.h. Following Martin's suggestion, s390_enable_sie() does now call task_lock() to prevent race against ptrace modification of mm_users. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Carsten Otte <cotte@de.ibm.com> Acked-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 12:00:40 +03:00
Izik Eidus	37817f2982	KVM: x86: hardware task switching support This emulates the x86 hardware task switch mechanism in software, as it is unsupported by either vmx or svm. It allows operating systems which use it, like freedos, to run as kvm guests. Signed-off-by: Izik Eidus <izike@qumranet.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 12:00:39 +03:00
Izik Eidus	2e4d265349	KVM: x86: add functions to get the cpl of vcpu Signed-off-by: Izik Eidus <izike@qumranet.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 12:00:38 +03:00
Avi Kivity	69a9f69bb2	KVM: Move some x86 specific constants and structures to include/asm-x86 Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 12:00:34 +03:00
Christian Borntraeger	97646202bc	KVM: kvm.h: __user requires compiler.h include/linux/kvm.h defines struct kvm_dirty_log to [...] union { void __user dirty_bitmap; / one bit per page */ __u64 padding; }; __user requires compiler.h to compile. Currently, this works on x86 only coincidentally due to other include files. This patch makes kvm.h compile in all cases. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 12:00:32 +03:00
Glauber Costa	3c62c62502	x86: make native_machine_shutdown non-static it will allow external users to call it. It is mainly useful for routines that will override its machine_ops field for its own special purposes, but want to call the normal shutdown routine after they're done Signed-off-by: Glauber Costa <gcosta@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 12:00:30 +03:00
Glauber Costa	ed23dc6f5b	x86: allow machine_crash_shutdown to be replaced This patch a llows machine_crash_shutdown to be replaced, just like any of the other functions in machine_ops Signed-off-by: Glauber Costa <gcosta@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 12:00:29 +03:00
Marcelo Tosatti	2f333bcb4e	KVM: MMU: hypercall based pte updates and TLB flushes Hypercall based pte updates are faster than faults, and also allow use of the lazy MMU mode to batch operations. Don't report the feature if two dimensional paging is enabled. [avi: - one mmu_op hypercall instead of one per op - allow 64-bit gpa on hypercall - don't pass host errors (-ENOMEM) to guest] [akpm: warning fix on i386] Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 12:00:27 +03:00
Avi Kivity	9f81128591	KVM: Provide unlocked version of emulator_write_phys() Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 12:00:26 +03:00
Marcelo Tosatti	0cf1bfd273	x86: KVM guest: add basic paravirt support Add basic KVM paravirt support. Avoid vm-exits on IO delays. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 12:00:25 +03:00
Marcelo Tosatti	a28e4f5a62	KVM: add basic paravirt support Add basic KVM paravirt support. Avoid vm-exits on IO delays. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 12:00:24 +03:00
Sheng Yang	e0f63cb927	KVM: Add save/restore supporting of in kernel PIT Signed-off-by: Sheng Yang <sheng.yang@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 12:00:22 +03:00
Sheng Yang	7837699fa6	KVM: In kernel PIT model The patch moves the PIT model from userspace to kernel, and increases the timer accuracy greatly. [marcelo: make last_injected_time per-guest] Signed-off-by: Sheng Yang <sheng.yang@intel.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Tested-and-Acked-by: Alex Davis <alex14641@yahoo.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 12:00:21 +03:00
Joerg Roedel	71c4dfafc0	KVM: detect if VCPU triple faults In the current inject_page_fault path KVM only checks if there is another PF pending and injects a DF then. But it has to check for a pending DF too to detect a shutdown condition in the VCPU. If this is not detected the VCPU goes to a PF -> DF -> PF loop when it should triple fault. This patch detects this condition and handles it with an KVM_SHUTDOWN exit to userspace. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 11:53:27 +03:00
Avi Kivity	2d3ad1f40c	KVM: Prefix control register accessors with kvm_ to avoid namespace pollution Names like 'set_cr3()' look dangerously close to affecting the host. Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 11:53:26 +03:00
Marcelo Tosatti	05da45583d	KVM: MMU: large page support Create large pages mappings if the guest PTE's are marked as such and the underlying memory is hugetlbfs backed. If the largepage contains write-protected pages, a large pte is not used. Gives a consistent 2% improvement for data copies on ram mounted filesystem, without NPT/EPT. Anthony measures a 4% improvement on 4-way kernbench, with NPT. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 11:53:25 +03:00
Marcelo Tosatti	2e53d63acb	KVM: MMU: ignore zapped root pagetables Mark zapped root pagetables as invalid and ignore such pages during lookup. This is a problem with the cr3-target feature, where a zapped root table fools the faulting code into creating a read-only mapping. The result is a lockup if the instruction can't be emulated. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Cc: Anthony Liguori <aliguori@us.ibm.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 11:53:25 +03:00
Amit Shah	f11c3a8d84	KVM: Add stat counter for hypercalls Signed-off-by: Amit Shah <amit.shah@qumranet.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 11:53:24 +03:00
Avi Kivity	ef2979bd98	KVM: Increase the number of user memory slots per vm Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 11:53:23 +03:00
Avi Kivity	a988b910ef	KVM: Add API for determining the number of supported memory slots Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 11:53:23 +03:00
Avi Kivity	edbe6c325d	KVM: Increase vcpu count to 16 With NPT support, scalability is much improved. Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 11:53:23 +03:00
Avi Kivity	f725230af9	KVM: Add API to retrieve the number of supported vcpus per vm Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 11:53:23 +03:00
Glauber de Oliveira Costa	18068523d3	KVM: paravirtualized clocksource: host part This is the host part of kvm clocksource implementation. As it does not include clockevents, it is a fairly simple implementation. We only have to register a per-vcpu area, and start writing to it periodically. The area is binary compatible with xen, as we use the same shadow_info structure. [marcelo: fix bad_page on MSR_KVM_SYSTEM_TIME] [avi: save full value of the msr, even if enable bit is clear] [avi: clear previous value of time_page] Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 11:53:22 +03:00
Joerg Roedel	cc4b6871e7	KVM: export the load_pdptrs() function to modules The load_pdptrs() function is required in the SVM module for NPT support. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 11:53:20 +03:00
Joerg Roedel	1855267210	KVM: export information about NPT to generic x86 code The generic x86 code has to know if the specific implementation uses Nested Paging. In the generic code Nested Paging is called Two Dimensional Paging (TDP) to avoid confusion with (future) TDP implementations of other vendors. This patch exports the availability of TDP to the generic x86 code. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 11:53:19 +03:00
Joerg Roedel	f2b4b7ddf6	KVM: make EFER_RESERVED_BITS configurable for architecture code This patch give the SVM and VMX implementations the ability to add some bits the guest can set in its EFER register. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 11:53:18 +03:00
Hollis Blanchard	31bb117eb4	KVM: Use CONFIG_PREEMPT_NOTIFIERS around struct preempt_notifier This allows kvm_host.h to be #included even when struct preempt_notifier is undefined. This is needed to build ppc asm-offsets.h. Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 11:53:18 +03:00
Sheng Yang	2384d2b326	KVM: VMX: Enable Virtual Processor Identification (VPID) To allow TLB entries to be retained across VM entry and VM exit, the VMM can now identify distinct address spaces through a new virtual-processor ID (VPID) field of the VMCS. [avi: drop vpid_sync_all()] [avi: add "cc" to asm constraints] Signed-off-by: Sheng Yang <sheng.yang@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 11:53:17 +03:00
Dong, Eddie	1ae0a13def	KVM: MMU: Simplify hash table indexing Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-04-27 11:53:14 +03:00
David S. Miller	5da496e4b9	sparc64: Kill unused local ISA bus layer. No more drivers use this, and therefore it can die. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-26 21:41:23 -07:00
David S. Miller	0eb78f0b1a	sparc64: Kill ISA_FLOPPY_WORKS code. This never was enabled, I could never get it working, and if anyone wants to try and get it's very easy to reference this code in the history. It's the only thing referencing the silly ISA device layer in the sparc64 tree. OF device layer infrastructure is what should be used for these things. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-26 21:41:19 -07:00
Peter Zijlstra	7f424a8b08	fix idle (arch, acpi and apm) and lockdep OK, so 25-mm1 gave a lockdep error which made me look into this. The first thing that I noticed was the horrible mess; the second thing I saw was hacks like: `71e93d1561` The problem is that arch idle routines are somewhat inconsitent with their IRQ state handling and instead of fixing _that_, we go paper over the problem. So the thing I've tried to do is set a standard for idle routines and fix them all up to adhere to that. So the rules are: idle routines are entered with IRQs disabled idle routines will exit with IRQs enabled Nearly all already did this in one form or another. Merge the 32 and 64 bit bits so they no longer have different bugs. As for the actual lockdep warning; __sti_mwait() did a plainly un-annotated irq-enable. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Tested-by: Bob Copeland <me@bobcopeland.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-04-27 00:01:45 +02:00
Linus Torvalds	c3bf9bc243	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86-bigbox-bootmem-v3 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86-bigbox-bootmem-v3: x86_64/mm: check and print vmemmap allocation continuous x86_64: fix setup_node_bootmem to support big mem excluding with memmap x86_64: make reserve_bootmem_generic() use new reserve_bootmem() mm: allow reserve_bootmem() cross nodes mm: offset align in alloc_bootmem() mm: fix alloc_bootmem_core to use fast searching for all nodes mm: make mem_map allocation continuous	2008-04-26 14:04:32 -07:00
Yinghai Lu	c2b91e2eec	x86_64/mm: check and print vmemmap allocation continuous On big systems with lots of memory, don't print out too much during bootup, and make it easy to find if it is continuous. on 256G 8 sockets system will get [ffffe20000000000-ffffe20002bfffff] PMD -> [ffff810001400000-ffff810003ffffff] on node 0 [ffffe2001c700000-ffffe2001c7fffff] potential offnode page_structs [ffffe20002c00000-ffffe2001c7fffff] PMD -> [ffff81000c000000-ffff8100255fffff] on node 0 [ffffe20038700000-ffffe200387fffff] potential offnode page_structs [ffffe2001c800000-ffffe200387fffff] PMD -> [ffff810820200000-ffff81083c1fffff] on node 1 [ffffe20040000000-ffffe2007fffffff] PUD ->ffff811027a00000 on node 2 [ffffe20038800000-ffffe2003fffffff] PMD -> [ffff811020200000-ffff8110279fffff] on node 2 [ffffe20054700000-ffffe200547fffff] potential offnode page_structs [ffffe20040000000-ffffe200547fffff] PMD -> [ffff811027c00000-ffff81103c3fffff] on node 2 [ffffe20070700000-ffffe200707fffff] potential offnode page_structs [ffffe20054800000-ffffe200707fffff] PMD -> [ffff811820200000-ffff81183c1fffff] on node 3 [ffffe20080000000-ffffe200bfffffff] PUD ->ffff81202fa00000 on node 4 [ffffe20070800000-ffffe2007fffffff] PMD -> [ffff812020200000-ffff81202f9fffff] on node 4 [ffffe2008c700000-ffffe2008c7fffff] potential offnode page_structs [ffffe20080000000-ffffe2008c7fffff] PMD -> [ffff81202fc00000-ffff81203c3fffff] on node 4 [ffffe200a8700000-ffffe200a87fffff] potential offnode page_structs [ffffe2008c800000-ffffe200a87fffff] PMD -> [ffff812820200000-ffff81283c1fffff] on node 5 [ffffe200c0000000-ffffe200ffffffff] PUD ->ffff813037a00000 on node 6 [ffffe200a8800000-ffffe200bfffffff] PMD -> [ffff813020200000-ffff8130379fffff] on node 6 [ffffe200c4700000-ffffe200c47fffff] potential offnode page_structs [ffffe200c0000000-ffffe200c47fffff] PMD -> [ffff813037c00000-ffff81303c3fffff] on node 6 [ffffe200c4800000-ffffe200e07fffff] PMD -> [ffff813820200000-ffff81383c1fffff] on node 7 instead of a very long print out... Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-04-26 22:51:09 +02:00
Yinghai Lu	1a27fc0a42	x86_64: fix setup_node_bootmem to support big mem excluding with memmap typical case: four sockets system, every node has 4g ram, and we are using: memmap=10g$4g to mask out memory on node1 and node2 when numa is enabled, early_node_mem is used to get node_data and node_bootmap. if it can not get memory from the same node with find_e820_area(), it will use alloc_bootmem to get buff from previous nodes. so check it and print out some info about it. need to move early_res_to_bootmem into every setup_node_bootmem. and it takes range that node has. otherwise alloc_bootmem could return addr that reserved early. depends on "mm: make reserve_bootmem can crossed the nodes". Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-04-26 22:51:08 +02:00
Linus Torvalds	9b79ed952b	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-generic-bitops-v3 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-generic-bitops-v3: x86, bitops: select the generic bitmap search functions x86: include/asm-x86/pgalloc.h/bitops.h: checkpatch cleanups - formatting only x86: finalize bitops unification x86, UML: remove x86-specific implementations of find_first_bit x86: optimize find_first_bit for small bitmaps x86: switch 64-bit to generic find_first_bit x86: generic versions of find_first_(zero_)bit, convert i386 bitops: use __fls for fls64 on 64-bit archs generic: implement __fls on all 64-bit archs generic: introduce a generic __fls implementation x86: merge the simple bitops and move them to bitops.h x86, generic: optimize find_next_(zero_)bit for small constant-size bitmaps x86, uml: fix uml with generic find_next_bit for x86 x86: change x86 to use generic find_next_bit uml: Kconfig cleanup uml: fix build error	2008-04-26 13:46:11 -07:00
Linus Torvalds	a52b0d25a7	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6: (46 commits) ide: constify struct ide_dma_ops ide: add struct ide_dma_ops (take 3) ide: add IDE_HFLAG_SERIALIZE_DMA host flag sl82c105: check bridge revision in sl82c105_init_one() au1xxx-ide: use ->init_dma method palm_bk3710: use ->init_dma method sgiioc4: use ->init_dma method icside: use ->init_dma method ide-pmac: use ->init_dma method ide: do complete DMA setup in ->init_dma method (take 2) au1xxx-ide: fix MWDMA support ide: cleanup ide_setup_dma() ide: factor out setting PCI bus-mastering from ide_hwif_setup_dma() ide: export ide_allocate_dma_engine() ide: move ide_setup_dma() call out from ->init_dma method alim15x3: skip DMA initialization completely on revs < 0x20 pdc202xx_old: remove init_dma_pdc202xx() ide: don't display "BIOS" settings in ide_setup_dma() ide: remove ->cds field from ide_hwif_t (take 2) ide: remove ide_dma_iobase() ...	2008-04-26 13:44:19 -07:00
Bartlomiej Zolnierkiewicz	f37afdaca7	ide: constify struct ide_dma_ops * Export ide_dma_exec_cmd() and __ide_dma_test_irq(). * Constify struct ide_dma_ops. * Always set hwif->dma_ops to &sff_dma_ops in ide_setup_dma() (it is later overriden by ide_init_port() if needed) and drop 'const struct ide_port_info d' argument. While at it: Rename __ide_dma_test_irq() to ide_dma_test_irq(). Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>	2008-04-26 22:25:24 +02:00
Bartlomiej Zolnierkiewicz	5e37bdc081	ide: add struct ide_dma_ops (take 3) Add struct ide_dma_ops and convert core code + drivers to use it. While at it: * Drop "ide_" prefix from ->ide_dma_end and ->ide_dma_test_irq methods. * Drop "ide_" "infixes" from DMA methods. * au1xxx-ide.c: - use auide_dma_{test_irq,end}() directly in auide_dma_timeout() * pdc202xx_old.c: - drop "old_" "infixes" from DMA methods * siimage.c: - add siimage_dma_test_irq() helper - print SATA warning in siimage_init_one() * Remove no longer needed ->init_hwif implementations. v2: * Changes based on review from Sergei: - s/siimage_ide_dma_test_irq/siimage_dma_test_irq/ - s/drive->hwif/hwif/ in idefloppy_pc_intr(). - fix patch description w.r.t. au1xxx-ide changes - fix au1xxx-ide build - fix naming for cmd64_dma_ops - drop "ide_" and "old_" infixes - s/hpt3xxx_dma_ops/hpt37x_dma_ops/ - s/hpt370x_dma_ops/hpt370_dma_ops/ - use correct DMA ops for HPT302/N, HPT371/N and HPT374 - s/it821x_smart_dma_ops/it821x_pass_through_dma_ops/ v3: Two bugs slipped in v2 (noticed by Sergei): - use correct DMA ops for HPT374 (for real this time) - handle HPT370/HPT370A properly Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>	2008-04-26 22:25:24 +02:00
Bartlomiej Zolnierkiewicz	1fd1890594	ide: add IDE_HFLAG_SERIALIZE_DMA host flag * Add IDE_HFLAG_SERIALIZE_DMA host flag to serialize ports if DMA is available and handle it in ide_init_port(). * Convert sl82c105 host driver to use this new flag. Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>	2008-04-26 22:25:24 +02:00
Bartlomiej Zolnierkiewicz	b123f56e04	ide: do complete DMA setup in ->init_dma method (take 2) * Make ide_hwif_setup_dma() return an error value. * Pass 'const struct ide_port_info d' instead of 'unsigned long dmabase' to ->init_dma method and make it return an error value. Rename ide_get_or_set_dma_base() to ide_pci_dma_base(), change ordering of its arguments and then export it. * Export ide_pci_set_master(). * Do complete DMA setup inside ->init_dma method and update ->init_dma users accordingly. * Sanitize code for DMA setup in ide_init_port(). v2: * Fix for CONFIG_BLK_DEV_IDEDMA_PCI=n configs (from Jiri Slaby <jirislaby@gmail.com>): Fix following compiler warning by returning EINVAL: In file included from ANYTHING-INCLUDING-IDE.H:45: include/linux/ide.h: In function ‘ide_hwif_setup_dma’: include/linux/ide.h:1022: warning: no return statement in function returning non-void Cc: Jiri Slaby <jirislaby@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>	2008-04-26 22:25:22 +02:00
Bartlomiej Zolnierkiewicz	f629b38bed	au1xxx-ide: fix MWDMA support Always use "fast" MWDMA support and remove dma_{black,white}_list (they were based on completely bogus ->ide_dma_check implementation which didn't set neither the host controller timings nor the device for the desired transfer mode). Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>	2008-04-26 22:25:22 +02:00
Bartlomiej Zolnierkiewicz	b8e73fba60	ide: export ide_allocate_dma_engine() Export ide_allocate_dma_engine() and use it in trm290 host driver. Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>	2008-04-26 22:25:21 +02:00
Bartlomiej Zolnierkiewicz	5e59c23684	ide: remove ->cds field from ide_hwif_t (take 2) * Use hwif->name instead of cds->name in ide_allocate_dma_engine(). * Use pci_name(dev) instead of cds->name in init_dma_pdc202xx(). * Remove no longer needed ->cds field from ide_hwif_t. v2: * scc_pata.c also needs to be updated now (noticed by Stephen Rothwell). There should be no functional changes caused by this patch. Cc: Kou Ishizaki <kou.ishizaki@toshiba.co.jp> Cc: Akira Iguchi <akira2.iguchi@toshiba.co.jp> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>	2008-04-26 22:25:20 +02:00
Bartlomiej Zolnierkiewicz	21a3387ddd	ide: remove ->extra field from struct ide_port_info Always setup hwif->extra_base in ide_iomio_dma() and remove no longer needed ->extra field from struct ide_port_info. There should be no functional changes caused by this patch. Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>	2008-04-26 22:25:20 +02:00
Bartlomiej Zolnierkiewicz	5add222417	ide: remove ide_hwif_request_regions() Remove no longer used ide_hwif_request_regions() and hwif_request_region(). Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>	2008-04-26 22:25:19 +02:00
Bartlomiej Zolnierkiewicz	0d1bad216c	ide: manage resources for PCI devices in ide_pci_enable() (take 3) * Reserve PCI BARs 0-3 (0-1 for single port controllers) in ide_pci_enable() and remove ide_hwif_request_regions() call from ide_device_add_all() (also cleanup resource management in scc_pata host driver). * Fix handling of PCI BAR 4 in ide_pci_enable(), then cleanup ide_iomio_dma() (+ init_hwif_trm290() in trm290 host driver) and remove ide_release[_iomio]_dma(). v2: * Fixup trm290 host driver. v3: * Because of scc_pata host driver changes we need to call pci_request_selected_regions() also in setup_mmio_scc(). Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>	2008-04-26 22:25:19 +02:00
Bartlomiej Zolnierkiewicz	d083c03f25	ide: remove ide_hwif_release_regions() All host drivers using ide_unregister()/module_exit() have been fixed to manage resources themselves so this function can be removed now. There should be no functional changes caused by this patch. Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>	2008-04-26 22:25:17 +02:00
Bartlomiej Zolnierkiewicz	0bfeee7d41	ide: use ide_legacy_device_add() for qd65xx (take 2) * Add 'unsigned long config' argument to ide_legacy_device_add() for setting hwif->config_data. * Use ide_find_port_slot() instead of ide_find_port() in ide_legacy_device_add(). * Handle IDE_HFLAG_QD_2ND_PORT and IDE_HFLAG_SINGLE host flags in ide_legacy_device_add(). * Convert qd65xx host driver to use ide_legacy_device_add(). v2: * Update ali14xx, dtc2278, ht6560b and umc8672 host drivers. There should be no functional changes caused by this patch. Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>	2008-04-26 22:25:16 +02:00
Bartlomiej Zolnierkiewicz	3b36f66b81	ide: add ide_legacy_device_add() helper Add ide_legacy_device_add() helper for use by legacy VLB host drivers (+ convert them to use it). There should be no functional changes caused by this patch. Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>	2008-04-26 22:25:16 +02:00
Bartlomiej Zolnierkiewicz	e53cd458d5	ide: remove ->noprobe field from ide_hwif_t Update IDE PMAC host driver to use drive->noprobe instead of hwif->noprobe and remove hwif->noprobe completely (it is always set to zero now). There should be no functional changes caused by this patch. Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>	2008-04-26 22:25:16 +02:00
Bartlomiej Zolnierkiewicz	ac95beedf8	ide: add struct ide_port_ops (take 2) * Move hooks for port/host specific methods from ide_hwif_t to 'struct ide_port_ops'. * Add 'const struct ide_port_ops port_ops' to 'struct ide_port_info' and ide_hwif_t. Update host drivers and core code accordingly. While at it: * Rename ata66_() cable detect functions to _cable_detect() to match the standard naming. (Suggested by Sergei Shtylyov) v2: * Fix build for bast-ide. (Noticed by Andrew Morton) Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>	2008-04-26 22:25:14 +02:00
Huang, Ying	8b664aa66e	x86, boot: add linked list of struct setup_data This patch adds a field of 64-bit physical pointer to NULL terminated single linked list of struct setup_data to real-mode kernel header. This is used as a more extensible boot parameters passing mechanism. Signed-off-by: Huang Ying <ying.huang@intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-04-26 21:34:42 +02:00
Huang, Ying	50eae2a7c9	x86, boot: add free_early to early reservation machanism Add free_early to early reservation mechanism - this way early bootup failure paths can stop wasting memory. Signed-off-by: Huang Ying <ying.huang@intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-04-26 21:34:42 +02:00
Joe Perches	f19dcf4a61	x86: include/asm-x86/pgalloc.h/bitops.h: checkpatch cleanups - formatting only Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-04-26 19:21:17 +02:00
Alexander van Heukelum	d66462f531	x86: finalize bitops unification include/asm-x86/bitops_32.h and include/asm-x86/bitops_64.h are now almost identical. The 64-bit version sets ARCH_HAS_FAST_MULTIPLIER and has an extra inline function set_bit_string. The define currently has no influence on the generated code, but it can be argued that setting it on i386 is the right thing to do anyhow. The addition of the extra inline function on i386 does not hurt either. Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-04-26 19:21:17 +02:00
Alexander van Heukelum	5245698f66	x86, UML: remove x86-specific implementations of find_first_bit x86 has been switched to the generic versions of find_first_bit and find_first_zero_bit, but the original versions were retained. This patch just removes the now unused x86-specific versions. also update UML. Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-04-26 19:21:17 +02:00
Alexander van Heukelum	3a48305028	x86: optimize find_first_bit for small bitmaps Avoid a call to find_first_bit if the bitmap size is know at compile time and small enough to fit in a single long integer. Modeled after an optimization in the original x86_64-specific code. Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-04-26 19:21:17 +02:00
Alexander van Heukelum	2aba6925fd	x86: switch 64-bit to generic find_first_bit Switch x86_64 to generic find_first_bit. The x86_64-specific implementation is not removed. Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-04-26 19:21:16 +02:00
Alexander van Heukelum	77b9bd9c49	x86: generic versions of find_first_(zero_)bit, convert i386 Generic versions of __find_first_bit and __find_first_zero_bit are introduced as simplified versions of __find_next_bit and __find_next_zero_bit. Their compilation and use are guarded by a new config variable GENERIC_FIND_FIRST_BIT. The generic versions of find_first_bit and find_first_zero_bit are implemented in terms of the newly introduced __find_first_bit and __find_first_zero_bit. This patch does not remove the i386-specific implementation, but it does switch i386 to use the generic functions by setting GENERIC_FIND_FIRST_BIT=y for X86_32. Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-04-26 19:21:16 +02:00
Alexander van Heukelum	d57594c203	bitops: use __fls for fls64 on 64-bit archs Use __fls for fls64 on 64-bit archs. The implementation for 64-bit archs is moved from x86_64 to asm-generic. Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-04-26 19:21:16 +02:00
Alexander van Heukelum	56a6b1eb7b	generic: implement __fls on all 64-bit archs Implement __fls on all 64-bit archs: alpha has an implementation of fls64. Added __fls(x) = fls64(x) - 1. ia64 has fls, but not __fls. Added __fls based on code of fls. mips and powerpc have __ilog2, which is the same as __fls. Added __fls = __ilog2. parisc, s390, sh and sparc64: Include generic __fls. x86_64 already has __fls. Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-04-26 19:21:16 +02:00
Alexander van Heukelum	7d9dff22e8	generic: introduce a generic __fls implementation Add a generic __fls implementation in the same spirit as the generic __ffs one. It finds the last (most significant) set bit in the given long value. Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-04-26 19:21:16 +02:00
Alexander van Heukelum	12d9c8420b	x86: merge the simple bitops and move them to bitops.h Some of those can be written in such a way that the same inline assembly can be used to generate both 32 bit and 64 bit code. For ffs and fls, x86_64 unconditionally used the cmov instruction and i386 unconditionally used a conditional branch over a mov instruction. In the current patch I chose to select the version based on the availability of the cmov instruction instead. A small detail here is that x86_64 did not previously set CONFIG_X86_CMOV=y. Improved comments for ffs, ffz, fls and variations. Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-04-26 19:21:16 +02:00
Alexander van Heukelum	64970b68d2	x86, generic: optimize find_next_(zero_)bit for small constant-size bitmaps This moves an optimization for searching constant-sized small bitmaps form x86_64-specific to generic code. On an i386 defconfig (the x86#testing one), the size of vmlinux hardly changes with this applied. I have observed only four places where this optimization avoids a call into find_next_bit: In the functions return_unused_surplus_pages, alloc_fresh_huge_page, and adjust_pool_surplus, this patch avoids a call for a 1-bit bitmap. In __next_cpu a call is avoided for a 32-bit bitmap. That's it. On x86_64, 52 locations are optimized with a minimal increase in code size: Current #testing defconfig: 146 x bsf, 27 x find_next_bit text data bss dec hex filename 5392637 846592 724424 6963653 6a41c5 vmlinux After removing the x86_64 specific optimization for find_next_bit: 94 x bsf, 79 x find_next_bit text data bss dec hex filename 5392358 846592 724424 6963374 6a40ae vmlinux After this patch (making the optimization generic): 146 x bsf, 27 x find_next_bit text data bss dec hex filename 5392396 846592 724424 6963412 6a40d4 vmlinux [ tglx@linutronix.de: build fixes ] Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-04-26 19:21:16 +02:00
Alexander van Heukelum	6fd92b63d0	x86: change x86 to use generic find_next_bit The versions with inline assembly are in fact slower on the machines I tested them on (in userspace) (Athlon XP 2800+, p4-like Xeon 2.8GHz, AMD Opteron 270). The i386-version needed a fix similar to `06024f21` to avoid crashing the benchmark. Benchmark using: gcc -fomit-frame-pointer -Os. For each bitmap size 1...512, for each possible bitmap with one bit set, for each possible offset: find the position of the first bit starting at offset. If you follow ;). Times include setup of the bitmap and checking of the results. Athlon Xeon Opteron 32/64bit x86-specific: 0m3.692s 0m2.820s 0m3.196s / 0m2.480s generic: 0m2.622s 0m1.662s 0m2.100s / 0m1.572s If the bitmap size is not a multiple of BITS_PER_LONG, and no set (cleared) bit is found, find_next_bit (find_next_zero_bit) returns a value outside of the range [0, size]. The generic version always returns exactly size. The generic version also uses unsigned long everywhere, while the x86 versions use a mishmash of int, unsigned (int), long and unsigned long. Using the generic version does give a slightly bigger kernel, though. defconfig: text data bss dec hex filename x86-specific: `4738555` 481232 626688 5846475 5935cb vmlinux (32 bit) generic: 4738621 481232 626688 5846541 59360d vmlinux (32 bit) x86-specific: 5392395 846568 724424 6963387 6a40bb vmlinux (64 bit) generic: 5392458 846568 724424 6963450 6a40fa vmlinux (64 bit) Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-04-26 19:21:16 +02:00
Linus Torvalds	d485cb9aa2	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86-optimized-inlining * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86-optimized-inlining: generic: make optimized inlining arch-opt-in x86: add optimized inlining	2008-04-26 09:48:52 -07:00
Linus Torvalds	1292ebb82c	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6: (61 commits) ide: sanitize handling of IDE_HFLAG_NO_SET_MODE host flag sis5513: fail early for unsupported chipsets it821x: fix kzalloc() failure handling qd65xx: use IDE_HFLAG_SINGLE host flag qd65xx: always use ->selectproc method ide-cd: put proc-related functions together under single ifdef ide-cd: Replace __FUNCTION__ with __func__ IDE: Coding Style fixes to drivers/ide/ide-cd.c IDE: Coding Style fixes to drivers/ide/pci/cy82c693.c IDE: Coding Style fixes to drivers/ide/pci/it8213.c IDE: Coding Style fixes to drivers/ide/ide-floppy.c IDE: Coding Style fixes to drivers/ide/legacy/ali14xx.c IDE: Coding Style fixes to drivers/ide/legacy/hd.c IDE: Coding Style fixes to drivers/ide/pci/cmd640.c IDE: Coding Style fixes to drivers/ide/pci/opti621.c IDE: Coding Style fixes to drivers/ide/ide-pnp.c IDE: Coding Style fixes to drivers/ide/ide-proc.c IDE: Coding Style fixes to drivers/ide/legacy/ide-4drives.c IDE: Coding Style fixes to drivers/ide/legacy/umc8672.c IDE: Coding Style fixes to drivers/ide/pci/generic.c ...	2008-04-26 09:48:00 -07:00
Ingo Molnar	765c68bd54	generic: make optimized inlining arch-opt-in Stephen Rothwell reported that linux-next did not build on powerpc64. make optimized inlining dependent on architecture opt-in. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-04-26 17:44:55 +02:00
Ingo Molnar	60a3cdd063	x86: add optimized inlining add CONFIG_OPTIMIZE_INLINING=y. allow gcc to optimize the kernel image's size by uninlining functions that have been marked 'inline'. Previously gcc was forced by Linux to always-inline these functions via a gcc attribute: #define inline inline __attribute__((always_inline)) Especially when the user has already selected CONFIG_OPTIMIZE_FOR_SIZE=y this can make a huge difference in kernel image size (using a standard Fedora .config): text data bss dec hex filename 5613924 562708 3854336 10030968 990f78 vmlinux.before 5486689 562708 3854336 9903733 971e75 vmlinux.after that's a 2.3% text size reduction (!). Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-04-26 17:44:55 +02:00
Bartlomiej Zolnierkiewicz	05230e23cf	ide: remove hwif->straight8 flag All host drivers now either set hwif->mmio or reserve continuous I/O resources so remove no longer needed hwif->straight8 flag and never reached code for 'hwif->straight8 == 0' case. Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>	2008-04-26 17:36:39 +02:00
Bartlomiej Zolnierkiewicz	951784b667	ide: remove IDE_HFLAG_CY82C693 host flag Sergei suggested that it shouldn't be necessary + it had no effect anyway since ide_id_dma_bug() is called earlier in ide_tune_dma(). Cc: Sergei Shtylyov <sshtylyov@ru.mvista.com> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>	2008-04-26 17:36:38 +02:00
Adrian Bunk	b3a37f1284	remove include/linux/hdsmart.h include/linux/hdsmart.h is not used by the kernel and should therefore be removed. Signed-off-by: Adrian Bunk <bunk@kernel.org> Cc: "Robert P. J. Day" <rpjday@crashcourse.ca>, Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>	2008-04-26 17:36:37 +02:00
Bartlomiej Zolnierkiewicz	e277f91fef	ide: use ide_find_port() in legacy VLB host drivers (take 2) * Add IDE_HFLAG_QD_2ND_PORT host flag to indicate the need of skipping first ide_hwifs[] slot for the second port of QD65xx controller. * Handle this new host flag in ide_find_port_slot(). * Convert legacy VLB host drivers to use ide_find_port(). While at it: * Fix couple of printk()-s in qd65xx host driver to not use hwif->name. v2: * Fix qd65xx. Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>	2008-04-26 17:36:36 +02:00
Bartlomiej Zolnierkiewicz	fe80b937c9	ide: merge ide_match_hwif() and ide_find_port() * Change ide_match_hwif() argument from 'u8 bootable' to 'struct ide_port_info d'. Move ide_match_hwif() to ide-probe.c from setup-pci.c and rename it to ide_find_port_slot(). Update some comments while at it. * ide_find_port() can be now just a wrapper for ide_find_port_slot(). There should be no functional changes caused by this patch. Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>	2008-04-26 17:36:36 +02:00
Bartlomiej Zolnierkiewicz	078fdf789c	ide: remove PIO "downgrade" quirk No need for it nowadays so remove quirk code from ide_get_best_pio_mode() and IDE_HFLAG_PIO_DOWNGRADE host flag. Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>	2008-04-26 17:36:36 +02:00
Bartlomiej Zolnierkiewicz	5e71d9c5a5	ide: IDE_HFLAG_BOOTABLE -> IDE_HFLAG_NON_BOOTABLE "bootable" should be the default behavior so replace IDE_HFLAG_BOOTABLE host flag with IDE_HFLAG_NON_BOOTABLE. There should be no functional changes caused by this patch. Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>	2008-04-26 17:36:35 +02:00
Bartlomiej Zolnierkiewicz	59bff5ba55	ide: cleanup ide_find_port() Remove no longer needed matching against I/O base and 'base' argument. Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>	2008-04-26 17:36:32 +02:00
Jacek Luczak	b6dbf334b0	x86: section mismatch fixes, #2 This patch fixes section mismatch warnings in smpboot_setup_io_apic(). WARNING: arch/x86/kernel/built-in.o(.text+0x11781): Section mismatch in reference from the function smpboot_setup_io_apic() to the function .init.text:setup_IO_APIC() The function smpboot_setup_io_apic() references the function __init setup_IO_APIC(). This is often because smpboot_setup_io_apic lacks a __init annotation or the annotation of setup_IO_APIC is wrong. Signed-off-by: Jacek Luczak <luczak.jacek@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-04-26 17:35:48 +02:00
Jacek Luczak	6d4ce04326	x86: pgtable_32.h - prototype and section mismatch fixes This patch adds extern to native_pagetable_setup_[start,done]() protypes and fixes following section mismatch warning: WARNING: arch/x86/mm/built-in.o(.text+0xf2): Section mismatch in reference from the function paravirt_pagetable_setup_start() paravirt_pagetable_setup_[start,done]() is used by __init pagetable_init(). Annotate both functions with __init. Signed-off-by: Jacek Luczak <luczak.jacek@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-04-26 17:35:48 +02:00
Akinobu Mita	ae5830a6f8	x86: remove duplicate get_bios_ebda() from rio.h get_bios_ebda() exists in asm/rio.h and asm/bios_ebda.h. This patch removes the one in asm/rio.h. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-04-26 17:35:47 +02:00
Akinobu Mita	2c0903f2ab	x86: get_bios_ebda() requires asm/io.h include <asm/io.h> for phys_to_virt() Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-04-26 17:35:47 +02:00
Akinobu Mita	a1a33fa315	x86: use bitmap library for pin_programmed Use bitmap library for pin_programmed rather than reinvent bitmaps. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-04-26 17:35:47 +02:00
Roland McGrath	562b80baff	x86_64 ia32 ptrace: convert to compat_arch_ptrace Now that there are no more special cases in sys32_ptrace, we can convert to using the generic compat_sys_ptrace entry point. The sys32_ptrace function gets simpler and becomes compat_arch_ptrace. Signed-off-by: Roland McGrath <roland@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-04-26 17:35:47 +02:00
Dmitri Vorobiev	f7f17a67c5	x86: remove NexGen support It is claimed that NexGen CPUs were never shipped: http://lkml.org/lkml/2008/4/20/179 Also, the kernel support for these chips has been broken for a long time, the code intended to support NexGen thereby being essentially dead. As an outcome of the discussion that can be found using the URL above, this patch removes the NexGen support altogether. The changes in this patch survived a defconfig build for i386, a couple of successful randconfig builds, as well as a runtime test, which consisted in booting a 32-bit x86 box up to the shell prompt. Signed-off-by: Dmitri Vorobiev <dmitri.vorobiev@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-04-26 17:35:47 +02:00
Adrian Bunk	b11caa7c70	fix asm-x86/{posix_types,unistd}.h Jeff Chua reported that: Commit `e40c0fe6b0` (x86: cleanup duplicate includes) turned the userspace asm-x86/posix_types.h and asm-x86/unistd.h headers into empty files. This patch reverts these bogus changes. Reported-by: Jeff Chua <jeff.chua.linux@gmail.com> Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-04-26 17:35:46 +02:00
Linus Torvalds	bc84e0a160	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: [PATCH] sanitize locate_fd() [PATCH] sanitize unshare_files/reset_files_struct [PATCH] sanitize handling of shared descriptor tables in failing execve() [PATCH] close race in unshare_files() [PATCH] restore sane ->umount_begin() API cifs: timeout dfs automounts +little fix.	2008-04-25 19:05:55 -07:00
Yevgeny Petrilin	ed4d3c1061	mlx4_core: Add helper to move QP to ready-to-send Avoid duplicating code in ethernet and FC modules. Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2008-04-25 14:52:32 -07:00
Yevgeny Petrilin	38ae6a5354	mlx4_core: Add HW queues allocation helpers Wrap doorbell, buffer and MTT allocation in helper functions for ethernet and FC modules to use. Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2008-04-25 14:27:08 -07:00
Linus Torvalds	b9fa38f75e	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc * 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc: (49 commits) [POWERPC] Add zImage.iseries to arch/powerpc/boot/.gitignore [POWERPC] bootwrapper: fix build error on virtex405-head.S [POWERPC] 4xx: Fix 460GT support to not enable FPU [POWERPC] 4xx: Add NOR FLASH entries to Canyonlands and Glacier dts [POWERPC] Xilinx: of_serial support for Xilinx uart 16550. [POWERPC] Xilinx: boot support for Xilinx uart 16550. [POWERPC] celleb: Add support for PCI Express [POWERPC] celleb: Move miscellaneous files for Beat [POWERPC] celleb: Move a file for SPU on Beat [POWERPC] celleb: Move files for Beat mmu and iommu [POWERPC] celleb: Move files for Beat hvcall interfaces [POWERPC] celleb: Move the SCC related code for celleb [POWERPC] celleb: Move the files for celleb base support [POWERPC] celleb: Consolidate io-workarounds code [POWERPC] cell: Generalize io-workarounds code [POWERPC] Add CONFIG_PPC_PSERIES_DEBUG to enable debugging for platforms/pseries [POWERPC] Convert from DBG() to pr_debug() in platforms/pseries/ [POWERPC] Register udbg console early on pseries LPAR [POWERPC] Mark udbg console as CON_ANYTIME, ie. callable early in boot [POWERPC] Set udbg_console index to 0 ...	2008-04-25 12:52:16 -07:00
Linus Torvalds	eb855fd8e7	Merge branch 'for-linus' of git://git.o-hand.com/linux-rpurdie-leds * 'for-linus' of git://git.o-hand.com/linux-rpurdie-leds: leds: Add default-on trigger leds: Document the context brightness_set needs leds: Add new driver for the LEDs on the Freecom FSG-3 leds: Add support to leds with readable status leds: enable support for blink_set() platform hook in leds-gpio leds: Cleanup various whitespace and code style issues leds: disable triggers on brightness set leds: Add mail LED support for "Clevo D400P"	2008-04-25 12:48:44 -07:00
Linus Torvalds	bf16ae2509	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86-pat * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86-pat: generic: add ioremap_wc() interface wrapper /dev/mem: make promisc the default pat: cleanups x86: PAT use reserve free memtype in mmap of /dev/mem x86: PAT phys_mem_access_prot_allowed for dev/mem mmap x86: PAT avoid aliasing in /dev/mem read/write devmem: add range_is_allowed() check to mmap of /dev/mem x86: introduce /dev/mem restrictions with a config option	2008-04-25 12:48:08 -07:00
Linus Torvalds	0b79dada97	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched-fixes * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched-fixes: sched: fix share (re)distribution softlockup: fix NOHZ wakeup seqlock: livelock fix	2008-04-25 12:47:56 -07:00
Linus Torvalds	50be4917ee	Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev * 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev: pata_bf54x: decrease count first. sata_mv: re-enable hotplug, update TODO list sata_mv: leave SError bits untouched in mv_err_intr sata_mv: more interrupt handling rework sata_mv: tidy host controller interrupt handling sata_mv: simplify request/response queue handling sata_mv: simplify freeze/thaw bit-shift calculations sata_mv mask all interrupt coalescing bits sata_mv more cosmetics ata_piix: add Asus Eee 701 controller to short cable list libata-eh set tf flags in NCQ EH result_tf make sata_set_spd_needed() static make sata_print_link_status() static libata-acpi.c: remove unneeded #if's sata_nv: make hardreset return -EAGAIN on success ahci: retry enabling AHCI a few times before spitting out WARN_ON() libata: make WARN_ON conditions in ata_sff_hsm_move() more strict ATA/IDE: fix platform driver hotplug/coldplug sata_sis: SCR accessors return -EINVAL when requested SCR isn't available libata: functions with definition should not be extern	2008-04-25 12:41:55 -07:00
Linus Torvalds	37b05b1798	Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb-2.6: (120 commits) usb: don't update devnum for wusb devices wusb: make ep0_reinit available for modules wusb: devices dont use a set address wusb: teach choose_address() about wireless devices wusb: add link wusb-usb device wusb: add authenticathed bit to usb_dev USB: remove unnecessary type casting of urb->context usb serial: more fixes and groundwork for tty changes USB: replace remaining __FUNCTION__ occurrences USB: usbfs: export the URB_NO_INTERRUPT flag to userspace USB: fix compile problems in ehci-hcd USB: ehci: qh_completions cleanup and bugfix USB: cdc-acm: signedness fix USB: add documentation about callbacks USB: don't explicitly reenable root-hub status interrupts USB: OHCI: turn off RD when remote wakeup is disabled USB: HCDs use the do_remote_wakeup flag USB: g_file_storage: ignore bulk-out data after invalid CBW USB: serial: remove endpoints setting checks from core and header USB: serial: remove unneeded number endpoints settings ...	2008-04-25 12:40:57 -07:00
Linus Torvalds	ce1d5b23a8	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: (40 commits) Input: wacom - add support for Cintiq 20WSX Input: ucb1400_ts - IRQ probe fix Input: at32psif - update MODULE_AUTHOR with new email Input: mac_hid - add lockdep annotation to emumousebtn Input: i8042 - fix incorrect usage of strncpy and strncat Input: bf54x-keys - add infrastructure for keypad wakeups Input: add MODULE_ALIAS() to hotpluggable platform modules Input: drivers/char/keyboard.c - use time_after Input: fix ordering in joystick Makefile Input: wm97xx-core - support use as a wakeup source Input: wm97xx-core - use IRQF_SAMPLE_RANDOM Input: wm97xx-core - only schedule interrupt handler if not already scheduled Input: add Zhen Hua driver Input: aiptek - add support for Genius G-PEN 560 tablet Input: wacom - implement suspend and autosuspend Input: xpad - set proper buffer length for outgoing requests Input: omap-keypad - fix build warning Input: gpio_keys - irq handling cleanup Input: add PS/2 serio driver for AVR32 devices Input: put ledstate in the keyboard notifier ...	2008-04-25 12:38:14 -07:00
Linus Torvalds	6f97b220f4	Merge git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-2.6-dm * git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-2.6-dm: (24 commits) dm crypt: add documentation dm: remove md argument from specific_minor dm table: remove unused dm_create_error_table dm table: drop void suspend_targets return dm: unplug queues in threads dm raid1: use timer dm: move include files dm kcopyd: rename dm: expose macros dm kcopyd: remove redundant client counting dm kcopyd: private mempool dm kcopyd: per device dm log: make module use tracking internal dm log: move register functions dm log: clean interface dm kcopyd: clean interface dm io: clean interface dm io: rename error to error_bits dm snapshot: store pointer to target instance dm log: move dirty region log code into separate module ...	2008-04-25 12:33:49 -07:00
Linus Torvalds	4b7227ca32	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-xen-next * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-xen-next: (52 commits) xen: add balloon driver xen: allow compilation with non-flat memory xen: fold xen_sysexit into xen_iret xen: allow set_pte_at on init_mm to be lockless xen: disable preemption during tlb flush xen pvfb: Para-virtual framebuffer, keyboard and pointer driver xen: Add compatibility aliases for frontend drivers xen: Module autoprobing support for frontend drivers xen blkfront: Delay wait for block devices until after the disk is added xen/blkfront: use bdget_disk xen: Make xen-blkfront write its protocol ABI to xenstore xen: import arch generic part of xencomm xen: make grant table arch portable xen: replace callers of alloc_vm_area()/free_vm_area() with xen_ prefixed one xen: make include/xen/page.h portable moving those definitions under asm dir xen: add resend_irq_on_evtchn() definition into events.c Xen: make events.c portable for ia64/xen support xen: move events.c to drivers/xen for IA64/Xen support xen: move features.c from arch/x86/xen/features.c to drivers/xen xen: add missing definitions in include/xen/interface/vcpu.h which ia64/xen needs ...	2008-04-25 12:32:10 -07:00
Linus Torvalds	2e561c7b7e	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (48 commits) net: Fix wrong interpretation of some copy_to_user() results. xfrm: alg_key_len & alg_icv_len should be unsigned [netdrvr] tehuti: move ioctl perm check closer to function start ipv6: Fix typo in net/ipv6/Kconfig via-velocity: fix vlan receipt tg3: sparse cleanup forcedeth: realtek phy crossover detection ibm_newemac: Increase MDIO timeouts gianfar: Fix skb allocation strategy netxen: reduce stack usage of netxen_nic_flash_print smc911x: test after postfix decrement fails in smc911x_{reset,drop_pkt} net drivers: fix platform driver hotplug/coldplug forcedeth: new backoff implementation ehea: make things static phylib: Add support for board-level PHY fixups [netdrvr] atlx: code movement: move atl1 parameter parsing atlx: remove flash vendor parameter korina: misc cleanup korina: fix misplaced return statement WAN: Fix confusing insmod error code for C101 too. ...	2008-04-25 12:28:28 -07:00
Linus Torvalds	7e97b28309	Merge git://git.infradead.org/mtd-2.6 * git://git.infradead.org/mtd-2.6: (82 commits) [MTD] m25p80: Add Support for ATMEL AT25DF641 64-Megabit SPI Flash [MTD] m25p80: add FAST_READ access support to M25Pxx [MTD] [NAND] bf5xx_nand: Avoid crash if bfin_mac is installed. [MTD] [NAND] at91_nand: control NCE signal [MTD] [NAND] AT91 hardware ECC compile fix for at91sam9263 / at91sam9260 [MTD] [NAND] Hardware ECC controller on at91sam9263 / at91sam9260 [JFFS2] Introduce dbg_readinode2 log level, use it to shut read_dnode() up [JFFS2] Fix jffs2_reserve_space() when all blocks are pending erasure. [JFFS2] Add erase_checking_list to hold blocks being marked. UBI: add a message [JFFS2] Return values of jffs2_block_check_erase error paths [MTD] Clean up AR7 partition map support [MTD] [NOR] Fix Intel CFI driver for collie flash [JFFS2] Finally remove redundant ref->__totlen field. [JFFS2] Honour TEST_TOTLEN macro in debugging code. ref->__totlen is going! [JFFS2] Add paranoia debugging for superblock counts [JFFS2] Fix free space leak with in-band cleanmarkers [JFFS2] Self-sufficient #includes in jffs2_fs_i.h: include <linux/mutex.h> [MTD] [NAND] Verify probe by retrying to checking the results match [MTD] [NAND] S3C2410 Allow ECC disable to be specified by the board ...	2008-04-25 12:25:48 -07:00
Linus Torvalds	dd0e101d63	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-x86-fixes4 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-x86-fixes4: x86: harden kernel code patching x86: clean up text_poke() x86: fix text_poke() x86: remove set_fixmap() warning x86: make __set_fixmap() non-init x86: make clear_fixmap() available on 64-bit as well	2008-04-25 12:03:36 -07:00
Ingo Molnar	3ec96783e3	x86: make clear_fixmap() available on 64-bit as well Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-04-25 19:54:07 +02:00
J. Bruce Fields	e36cd4a287	nfsd: don't allow setting ctime over v4 Presumably this is left over from earlier drafts of v4, which listed TIME_METADATA as writeable. It's read-only in rfc 3530, and shouldn't be modifiable anyway. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2008-04-25 13:00:11 -04:00
J. Bruce Fields	1a747ee0cc	locks: don't call ->copy_lock methods on return of conflicting locks The file_lock structure is used both as a heavy-weight representation of an active lock, with pointers to reference-counted structures, etc., and as a simple container for parameters that describe a file lock. The conflicting lock returned from __posix_lock_file is an example of the latter; so don't call the filesystem or lock manager callbacks when copying to it. This also saves the need for an unnecessary locks_init_lock in the nfsv4 server. Thanks to Trond for pointing out the error. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Cc: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-04-25 13:00:11 -04:00
Wendy Cheng	17efa372cf	lockd: unlock lockd locks held for a certain filesystem Add /proc/fs/nfsd/unlock_filesystem, which allows e.g.: shell> echo /mnt/sfs1 > /proc/fs/nfsd/unlock_filesystem so that a filesystem can be unmounted before allowing a peer nfsd to take over nfs service for the filesystem. Signed-off-by: S. Wendy Cheng <wcheng@redhat.com> Cc: Lon Hohberger <lhh@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> fs/lockd/svcsubs.c \| 66 +++++++++++++++++++++++++++++++++++++++----- fs/nfsd/nfsctl.c \| 65 +++++++++++++++++++++++++++++++++++++++++++ include/linux/lockd/lockd.h \| 7 ++++ 3 files changed, 131 insertions(+), 7 deletions(-)	2008-04-25 13:00:11 -04:00
Wendy Cheng	4373ea84c8	lockd: unlock lockd locks associated with a given server ip For high-availability NFS service, we generally need to be able to drop file locks held on the exported filesystem before moving clients to a new server. Currently the only way to do that is by shutting down lockd entirely, which is often undesireable (for example, if you want to continue exporting other filesystems). This patch allows the administrator to release all locks held by clients accessing the client through a given server ip address, by echoing that address to a new file, /proc/fs/nfsd/unlock_ip, as in: shell> echo 10.1.1.2 > /proc/fs/nfsd/unlock_ip The expected sequence of events can be: 1. Tear down the IP address 2. Unexport the path 3. Write IP to /proc/fs/nfsd/unlock_ip to unlock files 4. Signal peer to begin take-over. For now we only support IPv4 addresses and NFSv2/v3 (NFSv4 locks are not affected). Also, if unmounting the filesystem is required, we assume at step 3 that clients using the given server ip are the only clients holding locks on the given filesystem; otherwise, an additional patch is required to allow revoking all locks held by lockd on a given filesystem. Signed-off-by: S. Wendy Cheng <wcheng@redhat.com> Cc: Lon Hohberger <lhh@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> fs/lockd/svcsubs.c \| 66 +++++++++++++++++++++++++++++++++++++++----- fs/nfsd/nfsctl.c \| 65 +++++++++++++++++++++++++++++++++++++++++++ include/linux/lockd/lockd.h \| 7 ++++ 3 files changed, 131 insertions(+), 7 deletions(-)	2008-04-25 13:00:10 -04:00
Al Viro	3b1253880b	[PATCH] sanitize unshare_files/reset_files_struct * let unshare_files() give caller the displaced files_struct * don't bother with grabbing reference only to drop it in the caller if it hadn't been shared in the first place * in that form unshare_files() is trivially implemented via unshare_fd(), so we eliminate the duplicate logics in fork.c * reset_files_struct() is not just only called for current; it will break the system if somebody ever calls it for anything else (we can't modify ->files of somebody else). Lose the task_struct * argument. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-04-25 09:23:59 -04:00
Al Viro	42faad9965	[PATCH] restore sane ->umount_begin() API Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-04-25 09:23:25 -04:00
Adrian Bunk	4fdfe401e9	dm table: remove unused dm_create_error_table dm_create_error_table() was added in kernel 2.6.18 and never used... Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2008-04-25 13:27:00 +01:00
Alasdair G Kergon	a765e20eeb	dm: move include files Publish the dm-io, dm-log and dm-kcopyd headers in include/linux. Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2008-04-25 13:26:55 +01:00
Alasdair G Kergon	0da336e5fa	dm: expose macros Make dm.h macros and inlines available in include/linux/device-mapper.h Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2008-04-25 13:26:53 +01:00
Heinz Mauelshagen	416cd17b19	dm log: clean interface Clean up the dm-log interface to prepare for publishing it in include/linux. Signed-off-by: Heinz Mauelshagen <hjm@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2008-04-25 13:26:46 +01:00
David S. Miller	cc93d7d77d	Merge branch 'upstream-davem' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6	2008-04-25 00:31:07 -07:00
Eric Dumazet	461e6c856f	xfrm: alg_key_len & alg_icv_len should be unsigned In commit `ba749ae98d` ([XFRM]: alg_key_len should be unsigned to avoid integer divides <http://git2.kernel.org/?p=linux/kernel/git/davem/net-2.6.git;a=commitdiff;h=ba749ae98d5aa9d2ce9a7facde0deed454f92230>) alg_key_len field of struct xfrm_algo was converted to unsigned int to avoid integer divides. Then Herbert in commit `1a6509d991` ([IPSEC]: Add support for combined mode algorithms) added a new structure xfrm_algo_aead, that resurrected a signed int for alg_key_len and re-introduce integer divides. This patch avoids these divides and saves 64 bytes of text on i386. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-25 00:29:34 -07:00
Andy Fleming	f62220d3a9	phylib: Add support for board-level PHY fixups Sometimes the specific interaction between the platform and the PHY requires special handling. For instance, to change where the PHY's clock input is, or to add a delay to account for latency issues in the data path. We add a mechanism for registering a callback with the PHY Lib to be called on matching PHYs when they are brought up, or reset. Signed-off-by: Andy Fleming <afleming@freescale.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>	2008-04-25 02:08:52 -04:00
Adrian Bunk	6bdb4fc9f9	make sata_print_link_status() static sata_print_link_status() can now become static. Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>	2008-04-25 00:46:09 -04:00
Inaky Perez-Gonzalez	b1d8dfb0e5	wusb: add link wusb-usb device We need to tie the WUSB and USB devices; the USB stack doesn't need to know the details about the WUSB device, but needs to have a link to it. This is needed so that the notify call back for Remove Device can tie both and undo the device setup (sysfs files). We connect the devices together at the Add Device notifier callback (the wusb_dev references the usb_dev and stores it, the usb_dev references the wusb_dev and stores it); then we do create the WUSB sysfs files at the usb_dev sysfs directory. At Remove Device, we undo that (thus we need the usb_dev reference). Cross reference to functions in the WUSB substack: wusb_dev_{add,rm}_ncb(). Signed-off-by: Inaky Perez-Gonzalez <inaky@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2008-04-24 21:16:57 -07:00
Inaky Perez-Gonzalez	3b52f128ae	wusb: add authenticathed bit to usb_dev This bit indicates the system that the WUSB device has been crypto authenticated and thus can operate as normal. Signed-off-by: Inaky Perez-Gonzalez <inaky@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2008-04-24 21:16:55 -07:00
Alan Stern	14722ef4ac	USB: usbfs: export the URB_NO_INTERRUPT flag to userspace This patch (as1079) cleans up the way URB_* flags are exported in usbfs. The URB_NO_INTERRUPT flag is now exported (this is the only behavioral change). USBDEVFS_URB_* macros are added for URB_NO_FSBR, URB_ZERO_PACKET, and URB_NO_INTERRUPT, making explicit the fact that the kernel accepts them. The flag matching takes into account that the URB_* values may change as the kernel evolves, whereas the USBDEVFS_URB_* values must remain fixed since they are a user API. Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2008-04-24 21:16:54 -07:00
Greg Kroah-Hartman	9aebfd6bda	USB: serial: remove endpoints setting checks from core and header Remove the unused check for num_interrupt and friends as well as remove them from the header file because no usb-serial drivers no longer reference them. Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2008-04-24 21:16:52 -07:00
Oliver Neukum	7ef4f0600d	USB: update comments about usb driver's header Comments here are so outdated that they are plain wrong. We cannot expect people to write correct drivers if the headers have incorrect comments. Signed-off-by: Oliver Neukum <oneukum@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2008-04-24 21:16:50 -07:00
Oliver Neukum	eda769593b	USB: add extension of anchor API, usb_unlink_anchored_urbs This adds the ability to trigger asynchronous unlinks of anchored URBs. This is needed for error handling in the comntext of completion handlers, which cannot sleep. Signed-off-by: Oliver Neukum <oneukum@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2008-04-24 21:16:50 -07:00
matthieu castet	d277064e7e	USB: mass storage: emulation of sat scsi_pass_thru with ATACB I have got a cypress usb-ide bridge and I would like to tune or monitor my disk with tools like hdparm, hddtemp or smartctl. My controller support a way to send raw ATA command to the disk with something call atacb (see http://download.cypress.com.edgesuite.net/design_resources/datasheets/contents/cy7c68300c_8.pdf). Atacb support can be added for each application, but there is some disadvantages : - all application need to be patched - A race is possible if there other accesses, because the emulation can be split in 2 atacb scsi transactions. One for sending the command, one for reading the register (if ck_cond is set). I have implemented the emulation in usb-storage with a special proto_handler, and an unsual entry. Signed-off-by: Matthieu CASTET <castet.matthieu@free.fr> Signed-off-by: Matthew Dharm <mdharm-usb@one-eyed-alien.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2008-04-24 21:16:42 -07:00
Robert P. J. Day	dda43a0e03	USB: Standardize inclusion protection and add where missing. For the header files in include/linux/usb, add missing multiple inclusion protection and standardize what's already there. The apparent standards: * macro name of __LINUX_USB_headerfile_H * inclusion protection placed after leading comment block * macro name added as a comment on the final #endif * any obvious trivial whitespace cleanup associated with the above Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2008-04-24 21:16:42 -07:00
Tilman Schmidt	f66396b55d	USB: usb.h: reduce syslog clutter [v3] The the err() / info() / warn() macros in usb.h inserted __FILE__ at the beginning of the message, which expands to the complete pathname of the source file within the kernel tree, frequently taking up half of an 80 character screen line before the actual message even begins. Use the module name instead. Signed-off-by: Tilman Schmidt <tilman@imap.cc> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2008-04-24 21:16:42 -07:00
David Brownell	dbe0dbb7df	USB: defines for USB "Link Power Management" (LPM) ECN There's a new PM-related change notice for the USB 2.0 specification called "Link Power Management" (LPM). It defines a new "L1 Suspend" state which resembles the current (L2) suspend state, except that it can be entered and exited much more quickly. It should thus be more useful for runtime PM, even though it doesn't mandate reduced power draw from VBUS. This patch provides the relevant #defines for usbcore. Actually implementing these mechanisms requires host silicon that can generate new USB packets, plus hubs handling some new requests and peripherals which understand the new packets. Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2008-04-24 21:16:34 -07:00
Randy Dunlap	f476fbaba7	USB: convert usb.h struct usb_device to kernel-doc Convert struct usb_device to use kernel-doc notation. Please especially check the @filelist and @usb_classdev descriptions. Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Cc: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2008-04-24 21:16:33 -07:00
Greg Kroah-Hartman	c27a4b717c	USB: add USB_DT_CS_RADIO_CONTROL define to ch9.h This is needed by the wireless usb developers, and is part of the USB spec. Cc: Inaky Perez-Gonzalez <inaky.perez-gonzalez@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2008-04-24 21:16:33 -07:00
Alan Stern	feccc30d90	USB: remove CONFIG_USB_PERSIST setting This patch (as1047) removes the USB_PERSIST Kconfig option, enabling it permanently. It also prevents the power/persist attribute from being created for hub devices; there's no point in having it since USB-PERSIST is always turned on for hubs. Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2008-04-24 21:16:32 -07:00
Richard Purdie	2e214e0fa2	leds: Document the context brightness_set needs Make sure there is no confusion about the contexts brightness_set can be called under by documenting it. Signed-off-by: Richard Purdie <rpurdie@rpsys.net>	2008-04-24 23:49:30 +01:00
Henrique de Moraes Holschuh	29d76dfa29	leds: Add support to leds with readable status Some led hardware allows drivers to query the led state, and this patch adds a hook to let the led class take advantage of that information when available. Without this functionality, when access to the led hardware is not exclusive (i.e. firmware or hardware might change its state behind the kernel's back), reality goes out of sync with the led class' idea of what the led is doing, which is annoying at best. Behaviour for drivers that do not or cannot read the led status is unchanged. Signed-off-by: Henrique de Moraes Holschuh <hmh@hmh.eng.br> Signed-off-by: Richard Purdie <rpurdie@rpsys.net>	2008-04-24 23:37:42 +01:00
Herbert Valerio Riedel	ca3259b360	leds: enable support for blink_set() platform hook in leds-gpio Enhance leds-gpio to provide hardware-based led flashing by passing through the blink_set() call to a optionally set platform-specific function pointer. Signed-off-by: Herbert Valerio Riedel <hvr@gnu.org> Signed-off-by: Richard Purdie <rpurdie@rpsys.net>	2008-04-24 23:37:42 +01:00
Ingo Molnar	88a411c07b	seqlock: livelock fix Thomas Gleixner debugged a particularly ugly seqlock related livelock: do not process the seq-read section if we know it beforehand that the test at the end of the section will fail ... Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-04-25 00:25:08 +02:00
Jeremy Fitzhardinge	1775826cee	xen: add balloon driver The balloon driver allows memory to be dynamically added or removed from the domain, in order to allow host memory to be balanced between multiple domains. This patch introduces the Xen balloon driver, though it currently only allows a domain to be shrunk from its initial size (and re-grown back to that size). A later patch will add the ability to grow a domain beyond its initial size. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-04-24 23:57:33 +02:00
Markus Armbruster	4ee36dc08e	xen pvfb: Para-virtual framebuffer, keyboard and pointer driver This is a pair of Xen para-virtual frontend device drivers: drivers/video/xen-fbfront.c provides a framebuffer, and drivers/input/xen-kbdfront provides keyboard and mouse. The backends run in dom0 user space. The two drivers are not in two separate patches, because the intermediate step (one driver, not the other) is somewhat problematic: the backend in dom0 needs both drivers, and will refuse to complete device initialization unless they're both present. Signed-off-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-04-24 23:57:33 +02:00
Christian Limpach	1d78d70556	xen blkfront: Delay wait for block devices until after the disk is added When the xen block frontend driver is built as a module the module load is only synchronous up to the point where the frontend and the backend become connected rather than when the disk is added. This means that there can be a race on boot between loading the module and loading the dm-* modules and doing the scan for LVM physical volumes (all in the initrd). In the failure case the disk is not present until after the scan for physical volumes is complete. Taken from: http://xenbits.xensource.com/linux-2.6.18-xen.hg?rev/11483a00c017 Signed-off-by: Christian Limpach <Christian.Limpach@xensource.com> Signed-off-by: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-04-24 23:57:33 +02:00
Markus Armbruster	3e334239d8	xen: Make xen-blkfront write its protocol ABI to xenstore Frontends are expected to write their protocol ABI to xenstore. Since the protocol ABI defaults to the backend's native ABI, things work fine without that as long as the frontend's native ABI is identical to the backend's native ABI. This is not the case for xen-blkfront running 32-on-64, because its ABI differs between 32 and 64 bit, and thus needs this fix. Based on http://xenbits.xensource.com/xen-unstable.hg?rev/c545932a18f3 and http://xenbits.xensource.com/xen-unstable.hg?rev/ffe52263b430 by Gerd Hoffmann <kraxel@suse.de> Signed-off-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Jeremy Fitzhardinge <Jeremy.Fitzhardinge@citrix.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-04-24 23:57:32 +02:00
Isaku Yamahata	b15993fcc1	xen: import arch generic part of xencomm On xen/ia64 and xen/powerpc hypercall arguments are passed by pseudo physical address (guest physical address) so that it's necessary to convert from virtual address into pseudo physical address. The frame work is called xencomm. Import arch generic part of xencomm. Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-04-24 23:57:32 +02:00
Isaku Yamahata	8d3d2106c1	xen: make grant table arch portable split out x86 specific part from grant-table.c and allow ia64/xen specific initialization. ia64/xen grant table is based on pseudo physical address (guest physical address) unlike x86/xen. On ia64 init_mm doesn't map identity straight mapped area. ia64/xen specific grant table initialization is necessary. Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-04-24 23:57:32 +02:00
Isaku Yamahata	5f0ababbf4	xen: replace callers of alloc_vm_area()/free_vm_area() with xen_ prefixed one Don't use alloc_vm_area()/free_vm_area() directly, instead define xen_alloc_vm_area()/xen_free_vm_area() and use them. alloc_vm_area()/free_vm_area() are used to allocate/free area which are for grant table mapping. Xen/x86 grant table is based on virtual address so that alloc_vm_area()/free_vm_area() are suitable. On the other hand Xen/ia64 (and Xen/powerpc) grant table is based on pseudo physical address (guest physical address) so that allocation should be done differently. The original version of xenified Linux/IA64 have its own allocate_vm_area()/free_vm_area() definitions which don't allocate vm area contradictory to those names. Now vanilla Linux already has its definitions so that it's impossible to have IA64 definitions of allocate_vm_area()/free_vm_area(). Instead introduce xen_allocate_vm_area()/xen_free_vm_area() and use them. Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-04-24 23:57:32 +02:00
Isaku Yamahata	20e71f2edb	xen: make include/xen/page.h portable moving those definitions under asm dir The definitions in include/asm/xen/page.h are arch specific. ia64/xen wants to define its own version. So move them to arch specific directory and keep include/xen/page.h in order not to break compilation. Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-04-24 23:57:32 +02:00
Isaku Yamahata	642e0c882c	xen: add resend_irq_on_evtchn() definition into events.c Define resend_irq_on_evtchn() which ia64/xen uses. Although it isn't used by current x86/xen code, it's arch generic so that put it into common code. Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-04-24 23:57:32 +02:00
Isaku Yamahata	e849c3e9e0	Xen: make events.c portable for ia64/xen support Remove x86 dependency in drivers/xen/events.c for ia64/xen support introducing include/asm/xen/events.h. Introduce xen_irqs_disabled() to hide regs->flags Introduce xen_do_IRQ() to hide regs->orig_ax. make enum ipi_vector definition arch specific. ia64/xen needs four vectors. Add one rmb() because on ia64 xchg() isn't barrier. Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-04-24 23:57:32 +02:00
Isaku Yamahata	e04d0d0767	xen: move events.c to drivers/xen for IA64/Xen support move arch/x86/xen/events.c undedr drivers/xen to share codes with x86 and ia64. And minor adjustment to compile. ia64/xen also uses events.c Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com> Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-04-24 23:57:32 +02:00
Isaku Yamahata	2724426924	xen: add missing definitions in include/xen/interface/vcpu.h which ia64/xen needs Add xen handles realted definitions for xen vcpu which ia64/xen needs. Pointer argumsnts for ia64/xen hypercall are passed in pseudo physical address (guest physical address) so that it is required to convert guest kernel virtual address into pseudo physical address. The xen guest handle represents such arguments. Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-04-24 23:57:32 +02:00
Isaku Yamahata	87e27cf628	xen: add missing definitions for xen grant table which ia64/xen needs Add xen handles realted definitions for grant table which ia64/xen needs. Pointer argumsnts for ia64/xen hypercall are passed in pseudo physical address (guest physical address) so that it is required to convert guest kernel virtual address into pseudo physical address right before issuing hypercall. The xen guest handle represents such arguments. Define necessary handles and helper functions. Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-04-24 23:57:32 +02:00
Isaku Yamahata	2eb6d5eb48	xen: definitions which ia64/xen needs Add xen VIRQ numbers defined for arch specific use. ia64/xen domU uses VIRQ_ARCH_0 for virtual itc timer. Although all those constants aren't used yet by ia64 at this moment, add all arch specific VIRQ numbers. Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-04-24 23:57:32 +02:00
Isaku Yamahata	9a9db275b0	xen: definisions which ia64 needs Add xen hypercall numbers defined for arch specific use. ia64/xen domU uses __HYPERVISOR_arch_1 to manipulate paravirtualized IOSAPIC. Although all those constants aren't used yet by IA64 at this moment, add all arch specific hypercall numbers. Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-04-24 23:57:32 +02:00
Jeremy Fitzhardinge	aa380c82b8	xen: add support for callbackops hypercall Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-04-24 23:57:31 +02:00
Jeremy Fitzhardinge	85958b465c	x86: unify pgd ctor/dtor All pagetables need fundamentally the same setup and destruction, so just use the same code for everything. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-04-24 23:57:31 +02:00
Jeremy Fitzhardinge	68db065c84	x86: unify KERNEL_PGD_PTRS Make KERNEL_PGD_PTRS common, as previously it was only being defined for 32-bit. There are a couple of follow-on changes from this: - KERNEL_PGD_PTRS was being defined in terms of USER_PGD_PTRS. The definition of USER_PGD_PTRS doesn't really make much sense on x86-64, since it can have two different user address-space configurations. I renamed USER_PGD_PTRS to KERNEL_PGD_BOUNDARY, which is meaningful for all of 32/32, 32/64 and 64/64 process configurations. - USER_PTRS_PER_PGD was also defined and was being used for similar purposes. Converting its users to KERNEL_PGD_BOUNDARY left it completely unused, and so I removed it. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Andi Kleen <ak@suse.de> Cc: Zach Amsden <zach@vmware.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-04-24 23:57:31 +02:00
Jeremy Fitzhardinge	9666e9d44b	xen: unify pte operations on machine frames Xen's pte operations on mfns can be unified like the kernel's pfn operations. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-04-24 23:57:31 +02:00
Jeremy Fitzhardinge	3b4724b0e6	xen: use phys_addr_t when referring to physical addresses Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-04-24 23:57:31 +02:00
Jeremy Fitzhardinge	286cd49456	x86: demacro pgalloc paravirt stubs Turn paravirt stubs into inline functions, so that the arguments are still typechecked. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-04-24 23:57:31 +02:00
Jeremy Fitzhardinge	c20311e165	x86/pgtable.h: demacro ptep_clear_flush_young Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-04-24 23:57:31 +02:00

... 3 4 5 6 7 ...

22604 Commits