linux

Commit Graph

Author	SHA1	Message	Date
Benjamin Herrenschmidt	cec08e7a94	[POWERPC] vmemmap fixes to use smaller pages This changes vmemmap to use a different region (region 0xf) of the address space, and to configure the page size of that region dynamically at boot. The problem with the current approach of always using 16M pages is that it's not well suited to machines that have small amounts of memory such as small partitions on pseries, or PS3's. In fact, on the PS3, failure to allocate the 16M page backing vmmemmap tends to prevent hotplugging the HV's "additional" memory, thus limiting the available memory even more, from my experience down to something like 80M total, which makes it really not very useable. The logic used by my match to choose the vmemmap page size is: - If 16M pages are available and there's 1G or more RAM at boot, use that size. - Else if 64K pages are available, use that - Else use 4K pages I've tested on a POWER6 (16M pages) and on an iSeries POWER3 (4K pages) and it seems to work fine. Note that I intend to change the way we organize the kernel regions & SLBs so the actual region will change from 0xf back to something else at one point, as I simplify the SLB miss handler, but that will be for a later patch. Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-05-15 20:49:25 +10:00
Paul Mackerras	3b5750644b	[POWERPC] Bolt in SLB entry for kernel stack on secondary cpus This fixes a regression reported by Kamalesh Bulabel where a POWER4 machine would crash because of an SLB miss at a point where the SLB miss exception was unrecoverable. This regression is tracked at: http://bugzilla.kernel.org/show_bug.cgi?id=10082 SLB misses at such points shouldn't happen because the kernel stack is the only memory accessed other than things in the first segment of the linear mapping (which is mapped at all times by entry 0 of the SLB). The context switch code ensures that SLB entry 2 covers the kernel stack, if it is not already covered by entry 0. None of entries 0 to 2 are ever replaced by the SLB miss handler. Where this went wrong is that the context switch code assumes it doesn't have to write to SLB entry 2 if the new kernel stack is in the same segment as the old kernel stack, since entry 2 should already be correct. However, when we start up a secondary cpu, it calls slb_initialize, which doesn't set up entry 2. This is correct for the boot cpu, where we will be using a stack in the kernel BSS at this point (i.e. init_thread_union), but not necessarily for secondary cpus, whose initial stack can be allocated anywhere. This doesn't cause any immediate problem since the SLB miss handler will just create an SLB entry somewhere else to cover the initial stack. In fact it's possible for the cpu to go quite a long time without SLB entry 2 being valid. Eventually, though, the entry created by the SLB miss handler will get overwritten by some other entry, and if the next access to the stack is at an unrecoverable point, we get the crash. This fixes the problem by making slb_initialize create a suitable entry for the kernel stack, if we are on a secondary cpu and the stack isn't covered by SLB entry 0. This requires initializing the get_paca()->kstack field earlier, so I do that in smp_create_idle where the current field is initialized. This also abstracts a bit of the computation that mk_esid_data in slb.c does so that it can be used in slb_initialize. Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-05-02 15:00:45 +10:00
Geoff Levand	bbea346062	[POWERPC] Fix slb.c compile warnings Arrange for a syntax check to always be done on the powerpc/mm/slb.c DBG() macro by defining it to pr_debug() for non-debug builds. Also, fix these related compile warnings: slb.c:273: warning: format '%04x' expects type 'unsigned int', but argument 2 has type 'long unsigned int slb.c:274: warning: format '%04x' expects type 'unsigned int', but argument 2 has type 'long unsigned int' Signed-off-by: Geoff Levand <geoffrey.levand@am.sony.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-05-02 15:00:44 +10:00
Anton Blanchard	44387e9ff2	[POWERPC] Fix PMU + soft interrupt disable bug Since the PMU is an NMI now, it can come at any time we are only soft disabled. We must hard disable around the two places we allow the kernel stack SLB and r1 to go out of sync. Otherwise the PMU exception can force a kernel stack SLB into another slot, which can lead to it getting evicted, which can lead to a nasty unrecoverable SLB miss in the exception entry code. Signed-off-by: Anton Blanchard <anton@samba.org> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Olof Johansson <olof@lixom.net> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-03-20 10:14:55 +11:00
Paul Mackerras	9156ad4833	Merge branch 'linux-2.6'	2008-01-24 10:07:21 +11:00
Paul Mackerras	dfbe0d3b6b	[POWERPC] Fix boot failure on POWER6 Commit `473980a993` added a call to clear the SLB shadow buffer before registering it. Unfortunately this means that we clear out the entries that slb_initialize has previously set in there. On POWER6, the hypervisor uses the SLB shadow buffer when doing partition switches, and that means that after the next partition switch, each non-boot CPU has no SLB entries to map the kernel text and data, which causes it to crash. This fixes it by reverting most of `473980a9` and instead clearing the 3rd entry explicitly in slb_initialize. This fixes the problem that `473980a9` was trying to solve, but without breaking POWER6. Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-01-15 17:30:58 +11:00
Michael Neuling	473980a993	[POWERPC] Fix CPU hotplug when using the SLB shadow buffer Before we register the SLB shadow buffer, we need to invalidate the entries in the buffer, otherwise we can end up stale entries from when we previously offlined the CPU. This does this invalidate as well as unregistering the buffer with PHYP before we offline the cpu. Tested and fixes crashes seen on 970MP (thanks to tonyb) and POWER5. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-01-11 16:33:55 +11:00
Michael Neuling	584f8b71a2	[POWERPC] Use SLB size from the device tree Currently we hardwire the number of SLBs to 64, but PAPR says we should use the ibm,slb-size property to obtain the number of SLB entries. This uses this property instead of assuming 64. If no property is found, we assume 64 entries as before. This soft patches the SLB handler, so it shouldn't change performance at all. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Paul Mackerras <paulus@samba.org>	2007-12-11 13:45:56 +11:00
will schmidt	465ccab9eb	[POWERPC] Fix switch_slb handling of 1T ESID values Now that we have 1TB segment size support, we need to be using the GET_ESID_1T macro when comparing ESID values for pc, stack, and unmapped_base within switch_slb(). A new helper function called esids_match() contains the logic for deciding when to call GET_ESID and GET_ESID_1T. This fixes a duplicate-slb-entry inspired machine-check exception I was seeing when trying to run java on a power6 partition. Tested on power6 and power5. Signed-off-by: Will Schmidt <will_schmidt@vnet.ibm.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>	2007-11-08 14:15:31 +11:00
will schmidt	aa39be09df	[POWERPC] Include udbg.h when using udbg_printf This fixes the error error: implicit declaration of function "udbg_printf" We have a few spots where we reference udbg_printf() without #including udbg.h. These are within #ifdef DEBUG blocks, so unnoticed until we do a #define DEBUG or #define DEBUG_LOW nearby. Signed-off-by: Will Schmidt <will_schmidt@vnet.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>	2007-11-08 14:15:31 +11:00
Olof Johansson	f66bce5e6a	[POWERPC] Add 1TB workaround for PA6T PA6T has a bug where the slbie instruction does not honor the large segment bit. As a result, we have to always use slbia when switching context. We don't have to worry about changing the slbie's during fault processing, since they should never be replacing one VSID with another using the same ESID. I.e. there's no risk for inserting duplicate entries due to a failed slbie of the old entry. So as long as we clear it out on context switch we should be fine. Signed-off-by: Olof Johansson <olof@lixom.net> Signed-off-by: Paul Mackerras <paulus@samba.org>	2007-10-17 22:30:09 +10:00
Paul Mackerras	1189be6508	[POWERPC] Use 1TB segments This makes the kernel use 1TB segments for all kernel mappings and for user addresses of 1TB and above, on machines which support them (currently POWER5+, POWER6 and PA6T). We detect that the machine supports 1TB segments by looking at the ibm,processor-segment-sizes property in the device tree. We don't currently use 1TB segments for user addresses < 1T, since that would effectively prevent 32-bit processes from using huge pages unless we also had a way to revert to using 256MB segments. That would be possible but would involve extra complications (such as keeping track of which segment size was used when HPTEs were inserted) and is not addressed here. Parts of this patch were originally written by Ben Herrenschmidt. Signed-off-by: Paul Mackerras <paulus@samba.org>	2007-10-12 14:05:17 +10:00
Michael Neuling	00efee7d5d	[POWERPC] Remove barriers from the SLB shadow buffer update After talking to an IBM POWER hypervisor (PHYP) design and development guy, there seems to be no need for memory barriers when updating the SLB shadow buffer provided we only update it from the current CPU, which we do. Also, these guys see no need in the future for these barriers. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Paul Mackerras <paulus@samba.org>	2007-09-19 14:40:54 +10:00
Paul Mackerras	175587cca7	[POWERPC] Fix SLB initialization at boot time This partially reverts `edd0622bd2`. It turns out that the part of that commit that aimed to ensure that we created an SLB entry for the kernel stack on secondary CPUs when starting the CPU didn't achieve its aim, and in fact caused a regression, because get_paca()->kstack is not initialized at the point where slb_initialize is called. This therefore just reverts that part of that commit, while keeping the change to slb_flush_and_rebolt, which is correct and necessary. Signed-off-by: Paul Mackerras <paulus@samba.org>	2007-08-25 16:58:43 +10:00
Paul Mackerras	edd0622bd2	[POWERPC] Fix potential duplicate entry in SLB shadow buffer We were getting a duplicate entry in the SLB shadow buffer in slb_flush_and_rebolt() if the kernel stack was in the same segment as PAGE_OFFSET, which on POWER6 causes the hypervisor to terminate the partition with an error. This fixes it. Also we were not creating an SLB entry (or an SLB shadow buffer entry) for the kernel stack on secondary CPUs when starting the CPU. This isn't a major problem, since an appropriate entry will be created on demand, but this fixes that also for consistency. Signed-off-by: Paul Mackerras <paulus@samba.org>	2007-08-10 21:04:07 +10:00
Michael Neuling	67439b76f2	[POWERPC] Fixes for the SLB shadow buffer code On a machine with hardware 64kB pages and a kernel configured for a 64kB base page size, we need to change the vmalloc segment from 64kB pages to 4kB pages if some driver creates a non-cacheable mapping in the vmalloc area. However, we never updated with SLB shadow buffer. This fixes it. Thanks to paulus for finding this. Also added some write barriers to ensure the shadow buffer contents are always consistent. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Paul Mackerras <paulus@samba.org>	2007-08-03 19:36:01 +10:00
Benjamin Herrenschmidt	d0f13e3c20	[POWERPC] Introduce address space "slices" The basic issue is to be able to do what hugetlbfs does but with different page sizes for some other special filesystems; more specifically, my need is: - Huge pages - SPE local store mappings using 64K pages on a 4K base page size kernel on Cell - Some special 4K segments in 64K-page kernels for mapping a dodgy type of powerpc-specific infiniband hardware that requires 4K MMU mappings for various reasons I won't explain here. The main issues are: - To maintain/keep track of the page size per "segment" (as we can only have one page size per segment on powerpc, which are 256MB divisions of the address space). - To make sure special mappings stay within their allotted "segments" (including MAP_FIXED crap) - To make sure everybody else doesn't mmap/brk/grow_stack into a "segment" that is used for a special mapping Some of the necessary mechanisms to handle that were present in the hugetlbfs code, but mostly in ways not suitable for anything else. The patch relies on some changes to the generic get_unmapped_area() that just got merged. It still hijacks hugetlb callbacks here or there as the generic code hasn't been entirely cleaned up yet but that shouldn't be a problem. So what is a slice ? Well, I re-used the mechanism used formerly by our hugetlbfs implementation which divides the address space in "meta-segments" which I called "slices". The division is done using 256MB slices below 4G, and 1T slices above. Thus the address space is divided currently into 16 "low" slices and 16 "high" slices. (Special case: high slice 0 is the area between 4G and 1T). Doing so simplifies significantly the tracking of segments and avoids having to keep track of all the 256MB segments in the address space. While I used the "concepts" of hugetlbfs, I mostly re-implemented everything in a more generic way and "ported" hugetlbfs to it. Slices can have an associated page size, which is encoded in the mmu context and used by the SLB miss handler to set the segment sizes. The hash code currently doesn't care, it has a specific check for hugepages, though I might add a mechanism to provide per-slice hash mapping functions in the future. The slice code provide a pair of "generic" get_unmapped_area() (bottomup and topdown) functions that should work with any slice size. There is some trickiness here so I would appreciate people to have a look at the implementation of these and let me know if I got something wrong. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>	2007-05-09 16:35:00 +10:00
Stephen Rothwell	56291e19e3	[POWERPC] iSeries: fix slb.c for combined build Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Paul Mackerras <paulus@samba.org>	2006-12-04 20:39:19 +11:00
Michael Neuling	2f6093c847	[POWERPC] Implement SLB shadow buffer This adds a shadow buffer for the SLBs and regsiters it with PHYP. Only the bolted SLB entries (top 3) are shadowed. The SLB shadow buffer tells the hypervisor what the kernel needs to have in the SLB for the kernel to be able to function. The hypervisor can use this information to speed up partition context switches. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Paul Mackerras <paulus@samba.org>	2006-08-08 17:08:56 +10:00
Jörn Engel	6ab3d5624e	Remove obsolete #include <linux/config.h> Signed-off-by: Jörn Engel <joern@wohnheim.fh-wedel.de> Signed-off-by: Adrian Bunk <bunk@stusta.de>	2006-06-30 19:25:36 +02:00
Paul Mackerras	bf72aeba2f	powerpc: Use 64k pages without needing cache-inhibited large pages Some POWER5+ machines can do 64k hardware pages for normal memory but not for cache-inhibited pages. This patch lets us use 64k hardware pages for most user processes on such machines (assuming the kernel has been configured with CONFIG_PPC_64K_PAGES=y). User processes start out using 64k pages and get switched to 4k pages if they use any non-cacheable mappings. With this, we use 64k pages for the vmalloc region and 4k pages for the imalloc region. If anything creates a non-cacheable mapping in the vmalloc region, the vmalloc region will get switched to 4k pages. I don't know of any driver other than the DRM that would do this, though, and these machines don't have AGP. When a region gets switched from 64k pages to 4k pages, we do not have to clear out all the 64k HPTEs from the hash table immediately. We use the _PAGE_COMBO bit in the Linux PTE to indicate whether the page was hashed in as a 64k page or a set of 4k pages. If hash_page is trying to insert a 4k page for a Linux PTE and it sees that it has already been inserted as a 64k page, it first invalidates the 64k HPTE before inserting the 4k HPTE. The hash invalidation routines also use the _PAGE_COMBO bit, to determine whether to look for a 64k HPTE or a set of 4k HPTEs to remove. With those two changes, we can tolerate a mix of 4k and 64k HPTEs in the hash table, and they will all get removed when the address space is torn down. Signed-off-by: Paul Mackerras <paulus@samba.org>	2006-06-15 10:45:18 +10:00
Paul Mackerras	4306443128	powerpc: Remove unused paca->pgdir field The pgdir field in the paca was a leftover from the dynamic VSIDs patch, and is not used in the current kernel code. This removes it. Signed-off-by: Paul Mackerras <paulus@samba.org>	2006-06-12 18:38:21 +10:00
David Gibson	14c89e7fc8	[PATCH] powerpc: Replace VMALLOCBASE with VMALLOC_START On ppc64, we independently define VMALLOCBASE and VMALLOC_START to be the same thing: the start of the vmalloc() area at 0xd000000000000000. VMALLOC_START is used much more widely, including in generic code, so this patch gets rid of the extraneous VMALLOCBASE. This does require moving the definitions of region IDs from page_64.h to pgtable.h, but they don't clearly belong in the former rather than the latter, anyway. While we're moving them, clean up the definitions of the REGION_IDs: - Abolish REGION_SIZE, it was only used once, to define REGION_MASK anyway - Define the specific region ids in terms of the REGION_ID() macro. - Define KERNEL_REGION_ID in terms of PAGE_OFFSET rather than KERNELBASE. It amounts to the same thing, but conceptually this is about the region of the linear mapping (which starts at PAGE_OFFSET) rather than of the kernel text itself (which is at KERNELBASE). Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@samba.org>	2006-01-09 15:05:47 +11:00
Michael Ellerman	b5666f7039	[PATCH] powerpc: Separate usage of KERNELBASE and PAGE_OFFSET This patch separates usage of KERNELBASE and PAGE_OFFSET. I haven't looked at any of the PPC32 code, if we ever want to support Kdump on PPC we'll have to do another audit, ditto for iSeries. This patch makes PAGE_OFFSET the constant, it'll always be 0xC * 1 gazillion for 64-bit. To get a physical address from a virtual one you subtract PAGE_OFFSET, _not_ KERNELBASE. KERNELBASE is the virtual address of the start of the kernel, it's often the same as PAGE_OFFSET, but _might not be_. If you want to know something's offset from the start of the kernel you should subtract KERNELBASE. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Paul Mackerras <paulus@samba.org>	2006-01-09 14:51:54 +11:00
Michael Ellerman	51fae6de24	[PATCH] powerpc: Add a is_kernel_addr() macro There's a bunch of code that compares an address with KERNELBASE to see if it's a "kernel address", ie. >= KERNELBASE. The proper test is actually to compare with PAGE_OFFSET, since we're going to change KERNELBASE soon. So replace all of them with an is_kernel_addr() macro that does that. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Paul Mackerras <paulus@samba.org>	2006-01-09 14:51:50 +11:00
Benjamin Herrenschmidt	3c726f8dee	[PATCH] ppc64: support 64k pages Adds a new CONFIG_PPC_64K_PAGES which, when enabled, changes the kernel base page size to 64K. The resulting kernel still boots on any hardware. On current machines with 4K pages support only, the kernel will maintain 16 "subpages" for each 64K page transparently. Note that while real 64K capable HW has been tested, the current patch will not enable it yet as such hardware is not released yet, and I'm still verifying with the firmware architects the proper to get the information from the newer hypervisors. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-11-06 16:56:47 -08:00
Paul Mackerras	ab1f9dac6e	powerpc: Merge arch/ppc64/mm to arch/powerpc/mm This moves the remaining files in arch/ppc64/mm to arch/powerpc/mm, and arranges that we use them when compiling with ARCH=ppc64. Signed-off-by: Paul Mackerras <paulus@samba.org>	2005-10-10 21:58:35 +10:00

27 Commits