linux

Commit Graph

Author	SHA1	Message	Date
KAMEZAWA Hiroyuki	3089aa1b0c	kcore: use registerd physmem information For /proc/kcore, each arch registers its memory range by kclist_add(). In usual, - range of physical memory - range of vmalloc area - text, etc... are registered but "range of physical memory" has some troubles. It doesn't updated at memory hotplug and it tend to include unnecessary memory holes. Now, /proc/iomem (kernel/resource.c) includes required physical memory range information and it's properly updated at memory hotplug. Then, it's good to avoid using its own code(duplicating information) and to rebuild kclist for physical memory based on /proc/iomem. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Jiri Slaby <jirislaby@gmail.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: WANG Cong <xiyou.wangcong@gmail.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-09-23 07:39:41 -07:00
KAMEZAWA Hiroyuki	908eedc616	walk system ram range Originally, walk_memory_resource() was introduced to traverse all memory of "System RAM" for detecting memory hotplug/unplug range. For doing so, flags of IORESOUCE_MEM\|IORESOURCE_BUSY was used and this was enough for memory hotplug. But for using other purpose, /proc/kcore, this may includes some firmware area marked as IORESOURCE_BUSY \| IORESOUCE_MEM. This patch makes the check strict to find out busy "System RAM". Note: PPC64 keeps their own walk_memory_resouce(), which walk through ppc64's lmb informaton. Because old kclist_add() is called per lmb, this patch makes no difference in behavior, finally. And this patch removes CONFIG_MEMORY_HOTPLUG check from this function. Because pfn_valid() just show "there is memmap or not* and cannot be used for "there is physical memory or not", this function is useful in generic to scan physical memory range. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: WANG Cong <xiyou.wangcong@gmail.com> Cc: Américo Wang <xiyou.wangcong@gmail.com> Cc: David Rientjes <rientjes@google.com> Cc: Roland Dreier <rolandd@cisco.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-09-23 07:39:41 -07:00
KAMEZAWA Hiroyuki	a0614da88b	kcore: register vmalloc area in generic way For /proc/kcore, vmalloc areas are registered per arch. But, all of them registers same range of [VMALLOC_START...VMALLOC_END) This patch unifies them. By this. archs which have no kclist_add() hooks can see vmalloc area correctly. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: WANG Cong <xiyou.wangcong@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-09-23 07:39:41 -07:00
KAMEZAWA Hiroyuki	c30bb2a25f	kcore: add kclist types Presently, kclist_add() only eats start address and size as its arguments. Considering to make kclist dynamically reconfigulable, it's necessary to know which kclists are for System RAM and which are not. This patch add kclist types as KCORE_RAM KCORE_VMALLOC KCORE_TEXT KCORE_OTHER This "type" is used in a patch following this for detecting KCORE_RAM. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: WANG Cong <xiyou.wangcong@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-09-23 07:39:41 -07:00
Geert Uytterhoeven	cc013a8890	arches: drop superfluous casts in nr_free_pages() callers Commit `9617729941` ("Drop free_pages()") modified nr_free_pages() to return 'unsigned long' instead of 'unsigned int'. This made the casts to 'unsigned long' in most callers superfluous, so remove them. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com> Reviewed-by: Christoph Lameter <cl@linux-foundation.org> Acked-by: Ingo Molnar <mingo@elte.hu> Acked-by: Russell King <rmk+kernel@arm.linux.org.uk> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Kyle McMartin <kyle@mcmartin.ca> Acked-by: WANG Cong <xiyou.wangcong@gmail.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Haavard Skinnemoen <hskinnemoen@atmel.com> Cc: Mikael Starvik <starvik@axis.com> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Hirokazu Takata <takata@linux-m32r.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: David Howells <dhowells@redhat.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Chris Zankel <zankel@tensilica.com> Cc: Michal Simek <monstr@monstr.eu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-09-22 07:17:34 -07:00
Ingo Molnar	cdd6c482c9	perf: Do the big rename: Performance Counters -> Performance Events Bye-bye Performance Counters, welcome Performance Events! In the past few months the perfcounters subsystem has grown out its initial role of counting hardware events, and has become (and is becoming) a much broader generic event enumeration, reporting, logging, monitoring, analysis facility. Naming its core object 'perf_counter' and naming the subsystem 'perfcounters' has become more and more of a misnomer. With pending code like hw-breakpoints support the 'counter' name is less and less appropriate. All in one, we've decided to rename the subsystem to 'performance events' and to propagate this rename through all fields, variables and API names. (in an ABI compatible fashion) The word 'event' is also a bit shorter than 'counter' - which makes it slightly more convenient to write/handle as well. Thanks goes to Stephane Eranian who first observed this misnomer and suggested a rename. User-space tooling and ABI compatibility is not affected - this patch should be function-invariant. (Also, defconfigs were not touched to keep the size down.) This patch has been generated via the following script: FILES=$(find * -type f \| grep -vE 'oprofile\|[^K]config') sed -i \ -e 's/PERF_EVENT_/PERF_RECORD_/g' \ -e 's/PERF_COUNTER/PERF_EVENT/g' \ -e 's/perf_counter/perf_event/g' \ -e 's/nb_counters/nb_events/g' \ -e 's/swcounter/swevent/g' \ -e 's/tpcounter_event/tp_event/g' \ $FILES for N in $(find . -name perf_counter.[ch]); do M=$(echo $N \| sed 's/perf_counter/perf_event/g') mv $N $M done FILES=$(find . -name perf_event.*) sed -i \ -e 's/COUNTER_MASK/REG_MASK/g' \ -e 's/COUNTER/EVENT/g' \ -e 's/\<event\>/event_id/g' \ -e 's/counter/event/g' \ -e 's/Counter/Event/g' \ $FILES ... to keep it as correct as possible. This script can also be used by anyone who has pending perfcounters patches - it converts a Linux kernel tree over to the new naming. We tried to time this change to the point in time where the amount of pending patches is the smallest: the end of the merge window. Namespace clashes were fixed up in a preparatory patch - and some stylistic fallout will be fixed up in a subsequent patch. ( NOTE: 'counters' are still the proper terminology when we deal with hardware registers - and these sed scripts are a bit over-eager in renaming them. I've undone some of that, but in case there's something left where 'counter' would be better than 'event' we can undo that on an individual basis instead of touching an otherwise nicely automated patch. ) Suggested-by: Stephane Eranian <eranian@google.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Paul Mackerras <paulus@samba.org> Reviewed-by: Arjan van de Ven <arjan@linux.intel.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: David Howells <dhowells@redhat.com> Cc: Kyle McMartin <kyle@mcmartin.ca> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: <linux-arch@vger.kernel.org> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-09-21 14:28:04 +02:00
Linus Torvalds	723e9db7a4	Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (134 commits) powerpc/nvram: Enable use Generic NVRAM driver for different size chips powerpc/iseries: Fix oops reading from /proc/iSeries/mf/*/cmdline powerpc/ps3: Workaround for flash memory I/O error powerpc/booke: Don't set DABR on 64-bit BookE, use DAC1 instead powerpc/perf_counters: Reduce stack usage of power_check_constraints powerpc: Fix bug where perf_counters breaks oprofile powerpc/85xx: Fix SMP compile error and allow NULL for smp_ops powerpc/irq: Improve nanodoc powerpc: Fix some late PowerMac G5 with PCIe ATI graphics powerpc/fsl-booke: Use HW PTE format if CONFIG_PTE_64BIT powerpc/book3e: Add missing page sizes powerpc/pseries: Fix to handle slb resize across migration powerpc/powermac: Thermal control turns system off too eagerly powerpc/pci: Merge ppc32 and ppc64 versions of phb_scan() powerpc/405ex: support cuImage via included dtb powerpc/405ex: provide necessary fixup function to support cuImage powerpc/40x: Add support for the ESTeem 195E (PPC405EP) SBC powerpc/44x: Add Eiger AMCC (AppliedMicro) PPC460SX evaluation board support. powerpc/44x: Update Arches defconfig powerpc/44x: Update Arches dts ... Fix up conflicts in drivers/char/agp/uninorth-agp.c	2009-09-15 09:51:09 -07:00
Linus Torvalds	ada3fa1505	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (46 commits) powerpc64: convert to dynamic percpu allocator sparc64: use embedding percpu first chunk allocator percpu: kill lpage first chunk allocator x86,percpu: use embedding for 64bit NUMA and page for 32bit NUMA percpu: update embedding first chunk allocator to handle sparse units percpu: use group information to allocate vmap areas sparsely vmalloc: implement pcpu_get_vm_areas() vmalloc: separate out insert_vmalloc_vm() percpu: add chunk->base_addr percpu: add pcpu_unit_offsets[] percpu: introduce pcpu_alloc_info and pcpu_group_info percpu: move pcpu_lpage_build_unit_map() and pcpul_lpage_dump_cfg() upward percpu: add @align to pcpu_fc_alloc_fn_t percpu: make @dyn_size mandatory for pcpu_setup_first_chunk() percpu: drop @static_size from first chunk allocators percpu: generalize first chunk allocator selection percpu: build first chunk allocators selectively percpu: rename 4k first chunk allocator to page percpu: improve boot messages percpu: fix pcpu_reclaim() locking ... Fix trivial conflict as by Tejun Heo in kernel/sched.c	2009-09-15 09:39:44 -07:00
Brian King	46db2f86a3	powerpc/pseries: Fix to handle slb resize across migration The SLB can change sizes across a live migration, which was not being handled, resulting in possible machine crashes during migration if migrating to a machine which has a smaller max SLB size than the source machine. Fix this by first reducing the SLB size to the minimum possible value, which is 32, prior to migration. Then during the device tree update which occurs after migration, we make the call to ensure the SLB gets updated. Also add the slb_size to the lparcfg output so that the migration tools can check to make sure the kernel has this capability before allowing migration in scenarios where the SLB size will change. BenH: Fixed #include <asm/mmu-hash64.h> -> <asm/mmu.h> to avoid breaking ppc32 build Signed-off-by: Brian King <brking@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-09-02 16:19:01 +10:00
Kumar Gala	df5d6ecf81	powerpc/mm: Add MMU features for TLB reservation & Paired MAS registers Support for TLB reservation (or TLB Write Conditional) and Paired MAS registers are optional for a processor implementation so we handle them via MMU feature sections. We currently only used paired MAS registers to access the full RPN + perm bits that are kept in MAS7\|\|MAS3. We assume that if an implementation has hardware page table at this time it also implements in TLB reservations. Signed-off-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-08-28 14:24:12 +10:00
Benjamin Herrenschmidt	3c2ee2d9f4	Merge commit 'kumar/next' into next	2009-08-27 13:13:41 +10:00
Benjamin Herrenschmidt	ea3cc330ac	powerpc/mm: Cleanup handling of execute permission This is an attempt at cleaning up a bit the way we handle execute permission on powerpc. _PAGE_HWEXEC is gone, _PAGE_EXEC is now only defined by CPUs that can do something with it, and the myriad of #ifdef's in the I$/D$ coherency code is reduced to 2 cases that hopefully should cover everything. The logic on BookE is a little bit different than what it was though not by much. Since now, _PAGE_EXEC will be set by the generic code for executable pages, we need to filter out if they are unclean and recover it. However, I don't expect the code to be more bloated than it already was in that area due to that change. I could boast that this brings proper enforcing of per-page execute permissions to all BookE and 40x but in fact, we've had that now for some time as a side effect of my previous rework in that area (and I didn't even know it :-) We would only enable execute permission if the page was cache clean and we would only cache clean it if we took and exec fault. Since we now enforce that the later only work if VM_EXEC is part of the VMA flags, we de-fact already enforce per-page execute permissions... Unless I missed something Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-08-27 13:12:51 +10:00
Kumar Gala	fc4bdb35fb	powerpc/booke: Move MMUCSR definition into mmu-book3e.h The MMUCSR is now defined as part of the Book-3E architecture so we can move it into mmu-book3e.h and add some of the additional bits defined by the architecture specs. Signed-off-by: Kumar Gala <galak@kernel.crashing.org>	2009-08-24 20:48:05 -05:00
Benjamin Herrenschmidt	4f0dbc2781	Merge commit 'paulus-perf/master' into next	2009-08-20 11:07:56 +10:00
Kumar Gala	797a747a82	powerpc/mm: Fix assert_pte_locked to work properly on uniprocessor Since the pte_lockptr is a spinlock it gets optimized away on uniprocessor builds so using spin_is_locked is not correct. We can use assert_spin_locked instead and get the proper behavior between UP and SMP builds. Signed-off-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-08-20 10:28:32 +10:00
Roel Kluin	8dcd038a13	powerpc/fsl-booke: read buffer overflow cam[tlbcam_index] is checked before tlbcam_index < ARRAY_SIZE(cam) Signed-off-by: Roel Kluin <roel.kluin@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-08-20 10:27:12 +10:00
Kumar Gala	67050b5c3e	powerpc/mm: Fix switch_mmu_context to iterate of the proper list of cpus Introduced a temporary variable into our iterating over the list cpus that are threads on the same core. For some reason Ben forgot how for loops work. Signed-off-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-08-20 10:25:12 +10:00
Benjamin Herrenschmidt	2d27cfd328	powerpc: Remaining 64-bit Book3E support This contains all the bits that didn't fit in previous patches :-) This includes the actual exception handlers assembly, the changes to the kernel entry, other misc bits and wiring it all up in Kconfig. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-08-20 10:25:11 +10:00
Benjamin Herrenschmidt	32a74949b7	powerpc/mm: Add support for SPARSEMEM_VMEMMAP on 64-bit Book3E The base TLB support didn't include support for SPARSEMEM_VMEMMAP, though we did carve out some virtual space for it, the necessary support code wasn't there. This implements it by using 16M pages for now, though the page size could easily be changed at runtime if necessary. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-08-20 10:25:10 +10:00
Benjamin Herrenschmidt	25d21ad6e7	powerpc: Add TLB management code for 64-bit Book3E This adds the TLB miss handler assembly, the low level TLB flush routines along with the necessary hook for dealing with our virtual page tables or indirect TLB entries that need to be flushes when PTE pages are freed. There is currently no support for hugetlbfs Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-08-20 10:25:09 +10:00
Benjamin Herrenschmidt	a8f7758c1c	powerpc/mm: Move around mmu_gathers definition on 64-bit The definition for the global structure mmu_gathers, used by generic code, is currently defined in multiple places not including anything used by 64-bit Book3E. This changes it by moving to one place common to all processors. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-08-20 10:25:09 +10:00
Benjamin Herrenschmidt	57e2a99f74	powerpc: Add memory management headers for new 64-bit BookE This adds the PTE and pgtable format definitions, along with changes to the kernel memory map and other definitions related to implementing support for 64-bit Book3E. This also shields some asm-offset bits that are currently only relevant on 32-bit We also move the definition of the "linux" page size constants to the common mmu.h file and add a few sizes that are relevant to embedded processors. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-08-20 10:25:06 +10:00
Benjamin Herrenschmidt	c7cc58a1ad	powerpc/mm: Rework & cleanup page table freeing code path That patch used to just add a hook to page table flushing but pulling that string brought out a whole bunch of issues, so it now does that and more: - We now make the RCU batching of page freeing SMP only, as I believe it was intended initially. We make a few more things compile to nothing on !CONFIG_SMP - Some macros are turned into functions, though that forced me to out of line a few stuffs due to unsolvable include depenencies, however it's probably better that way anyway, it's not -that- critical code path. - 32-bit didn't call pte_free_finish() on tlb_flush() which means that it wouldn't push out the batch to RCU for delayed freeing when a bunch of page tables have been freed, they would just stay in there until the batch gets full. 64-bit BookE will use that hook to maintain the virtually linear page tables or the indirect entries in the TLB when using the HW loader. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-08-20 10:24:56 +10:00
Benjamin Herrenschmidt	d4e167da4c	powerpc/mm: Make low level TLB flush ops on BookE take additional args We need to pass down whether the page is direct or indirect and we'll need to pass the page size to _tlbil_va and _tlbivax_bcast We also add a new low level _tlbil_pid_noind() which does a TLB flush by PID but avoids flushing indirect entries if possible This implements those new prototypes but defines them with inlines or macros so that no additional arguments are actually passed on current processors. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-08-20 10:12:41 +10:00
Benjamin Herrenschmidt	a245067e20	powerpc/mm: Add support for early ioremap on non-hash 64-bit processors This adds some code to do early ioremap's using page tables instead of bolting entries in the hash table. This will be used by the upcoming 64-bits BookE port. The patch also changes the test for early vs. late ioremap to use slab_is_available() instead of our old hackish mem_init_done. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-08-20 10:12:40 +10:00
Benjamin Herrenschmidt	fcce810986	powerpc/mm: Add HW threads support to no_hash TLB management The current "no hash" MMU context management code is written with the assumption that one CPU == one TLB. This is not the case on implementations that support HW multithreading, where several linux CPUs can share the same TLB. This adds some basic support for this to our context management and our TLB flushing code. It also cleans up the optional debugging output a bit Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-08-20 10:12:37 +10:00
Benjamin Herrenschmidt	ee43eb788b	powerpc: Use names rather than numbers for SPRGs (v2) The kernel uses SPRG registers for various purposes, typically in low level assembly code as scratch registers or to hold per-cpu global infos such as the PACA or the current thread_info pointer. We want to be able to easily shuffle the usage of those registers as some implementations have specific constraints realted to some of them, for example, some have userspace readable aliases, etc.. and the current choice isn't always the best. This patch should not change any code generation, and replaces the usage of SPRN_SPRGn everywhere in the kernel with a named replacement and adds documentation next to the definition of the names as to what those are used for on each processor family. The only parts that still use the original numbers are bits of KVM or suspend/resume code that just blindly needs to save/restore all the SPRGs. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-08-20 10:12:27 +10:00
Anton Blanchard	de4376c284	powerpc: Preload application text segment instead of TASK_UNMAPPED_BASE TASK_UNMAPPED_BASE is not used with the new top down mmap layout. We can reuse this preload slot by loading in the segment at 0x10000000, where almost all PowerPC binaries are linked at. On a microbenchmark that bounces a token between two 64bit processes over pipes and calls gettimeofday each iteration (to access the VDSO), both the 32bit and 64bit context switch rate improves (tested on a 4GHz POWER6): 32bit: 273k/sec -> 283k/sec 64bit: 277k/sec -> 284k/sec Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-08-20 10:12:26 +10:00
Anton Blanchard	5eb9bac040	powerpc: Rearrange SLB preload code With the new top down layout it is likely that the pc and stack will be in the same segment, because the pc is most likely in a library allocated via a top down mmap. Right now we bail out early if these segments match. Rearrange the SLB preload code to sanity check all SLB preload addresses are not in the kernel, then check all addresses for conflicts. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-08-20 10:12:25 +10:00
Paul Mackerras	9c1e105238	powerpc: Allow perf_counters to access user memory at interrupt time This provides a mechanism to allow the perf_counters code to access user memory in a PMU interrupt routine. Such an access can cause various kinds of interrupt: SLB miss, MMU hash table miss, segment table miss, or TLB miss, depending on the processor. This commit only deals with 64-bit classic/server processors, which use an MMU hash table. 32-bit processors are already able to access user memory at interrupt time. Since we don't soft-disable on 32-bit, we avoid the possibility of reentering hash_page or the TLB miss handlers, since they run with interrupts disabled. On 64-bit processors, an SLB miss interrupt on a user address will update the slb_cache and slb_cache_ptr fields in the paca. This is OK except in the case where a PMU interrupt occurs in switch_slb, which also accesses those fields. To prevent this, we hard-disable interrupts in switch_slb. Interrupts are already soft-disabled at this point, and will get hard-enabled when they get soft-enabled later. This also reworks slb_flush_and_rebolt: to avoid hard-disabling twice, and to make sure that it clears the slb_cache_ptr when called from other callers than switch_slb, the existing routine is renamed to __slb_flush_and_rebolt, which is called by switch_slb and the new version of slb_flush_and_rebolt. Similarly, switch_stab (used on POWER3 and RS64 processors) gets a hard_irq_disable() to protect the per-cpu variables used there and in ste_allocate. If a MMU hashtable miss interrupt occurs, normally we would call hash_page to look up the Linux PTE for the address and create a HPTE. However, hash_page is fairly complex and takes some locks, so to avoid the possibility of deadlock, we check the preemption count to see if we are in a (pseudo-)NMI handler, and if so, we don't call hash_page but instead treat it like a bad access that will get reported up through the exception table mechanism. An interrupt whose handler runs even though the interrupt occurred when soft-disabled (such as the PMU interrupt) is considered a pseudo-NMI handler, which should use nmi_enter()/nmi_exit() rather than irq_enter()/irq_exit(). Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>	2009-08-18 14:48:43 +10:00
Tejun Heo	384be2b18a	Merge branch 'percpu-for-linus' into percpu-for-next Conflicts: arch/sparc/kernel/smp_64.c arch/x86/kernel/cpu/perf_counter.c arch/x86/kernel/setup_percpu.c drivers/cpufreq/cpufreq_ondemand.c mm/percpu.c Conflicts in core and arch percpu codes are mostly from commit ed78e1e078dd44249f88b1dd8c76dafb39567161 which substituted many num_possible_cpus() with nr_cpu_ids. As for-next branch has moved all the first chunk allocators into mm/percpu.c, the changes are moved from arch code to mm/percpu.c. Signed-off-by: Tejun Heo <tj@kernel.org>	2009-08-14 14:45:31 +09:00
Kumar Gala	5156ddce6c	powerpc/mm: Fix SMP issue with MMU context handling code In switch_mmu_context() if we call steal_context_smp() to get a context to use we shouldn't fall through and than call steal_context_up(). Doing so can be problematic in that the 'mm' that steal_context_up() ends up using will not get marked dirty in the stale_map[] for other CPUs that might have used that mm. Thus we could end up with stale TLB entries in the other CPUs that can cause all kinda of havoc. Signed-off-by: Kumar Gala <galak@kernel.crashing.org>	2009-07-29 23:05:43 -05:00
Benjamin Herrenschmidt	9e1b32caa5	mm: Pass virtual address to [__]p{te,ud,md}_free_tlb() mm: Pass virtual address to [__]p{te,ud,md}_free_tlb() Upcoming paches to support the new 64-bit "BookE" powerpc architecture will need to have the virtual address corresponding to PTE page when freeing it, due to the way the HW table walker works. Basically, the TLB can be loaded with "large" pages that cover the whole virtual space (well, sort-of, half of it actually) represented by a PTE page, and which contain an "indirect" bit indicating that this TLB entry RPN points to an array of PTEs from which the TLB can then create direct entries. Thus, in order to invalidate those when PTE pages are deleted, we need the virtual address to pass to tlbilx or tlbivax instructions. The old trick of sticking it somewhere in the PTE page struct page sucks too much, the address is almost readily available in all call sites and almost everybody implemets these as macros, so we may as well add the argument everywhere. I added it to the pmd and pud variants for consistency. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: David Howells <dhowells@redhat.com> [MN10300 & FRV] Acked-by: Nick Piggin <npiggin@suse.de> Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> [s390] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-07-27 12:10:38 -07:00
Michael Ellerman	30c5af435b	powerpc: Use pr_devel() in do_dcache_icache_coherency() pr_debug() can now result in code being generated even when DEBUG is not defined. That's not really desirable in some places. With CONFIG_DYNAMIC_DEBUG=y: size before: text data bss dec hex filename 2036 368 8 2412 96c arch/powerpc/mm/pgtable.o size after: text data bss dec hex filename 1677 248 8 1933 78d arch/powerpc/mm/pgtable.o Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-07-08 13:50:24 +10:00
Michael Ellerman	29e5fa59e5	powerpc: Use pr_devel() in arch/powerpc/mm/gup.c pr_debug() can now result in code being generated even when DEBUG is not defined. That's not really desirable in some places. With CONFIG_DYNAMIC_DEBUG=y: size before: text data bss dec hex filename 3252 384 0 3636 e34 arch/powerpc/mm/gup.o size after: text data bss dec hex filename 2576 96 0 2672 a70 arch/powerpc/mm/gup.o Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-07-08 13:50:23 +10:00
Michael Ellerman	651e2dd2a1	powerpc: Cleanup & use pr_devel() in arch/powerpc/mm/slb.c pr_debug() can now result in code being generated even when DEBUG is not defined. That's not really desirable in some places. With CONFIG_DYNAMIC_DEBUG=y: size before: text data bss dec hex filename 3261 416 4 3681 e61 arch/powerpc/mm/slb.o size after: text data bss dec hex filename 2861 248 4 3113 c29 arch/powerpc/mm/slb.o Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-07-08 13:50:22 +10:00
Michael Ellerman	a1ac38ab98	powerpc: Use pr_devel() in arch/powerpc/mm/mmu_context_nohash.c pr_debug() can now result in code being generated even when DEBUG is not defined. That's not really desirable in some places. With CONFIG_DYNAMIC_DEBUG=y: size before: text data bss dec hex filename 1508 48 28 1584 630 powerpc/mm/mmu_context_nohash.o size after: text data bss dec hex filename 1088 0 28 1116 45c powerpc/mm/mmu_context_nohash.o Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-07-08 13:50:22 +10:00
Joe Perches	d258e64ef5	powerpc: Remove unnecessary semicolons Signed-off-by: Joe Perches <joe@perches.com> Acked-by: Geoff Levand <geoffrey.levand@am.sony.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-07-08 13:50:21 +10:00
Tejun Heo	c43768cbb7	Merge branch 'master' into for-next Pull linus#master to merge PER_CPU_DEF_ATTRIBUTES and alpha build fix changes. As alpha in percpu tree uses 'weak' attribute instead of inline assembly, there's no need for __used attribute. Conflicts: arch/alpha/include/asm/percpu.h arch/mn10300/kernel/vmlinux.lds.S include/linux/percpu-defs.h	2009-07-04 07:13:18 +09:00
Benjamin Herrenschmidt	850f6ac316	powerpc/mm: Make k(un)map_atomic out of line Those functions are way too big to be inline, besides, kmap_atomic() wants to call debug_kmap_atomic() which isn't exported for modules and causes module link failures. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-06-26 14:37:25 +10:00
Tejun Heo	204fba4aa3	percpu: cleanup percpu array definitions Currently, the following three different ways to define percpu arrays are in use. 1. DEFINE_PER_CPU(elem_type[array_len], array_name); 2. DEFINE_PER_CPU(elem_type, array_name[array_len]); 3. DEFINE_PER_CPU(elem_type, array_name)[array_len]; Unify to #1 which correctly separates the roles of the two parameters and thus allows more flexibility in the way percpu variables are defined. [ Impact: cleanup ] Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Tony Luck <tony.luck@intel.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Jeremy Fitzhardinge <jeremy@xensource.com> Cc: linux-mm@kvack.org Cc: Christoph Lameter <cl@linux-foundation.org> Cc: David S. Miller <davem@davemloft.net>	2009-06-24 15:13:45 +09:00
Linus Torvalds	d06063cc22	Move FAULT_FLAG_xyz into handle_mm_fault() callers This allows the callers to now pass down the full set of FAULT_FLAG_xyz flags to handle_mm_fault(). All callers have been (mechanically) converted to the new calling convention, there's almost certainly room for architectures to clean up their code and then add FAULT_FLAG_RETRY when that support is added. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-21 13:08:22 -07:00
Michael Ellerman	ba55bd7436	powerpc: Add configurable -Werror for arch/powerpc Add the option to build the code under arch/powerpc with -Werror. The intention is to make it harder for people to inadvertantly introduce warnings in the arch/powerpc code. It needs to be configurable so that if a warning is introduced, people can easily work around it while it's being fixed. The option is a negative, ie. don't enable -Werror, so that it will be turned on for allyes and allmodconfig builds. The default is n, in the hope that developers will build with -Werror, that will probably lead to some build breaks, I am prepared to be flamed. It's not enabled for math-emu, which is a steaming pile of warnings. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-06-16 14:15:45 +10:00
Benjamin Herrenschmidt	7dafd239ab	Merge commit 'origin/master' into next	2009-06-15 10:36:54 +10:00
Sankar P	5cdcd9d691	trivial: spelling fix in ppc code comments Fixes a trivial spelling error in powerpc code comments. Signed-off-by: Sankar P <sankar.curiosity@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2009-06-12 18:01:47 +02:00
Benjamin Herrenschmidt	bc47ab0241	Merge commit 'origin/master' into next Manual merge of: arch/powerpc/kernel/asm-offsets.c	2009-06-12 16:53:38 +10:00
Peter Zijlstra	f4dbfa8f31	perf_counter: Standardize event names Pure renames only, to PERF_COUNT_HW_* and PERF_COUNT_SW_*. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-11 17:54:15 +02:00
Benjamin Herrenschmidt	944916858a	powerpc: Shield code specific to 64-bit server processors This is a random collection of added ifdef's around portions of code that only mak sense on server processors. Using either CONFIG_PPC_STD_MMU_64 or CONFIG_PPC_BOOK3S as seems appropriate. This is meant to make the future merging of Book3E 64-bit support easier. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-06-09 16:47:38 +10:00
Benjamin Herrenschmidt	d3f6204a7d	powerpc: Set init_bootmem_done on NUMA platforms as well For some obscure reason, we only set init_bootmem_done after initializing bootmem when NUMA isn't enabled. We even document this next to the declaration of that global in system.h which of course I didn't read before I had to debug why some WIP code wasn't working properly... This patch changes it so that we always set it after bootmem is initialized which should have always been the case... go figure ! Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-06-09 16:43:04 +10:00
Benjamin Herrenschmidt	b46b6942b3	powerpc/mm: Fix a AB->BA deadlock scenario with nohash MMU context lock The MMU context_lock can be taken from switch_mm() while the rq->lock is held. The rq->lock can also be taken from interrupts, thus if we get interrupted in destroy_context() with the context lock held and that interrupt tries to take the rq->lock, there's a possible deadlock scenario with another CPU having the rq->lock and calling switch_mm() which takes our context lock. The fix is to always ensure interrupts are off when taking our context lock. The switch_mm() path is already good so this fixes the destroy_context() path. While at it, turn the context lock into a new style spinlock. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-06-09 16:43:04 +10:00
Benjamin Herrenschmidt	3035c8634f	powerpc/mm: Fix some SMP issues with MMU context handling This patch fixes a couple of issues that can happen as a result of steal_context() dropping the context_lock when all possible PIDs are ineligible for stealing (hopefully an extremely hard to hit occurence). This case exposes the possibility of a stale context_mm[] entry to be seen since destroy_context() doesn't clear it and the free map isn't re-tested. It also means steal_context() will not notice a context freed while the lock was help, thus possibly trying to steal a context when a free one was available. This fixes it by always returning to the caller from steal_context when it dropped the lock with a return value that causes the caller to re-samble the number of free contexts, along with properly clearing the context_mm[] array for destroyed contexts. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-06-09 16:42:21 +10:00
Ingo Molnar	23db9f430b	Merge branch 'linus' into perfcounters/core Merge reason: merge almost-rc8 into perfcounters/core, which was -rc6 based - to pick up the latest upstream fixes. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-01 10:01:39 +02:00
Benjamin Herrenschmidt	435462c6e6	Merge branch 'merge' into next	2009-05-29 13:54:52 +10:00
Benjamin Herrenschmidt	8b31e49d1d	powerpc: Fix up dma_alloc_coherent() on platforms without cache coherency. The implementation we just revived has issues, such as using a Kconfig-defined virtual address area in kernel space that nothing actually carves out (and thus will overlap whatever is there), or having some dependencies on being self contained in a single PTE page which adds unnecessary constraints on the kernel virtual address space. This fixes it by using more classic PTE accessors and automatically locating the area for consistent memory, carving an appropriate hole in the kernel virtual address space, leaving only the size of that area as a Kconfig option. It also brings some dma-mask related fixes from the ARM implementation which was almost identical initially but grew its own fixes. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-05-27 16:33:59 +10:00
Benjamin Herrenschmidt	f637a49e50	powerpc: Minor cleanups of kernel virt address space definitions Make FIXADDR_TOP a compile time constant and cleanup a couple of definitions relative to the layout of the kernel address space on ppc32. We also print out that layout at boot time for debugging purposes. This is a pre-requisite for properly fixing non-coherent DMA allocactions. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-05-27 16:32:50 +10:00
Benjamin Herrenschmidt	b16e7766d6	powerpc: Move dma-noncoherent.c from arch/powerpc/lib to arch/powerpc/mm (pre-requisite to make the next patches more palatable) Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-05-27 16:32:05 +10:00
Hideo Saito	8e35961b57	powerpc/mm: Fix broken MMU PID stealing on !SMP The recent rework of the MMU PID handling for non-hash CPUs has a subtle bug in the !SMP "optimized" variant of the PID stealing function. It clears the PID in the mm context before it calls local_flush_tlb_mm(). However, the later will not flush anything if the PID in the context is clear... Signed-off-by: Hideo Saito <hsaito.ppc@gmail.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-05-26 13:46:49 +10:00
Milton Miller	60dbf43851	powerpc: Add 2.06 tlbie mnemonics This adds the PowerPC 2.06 tlbie mnemonics and keeps backwards compatibilty for CPUs before 2.06. Only useful for bare metal systems. Signed-off-by: Milton Miller <miltonm@bga.com> Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-05-21 15:44:21 +10:00
Ingo Molnar	dc3f81b129	Merge commit 'v2.6.30-rc6' into perfcounters/core Merge reason: this branch was on an -rc4 base, merge it up to -rc6 to get the latest upstream fixes. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-05-18 07:37:49 +02:00
Mel Gorman	af3e4aca47	powerpc: Do not assert pte_locked for hugepage PTE entries With CONFIG_DEBUG_VM, an assertion is made when changing the protection flags of a PTE that the PTE is locked. Huge pages use a different pagetable format and the assertion is bogus and will always trigger with a bug looking something like Unable to handle kernel paging request for data at address 0xf1a00235800006f8 Faulting instruction address: 0xc000000000034a80 Oops: Kernel access of bad area, sig: 11 [#1] SMP NR_CPUS=32 NUMA Maple Modules linked in: dm_snapshot dm_mirror dm_region_hash dm_log dm_mod loop evdev ext3 jbd mbcache sg sd_mod ide_pci_generic pata_amd ata_generic ipr libata tg3 libphy scsi_mod windfarm_pid windfarm_smu_sat windfarm_max6690_sensor windfarm_lm75_sensor windfarm_cpufreq_clamp windfarm_core i2c_powermac NIP: c000000000034a80 LR: c000000000034b18 CTR: 0000000000000003 REGS: c000000003037600 TRAP: 0300 Not tainted (2.6.30-rc3-autokern1) MSR: 9000000000009032 <EE,ME,IR,DR> CR: 28002484 XER: 200fffff DAR: f1a00235800006f8, DSISR: 0000000040010000 TASK = c0000002e54cc740[2960] 'map_high_trunca' THREAD: c000000003034000 CPU: 2 GPR00: 4000000000000000 c000000003037880 c000000000895d30 c0000002e5a2e500 GPR04: 00000000a0000000 c0000002edc40880 0000005700000393 0000000000000001 GPR08: f000000011ac0000 01a00235800006e8 00000000000000f5 f1a00235800006e8 GPR12: 0000000028000484 c0000000008dd780 0000000000001000 0000000000000000 GPR16: fffffffffffff000 0000000000000000 00000000a0000000 c000000003037a20 GPR20: c0000002e5f4ece8 0000000000001000 c0000002edc40880 0000000000000000 GPR24: c0000002e5f4ece8 0000000000000000 00000000a0000000 c0000002e5f4ece8 GPR28: 0000005700000393 c0000002e5a2e500 00000000a0000000 c000000003037880 NIP [c000000000034a80] .assert_pte_locked+0xa4/0xd0 LR [c000000000034b18] .ptep_set_access_flags+0x6c/0xb4 Call Trace: [c000000003037880] [c000000003037990] 0xc000000003037990 (unreliable) [c000000003037910] [c000000000034b18] .ptep_set_access_flags+0x6c/0xb4 [c0000000030379b0] [c00000000014bef8] .hugetlb_cow+0x124/0x674 [c000000003037b00] [c00000000014c930] .hugetlb_fault+0x4e8/0x6f8 [c000000003037c00] [c00000000013443c] .handle_mm_fault+0xac/0x828 [c000000003037cf0] [c0000000000340a8] .do_page_fault+0x39c/0x584 [c000000003037e30] [c0000000000057b0] handle_page_fault+0x20/0x5c Instruction dump: 7d29582a 7d200074 7800d182 0b000000 3c004000 3960ffff 780007c6 796b00c4 7d290214 7929a302 1d290068 7d6b4a14 <800b0010> 7c000074 7800d182 0b000000 This patch fixes the problem by not asseting the PTE is locked for VMAs backed by huge pages. Signed-off-by: Mel Gorman <mel@csn.ul.ie> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-05-18 15:19:04 +10:00
Becky Bruce	49a8496525	powerpc: Allow mem=x cmdline to work with 4G+ We're currently choking on mem=4g (and above) due to memory_limit being specified as an unsigned long. Make memory_limit phys_addr_t to fix this. Signed-off-by: Becky Bruce <beckyb@kernel.crashing.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-05-15 16:43:41 +10:00
Ingo Molnar	e7fd5d4b3d	Merge branch 'linus' into perfcounters/core Merge reason: This brach was on -rc1, refresh it to almost-rc4 to pick up the latest upstream fixes. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-04-29 14:47:05 +02:00
Stephen Rothwell	b62c31ae40	powerpc: fix for long standing bug noticed by gcc 4.4.0 Previous gcc versions didn't notice this because one of the preceding #ifs always evaluated to true. gcc 4.4.0 produced this error: arch/powerpc/mm/tlb_nohash_low.S:206:6: error: #elif with no expression Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Acked-by: Josh Boyer <jwboyer@linux.vnet.ibm.com> Signed-off-by: Kumar Gala <galak@kernel.crashing.org>	2009-04-23 08:52:16 -05:00
Kumar Gala	323d23aeac	Revert "powerpc: Add support for early tlbilx opcode" This reverts commit `e996557740`. Our HW guys were able to fix this so it never sees the light of day. Signed-off-by: Kumar Gala <galak@kernel.crashing.org>	2009-04-23 08:51:22 -05:00
Michael Ellerman	24f1ce803c	powerpc: Fix crash on CPU hotplug early_init_mmu_secondary() is called at CPU hotplug time, so it must be marked as __cpuinit, not __init. Caused by `757c74d2` ("powerpc/mm: Introduce early_init_mmu() on 64-bit"). Tested-by: Sachin Sant <sachinp@in.ibm.com> Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Paul Mackerras <paulus@samba.org>	2009-04-22 14:56:34 +10:00
Peter Zijlstra	78f13e9525	perf_counter: allow for data addresses to be recorded Paul suggested we allow for data addresses to be recorded along with the traditional IPs as power can provide these. For now, only the software pagefault events provide data addresses, but in the future power might as well for some events. x86 doesn't seem capable of providing this atm. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> LKML-Reference: <20090408130409.394816925@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-04-08 19:05:56 +02:00
Kumar Gala	52ce67f157	powerpc/mm: Fix compile warning arch/powerpc/mm/tlb_nohash.c: In function 'flush_tlb_mm': arch/powerpc/mm/tlb_nohash.c:128: warning: unused variable 'cpu_mask' Signed-off-by: Kumar Gala <galak@kernel.crashing.org>	2009-04-07 22:11:10 -05:00
Kumar Gala	e996557740	powerpc: Add support for early tlbilx opcode During the ISA 2.06 development the opcode for tlbilx changed and some early implementations used to old opcode. Add support for a MMU_FTR fixup to deal with this. Signed-off-by: Kumar Gala <galak@kernel.crashing.org>	2009-04-07 01:36:30 -05:00
Peter Zijlstra	ac17dc8e58	perf_counter: provide major/minor page fault software events Provide separate sw counters for major and minor page faults. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-04-06 09:29:40 +02:00
Peter Zijlstra	7dd1fcc258	perf_counter: provide pagefault software events We use the generic software counter infrastructure to provide page fault events. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-04-06 09:29:37 +02:00
Benjamin Herrenschmidt	757c74d298	powerpc/mm: Introduce early_init_mmu() on 64-bit This moves some MMU related init code out of setup_64.c into hash_utils_64.c and calls it early_init_mmu() and early_init_mmu_secondary(). This will make it easier to plug in a new MMU type. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-03-24 13:47:34 +11:00
Benjamin Herrenschmidt	ff7c660092	powerpc/mm: Fix printk type warning in mmu_context_nohash We need to use %zu instead of %d when printing a sizeof() Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-03-24 13:47:34 +11:00
Benjamin Herrenschmidt	d62cbf45a8	powerpc/mm: Rename arch/powerpc/kernel/mmap.c to mmap_64.c This file is only useful on 64-bit, so we name it accordingly. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-03-24 13:47:33 +11:00
Benjamin Herrenschmidt	8d1cf34e7a	powerpc/mm: Tweak PTE bit combination definitions This patch tweaks the way some PTE bit combinations are defined, in such a way that the 32 and 64-bit variant become almost identical and that will make it easier to bring in a new common pte-* file for the new variant of the Book3-E support. The combination of bits defining access to kernel pages are now clearly separated from the combination used by userspace and the core VM. The resulting generated code should remain identical unless I made a mistake. Note: While at it, I removed a non-sensical statement related to CONFIG_KGDB in ppc_mmu_32.c which could cause kernel mappings to be user accessible when that option is enabled. Probably something that bitrot. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-03-24 13:47:33 +11:00
Rusty Russell	56aa4129e8	cpumask: Use mm_cpumask() wrapper instead of cpu_vm_mask Makes code futureproof against the impending change to mm->cpu_vm_mask. It's also a chance to use the new cpumask_ ops which take a pointer (the older ones are deprecated, but there's no hurry for arch code). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-03-24 13:47:29 +11:00
Benjamin Herrenschmidt	9e5efaa936	powerpc/mm: Properly wire up get_user_pages_fast() on 32-bit While we did add support for _PAGE_SPECIAL on some 32-bit platforms, we never actually built get_user_pages_fast() on them. This fixes it which requires a little bit of ifdef'ing around. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-03-11 17:11:34 +11:00
Benjamin Herrenschmidt	1cdab55d8a	powerpc: Wire up /proc/vmallocinfo to our ioremap() This adds the necessary bits and pieces to powerpc implementation of ioremap to benefit from caller tracking in /proc/vmallocinfo, at least for ioremap's done after mem init as the older ones aren't tracked. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-03-11 17:10:14 +11:00
Kumar Gala	c3071951d0	powerpc/fsl-booke: Add support for tlbilx instructions The e500mc core supports the new tlbilx instructions that do core local invalidates and also provide us the ability to take down all TLB entries matching a given PID. Signed-off-by: Kumar Gala <galak@kernel.crashing.org>	2009-03-09 09:25:38 -05:00
Anton Blanchard	002b0ec73d	powerpc: Increase stack gap on 64bit binaries On 64bit there is a possibility our stack and mmap randomisation will put the two close enough such that we can't expand our stack to match the ulimit specified. To avoid this, start the upper mmap address at 1GB + 128MB below the top of our address space, so in the worst case we end up with the same ~128MB hole as in 32bit. This works because we randomise the stack over a 1GB range. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-02-23 15:53:21 +11:00
Anton Blanchard	a5adc91a4b	powerpc: Ensure random space between stack and mmaps get_random_int() returns the same value within a 1 jiffy interval. This means that the mmap and stack regions will almost always end up the same distance apart, making a relative offset based attack possible. To fix this, shift the randomness we use for the mmap region by 1 bit. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-02-23 15:53:21 +11:00
Anton Blanchard	9f14c42d75	powerpc: Randomise mmap start address Randomise mmap start address - 8MB on 32bit and 1GB on 64bit tasks. Until ppc32 uses the mmap.c functionality, this is ppc64 specific. Before: # ./test & cat /proc/${!}/maps\|tail -2\|head -1 f75fe000-f7fff000 rw-p f75fe000 00:00 0 f75fe000-f7fff000 rw-p f75fe000 00:00 0 f75fe000-f7fff000 rw-p f75fe000 00:00 0 f75fe000-f7fff000 rw-p f75fe000 00:00 0 f75fe000-f7fff000 rw-p f75fe000 00:00 0 After: # ./test & cat /proc/${!}/maps\|tail -2\|head -1 f718b000-f7b8c000 rw-p f718b000 00:00 0 f7551000-f7f52000 rw-p f7551000 00:00 0 f6ee7000-f78e8000 rw-p f6ee7000 00:00 0 f74d4000-f7ed5000 rw-p f74d4000 00:00 0 f6e9d000-f789e000 rw-p f6e9d000 00:00 0 Similar for 64bit, but with 1GB of scatter: # ./test & cat /proc/${!}/maps\|tail -2\|head -1 fffb97b5000-fffb97b6000 rw-p fffb97b5000 00:00 0 fffce9a3000-fffce9a4000 rw-p fffce9a3000 00:00 0 fffeaaf2000-fffeaaf3000 rw-p fffeaaf2000 00:00 0 fffd88ac000-fffd88ad000 rw-p fffd88ac000 00:00 0 fffbc62e000-fffbc62f000 rw-p fffbc62e000 00:00 0 Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-02-23 15:53:07 +11:00
Anton Blanchard	13a2cb3694	powerpc: Rearrange mmap.c Rearrange mmap.c to better match the x86 version. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-02-23 15:53:06 +11:00
Nathan Fontenot	0f16ef7fd3	powerpc/numa: Cleanup hot_add_scn_to_nid This patch reworks the hot_add_scn_to_nid and its supporting functions to make them easier to understand. There are no functional changes in this patch and has been tested on machine with memory represented in the device tree as memory nodes and in the ibm,dynamic-memory property. My previous patch that introduced support for hotplug memory add on systems whose memory was represented by the ibm,dynamic-memory property of the device tree only left the code more unintelligible. This will hopefully makes things easier to understand. Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-02-23 15:53:04 +11:00
Anton Blanchard	13870b6575	powerpc/mm: Reduce hashtable size when using 64kB pages At the moment we size the hashtable based on 4kB pages / 2, even on a 64kB kernel. This results in a hashtable that is much larger than it needs to be. Grab the real page size and size the hashtable based on that Note: This only has effect on non hypervisor machines. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-02-23 10:48:58 +11:00
Benjamin Herrenschmidt	3b7faeb49e	Merge commit 'kumar/next' into next	2009-02-18 13:23:30 +11:00
Benjamin Herrenschmidt	82a0a1cc8f	Merge commit 'origin/master' into next Manual merge of: arch/powerpc/include/asm/pgtable-ppc32.h	2009-02-18 13:19:25 +11:00
Dave Hansen	06eccea6c3	powerpc/mm: Fix numa reserve bootmem page selection Fix the powerpc NUMA reserve bootmem page selection logic. commit `8f64e1f2d1` (powerpc: Reserve in bootmem lmb reserved regions that cross NUMA nodes) changed the logic for how the powerpc LMB reserved regions were converted to bootmen reserved regions. As the folowing discussion reports, the new logic was not correct. mark_reserved_regions_for_nid() goes through each LMB on the system that specifies a reserved area. It searches for active regions that intersect with that LMB and are on the specified node. It attempts to bootmem-reserve only the area where the active region and the reserved LMB intersect. We can not reserve things on other nodes as they may not have bootmem structures allocated, yet. We base the size of the bootmem reservation on two possible things. Normally, we just make the reservation start and stop exactly at the start and end of the LMB. However, the LMB reservations are not aware of NUMA nodes and on occasion a single LMB may cross into several adjacent active regions. Those may even be on different NUMA nodes and will require separate calls to the bootmem reserve functions. So, the bootmem reservation must be trimmed to fit inside the current active region. That's all fine and dandy, but we trim the reservation in a page-aligned fashion. That's bad because we start the reservation at a non-page-aligned address: physbase. The reservation may only span 2 bytes, but that those bytes may span two pfns and cause a reserve_size of 2PAGE_SIZE. Take the case where you reserve 0x2 bytes at 0x0fff and where the active region ends at 0x1000. You'll jump into that if() statment, but node_ar.end_pfn=0x1 and start_pfn=0x0. You'll end up with a reserve_size=0x1000, and then call reserve_bootmem_node(node, physbase=0xfff, size=0x1000); 0x1000 may not be on the same node as 0xfff. Oops. In almost all the vm code, end_<anything> is not inclusive. If you have an end_pfn of 0x1234, page 0x1234 is not included in the range. Using PFN_UP instead of the (>> >> PAGE_SHIFT) will make this consistent with the other VM code. We also need to do math for the reserved size with physbase instead of start_pfn. node_ar.end_pfn << PAGE_SHIFT is precisely* the end of the node. However, (start_pfn << PAGE_SHIFT) is NOT precisely the beginning of the reserved area. That is, of course, physbase. If we don't use physbase here, the reserve_size can be made too large. From: Dave Hansen <dave@linux.vnet.ibm.com> Tested-by: Geoff Levand <geoffrey.levand@am.sony.com> Tested on PS3. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-02-13 16:37:45 +11:00
Kumar Gala	96a8bac589	powerpc/fsl-booke: Fix compile warning arch/powerpc/mm/fsl_booke_mmu.c: In function 'adjust_total_lowmem': arch/powerpc/mm/fsl_booke_mmu.c:221: warning: format '%ld' expects type 'long int', but argument 3 has type 'phys_addr_t' Signed-off-by: Kumar Gala <galak@kernel.crashing.org>	2009-02-12 16:54:53 -06:00
Kumar Gala	d66c82ea45	powerpc/fsl-booke: Add new ISA 2.06 page sizes and MAS defines The Power ISA 2.06 added power of two page sizes to the embedded MMU architecture. Its done it such a way to be code compatiable with the existing HW. Made the minor code changes to support both power of two and power of four page sizes. Also added some new MAS bits and macros that are defined as part of the 2.06 ISA. Renamed some things to use the 'Book-3e' concept to convey the new MMU that is based on the Freescale Book-E MMU programming model. Note, its still invalid to try and use a page size that isn't supported by cpu. Signed-off-by: Kumar Gala <galak@kernel.crashing.org>	2009-02-12 16:37:11 -06:00
Kumar Gala	f99fb8a2cb	powerpc/mm: Fix _PAGE_COHERENT support on classic ppc32 HW The following commit: commit `64b3d0e812` Author: Benjamin Herrenschmidt <benh@kernel.crashing.org> Date: Thu Dec 18 19:13:51 2008 +0000 powerpc/mm: Rework usage of _PAGE_COHERENT/NO_CACHE/GUARDED broke setting of the _PAGE_COHERENT bit in the PPC HW PTE. Since we now actually set _PAGE_COHERENT in the Linux PTE we shouldn't be clearing it out before we propogate it to the PPC HW PTE. Reported-by: Martyn Welch <martyn.welch@gefanuc.com> Signed-off-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-02-11 16:07:02 +11:00
Benjamin Herrenschmidt	8d30c14cab	powerpc/mm: Rework I$/D$ coherency (v3) This patch reworks the way we do I and D cache coherency on PowerPC. The "old" way was split in 3 different parts depending on the processor type: - Hash with per-page exec support (64-bit and >= POWER4 only) does it at hashing time, by preventing exec on unclean pages and cleaning pages on exec faults. - Everything without per-page exec support (32-bit hash, 8xx, and 64-bit < POWER4) does it for all page going to user space in update_mmu_cache(). - Embedded with per-page exec support does it from do_page_fault() on exec faults, in a way similar to what the hash code does. That leads to confusion, and bugs. For example, the method using update_mmu_cache() is racy on SMP where another processor can see the new PTE and hash it in before we have cleaned the cache, and then blow trying to execute. This is hard to hit but I think it has bitten us in the past. Also, it's inefficient for embedded where we always end up having to do at least one more page fault. This reworks the whole thing by moving the cache sync into two main call sites, though we keep different behaviours depending on the HW capability. The call sites are set_pte_at() which is now made out of line, and ptep_set_access_flags() which joins the former in pgtable.c The base idea for Embedded with per-page exec support, is that we now do the flush at set_pte_at() time when coming from an exec fault, which allows us to avoid the double fault problem completely (we can even improve the situation more by implementing TLB preload in update_mmu_cache() but that's for later). If for some reason we didn't do it there and we try to execute, we'll hit the page fault, which will do a minor fault, which will hit ptep_set_access_flags() to do things like update _PAGE_ACCESSED or _PAGE_DIRTY if needed, we just make this guys also perform the I/D cache sync for exec faults now. This second path is the catch all for things that weren't cleaned at set_pte_at() time. For cpus without per-pag exec support, we always do the sync at set_pte_at(), thus guaranteeing that when the PTE is visible to other processors, the cache is clean. For the 64-bit hash with per-page exec support case, we keep the old mechanism for now. I'll look into changing it later, once I've reworked a bit how we use _PAGE_EXEC. This is also a first step for adding _PAGE_EXEC support for embedded platforms Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-02-11 16:00:10 +11:00
Anton Blanchard	91b0f5ec53	powerpc/mm: Move 64-bit unmapped_area to top of address space We currently place mmaps just below the stack on 32bit, but leave them in the middle of the address space on 64bit: 00100000-00120000 r-xp 00100000 00:00 0 [vdso] 10000000-10010000 r-xp 00000000 08:06 179534 /tmp/sleep 10010000-10020000 rw-p 00000000 08:06 179534 /tmp/sleep 10020000-10130000 rw-p 10020000 00:00 0 [heap] 40000000000-40000030000 r-xp 00000000 08:06 440743 /lib64/ld-2.9.so 40000030000-40000040000 rw-p 00020000 08:06 440743 /lib64/ld-2.9.so 40000050000-400001f0000 r-xp 00000000 08:06 440671 /lib64/libc-2.9.so 400001f0000-40000200000 r--p 00190000 08:06 440671 /lib64/libc-2.9.so 40000200000-40000220000 rw-p 001a0000 08:06 440671 /lib64/libc-2.9.so 40000220000-40008230000 rw-p 40000220000 00:00 0 fffffbc0000-fffffd10000 rw-p fffffeb0000 00:00 0 [stack] Right now it isn't an issue, but at some stage we will run into mmap or hugetlb allocation issues. Using the same layout as 32bit gives us a some breathing room. This matches what x86-64 is doing too. 00100000-00103000 r-xp 00100000 00:00 0 [vdso] 10000000-10001000 r-xp 00000000 08:06 554894 /tmp/test 10010000-10011000 r--p 00000000 08:06 554894 /tmp/test 10011000-10012000 rw-p 00001000 08:06 554894 /tmp/test 10012000-10113000 rw-p 10012000 00:00 0 [heap] fffefdf7000-ffff7df8000 rw-p fffefdf7000 00:00 0 ffff7df8000-ffff7f97000 r-xp 00000000 08:06 130591 /lib64/libc-2.9.so ffff7f97000-ffff7fa6000 ---p 0019f000 08:06 130591 /lib64/libc-2.9.so ffff7fa6000-ffff7faa000 r--p 0019e000 08:06 130591 /lib64/libc-2.9.so ffff7faa000-ffff7fc0000 rw-p 001a2000 08:06 130591 /lib64/libc-2.9.so ffff7fc0000-ffff7fc4000 rw-p ffff7fc0000 00:00 0 ffff7fc4000-ffff7fec000 r-xp 00000000 08:06 130663 /lib64/ld-2.9.so ffff7fee000-ffff7ff0000 rw-p ffff7fee000 00:00 0 ffff7ffa000-ffff7ffb000 rw-p ffff7ffa000 00:00 0 ffff7ffb000-ffff7ffc000 r--p 00027000 08:06 130663 /lib64/ld-2.9.so ffff7ffc000-ffff7fff000 rw-p 00028000 08:06 130663 /lib64/ld-2.9.so ffff7fff000-ffff8000000 rw-p ffff7fff000 00:00 0 fffffc59000-fffffc6e000 rw-p ffffffeb000 00:00 0 [stack] Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-02-11 16:00:07 +11:00
Milton Miller	8b16cd238d	powerpc/numa: Remove redundant find_cpu_node() Use of_get_cpu_node, which is a superset of numa.c's find_cpu_node in a less restrictive section (text vs cpuinit). Signed-off-by: Milton Miller <miltonm@bga.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-02-11 13:37:59 +11:00
Milton Miller	20fcefe5a0	powerpc/numa: Avoid possible reference beyond prop. length in find_min_common_depth() find_min_common_depth() was checking the property length incorrectly. The value is in bytes not cells, and it is using the second entry. Signed-off-By: Milton Miller <miltonm@bga.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-02-11 13:37:58 +11:00
Benjamin Herrenschmidt	edbc29d76d	Merge commit 'kumar/next' into next	2009-02-11 13:37:44 +11:00
Kumar Gala	6c24b17453	powerpc/fsl-booke: Fix mapping functions to use phys_addr_t Fixed v_mapped_by_tlbcam() and p_mapped_by_tlbcam() to use phys_addr_t instead of unsigned long. In 36-bit physical mode we really need these functions to deal with phys_addr_t when trying to match a physical address or when returning one. Signed-off-by: Kumar Gala <galak@kernel.crashing.org>	2009-02-09 21:11:55 -06:00
Trent Piepho	96051465fd	powerpc/fsl-booke: Make CAM entries used for lowmem configurable On booke processors, the code that maps low memory only uses up to three CAM entries, even though there are sixteen and nothing else uses them. Make this number configurable in the advanced options menu along with max low memory size. If one wants 1 GB of lowmem, then it's typically necessary to have four CAM entries. Signed-off-by: Trent Piepho <tpiepho@freescale.com> Signed-off-by: Kumar Gala <galak@kernel.crashing.org>	2009-01-28 18:16:54 -06:00
Trent Piepho	c8f3570b7e	powerpc/fsl-booke: Allow larger CAM sizes than 256 MB The code that maps kernel low memory would only use page sizes up to 256 MB. On E500v2 pages up to 4 GB are supported. However, a page must be aligned to a multiple of the page's size. I.e. 256 MB pages must aligned to a 256 MB boundary. This was enforced by a requirement that the physical and virtual addresses of the start of lowmem be aligned to 256 MB. Clearly requiring 1GB or 4GB alignment to allow pages of that size isn't acceptable. To solve this, I simply have adjust_total_lowmem() take alignment into account when it decides what size pages to use. Give it PAGE_OFFSET = 0x7000_0000, PHYSICAL_START = 0x3000_0000, and 2GB of RAM, and it will map pages like this: PA 0x3000_0000 VA 0x7000_0000 Size 256 MB PA 0x4000_0000 VA 0x8000_0000 Size 1 GB PA 0x8000_0000 VA 0xC000_0000 Size 256 MB PA 0x9000_0000 VA 0xD000_0000 Size 256 MB PA 0xA000_0000 VA 0xE000_0000 Size 256 MB Because the lowmem mapping code now takes alignment into account, PHYSICAL_ALIGN can be lowered from 256 MB to 64 MB. Even lower might be possible. The lowmem code will work down to 4 kB but it's possible some of the boot code will fail before then. Poor alignment will force small pages to be used, which combined with the limited number of TLB1 pages available, will result in very little memory getting mapped. So alignments less than 64 MB probably aren't very useful anyway. Signed-off-by: Trent Piepho <tpiepho@freescale.com> Signed-off-by: Kumar Gala <galak@kernel.crashing.org>	2009-01-28 18:16:53 -06:00
Trent Piepho	f88747e7f6	powerpc/fsl-booke: Remove code duplication in lowmem mapping The code to map lowmem uses three CAM aka TLB[1] entries to cover it. The size of each is stored in three globals named __cam0, __cam1, and __cam2. All the code that uses them is duplicated three times for each of the three variables. We have these things called arrays and loops.... Once converted to use an array, it will be easier to make the number of CAMs configurable. Signed-off-by: Trent Piepho <tpiepho@freescale.com> Signed-off-by: Kumar Gala <galak@kernel.crashing.org>	2009-01-28 18:16:51 -06:00
Gerhard Pircher	4c456a67f5	powerpc/mm: Fix handling of _PAGE_COHERENT in BAT setup code _PAGE_COHERENT is now always set in _PAGE_RAM resp. PAGE_KERNEL. Thus it has to be masked out, if the BAT mapping should be non cacheable or CPU_FTR_NEED_COHERENT is not set. This will work on normal SMP setups because we force-set CPU_FTR_NEED_COHERENT as part of CPU_FTR_COMMON on SMP. Signed-off-by: Gerhard Pircher <gerhard_pircher@gmx.net> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-01-28 17:15:52 +11:00

1 2 3 4 5 ...

590 Commits