Currently it doesn't matter where the mdio nodes are placed, but with
power management support (i.e. when sleep = <> properties will take
effect), mdio nodes placement will become important: mdio controller
is a part of the ethernet block, so the mdio nodes should be placed
correctly. Otherwise we may wrongly assume that MDIO controllers are
available during sleep.
Suggested-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Currently it doesn't matter where the mdio nodes are placed, but with
power management support (i.e. when sleep = <> properties will take
effect), mdio nodes placement will become important: mdio controller
is a part of the ethernet block, so the mdio nodes should be placed
correctly. Otherwise we may wrongly assume that MDIO controllers are
available during sleep.
Suggested-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Currently it doesn't matter where the mdio nodes are placed, but with
power management support (i.e. when sleep = <> properties will take
effect), mdio nodes placement will become important: mdio controller
is a part of the ethernet block, so the mdio nodes should be placed
correctly. Otherwise we may wrongly assume that MDIO controllers are
available during sleep.
Suggested-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
This patch adds pmc nodes to the device tree files so that the boards
will able to use standby capability of MPC837x processors. The MPC837x
PMC controllers are compatible with MPC8349 ones (i.e. no deep sleep).
sleep = <> properties are used to specify SCCR masks as described
in "Specifying Device Power Management Information (sleep property)"
chapter in Documentation/powerpc/booting-without-of.txt.
Since I2C1 and eSDHC controllers share the same clock source, they
are now placed under sleep-nexus nodes.
A processor is able to wakeup the boards on LAN events (Wake-On-Lan),
console events (with no_console_suspend kernel command line), GPIO
events and external IRQs (IRQ1 and IRQ2).
The processor can also wakeup the boards by the fourth general purpose
timer in GTM1 block, but the GTM wakeup support isn't yet implemented
(it's tested to work, but it's unclear how can we use the quite short
GTM timers, and how do we want to expose the GTM to userspace).
Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
TLB entry should enable memory coherence in SMP.
And like commit 631fba9dd3aca519355322cef035730609e91593,
remove guard attribute to enable the prefetch of guest memory.
Signed-off-by: Liu Yu <yu.liu@freescale.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Should clear and then update the next victim area here.
Guest kernel only read TLB1 when startup kernel,
this bug result in an extra 4K TLB1 mapping in guest from 0x0 to 0x0.
As the problem has no impact to bootup a guest,
we didn't notice it before.
Signed-off-by: Liu Yu <yu.liu@freescale.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
So, KVM needs to read tlbcam_index to know exactly
which TLB1 entry is unused by host.
Signed-off-by: Liu Yu <yu.liu@freescale.com>
Acked-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Two KVM archs support irqchips and two don't. Add a Kconfig item to
make selecting between the two models easier.
Signed-off-by: Avi Kivity <avi@redhat.com>
After the rewrite of KVM's debug support, this code doesn't even build any
more.
Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Although BOOKE_MAX_INTERRUPT has the right value, the meaning is not match.
Signed-off-by: Liu Yu <yu.liu@freescale.com>
Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
When itlb or dtlb miss happens, E500 needs to update some mmu registers.
So that the auto-load mechanism can work on E500 when write a tlb entry.
Signed-off-by: Liu Yu <yu.liu@freescale.com>
Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
E500 deosn't support this instruction.
Signed-off-by: Liu Yu <yu.liu@freescale.com>
Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Kernel for E500 need clear dbsr when startup.
So add dbsr register in kvm_vcpu_arch for BOOKE.
Signed-off-by: Liu Yu <yu.liu@freescale.com>
Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
The Book E code will be shared with e500.
I've left PID in kvmppc_core_emulate_op() just so that we don't need to move
kvmppc_set_pid() right now. Once we have the e500 implementation, we can
probably share that too.
Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Passing just the TLB index will ease an e500 implementation.
Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Remove the remaining arch fragments of the old guest debug interface
that now break non-x86 builds.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
This rips out the support for KVM_DEBUG_GUEST and introduces a new IOCTL
instead: KVM_SET_GUEST_DEBUG. The IOCTL payload consists of a generic
part, controlling the "main switch" and the single-step feature. The
arch specific part adds an x86 interface for intercepting both types of
debug exceptions separately and re-injecting them when the host was not
interested. Moveover, the foundation for guest debugging via debug
registers is layed.
To signal breakpoint events properly back to userland, an arch-specific
data block is now returned along KVM_EXIT_DEBUG. For x86, the arch block
contains the PC, the debug exception, and relevant debug registers to
tell debug events properly apart.
The availability of this new interface is signaled by
KVM_CAP_SET_GUEST_DEBUG. Empty stubs for not yet supported archs are
provided.
Note that both SVM and VTX are supported, but only the latter was tested
yet. Based on the experience with all those VTX corner case, I would be
fairly surprised if SVM will work out of the box.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
This moves some MMU related init code out of setup_64.c into hash_utils_64.c
and calls it early_init_mmu() and early_init_mmu_secondary(). This will
make it easier to plug in a new MMU type.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
ppc32 has it already, add it to ppc64 as a preliminary for adding
support for Book3E 64-bit support
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Now that they are almost identical, we can merge some of the definitions
related to the PTE format into common files.
This creates a new pte-common.h which is included by both 32 and 64-bit
right after the CPU specific pte-*.h file, and which defines some
bits to "default" values if they haven't been defined already, and
then provides a generic definition of most of the bit combinations
based on these and exposed to the rest of the kernel.
I also moved to the common pgtable.h most of the "small" accessors to the
PTE bits and modification helpers (pte_mk*). The actual accessors remain
in their separate files.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This patch tweaks the way some PTE bit combinations are defined, in such a
way that the 32 and 64-bit variant become almost identical and that will
make it easier to bring in a new common pte-* file for the new variant
of the Book3-E support.
The combination of bits defining access to kernel pages are now clearly
separated from the combination used by userspace and the core VM. The
resulting generated code should remain identical unless I made a mistake.
Note: While at it, I removed a non-sensical statement related to CONFIG_KGDB
in ppc_mmu_32.c which could cause kernel mappings to be user accessible when
that option is enabled. Probably something that bitrot.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Currently, we will report a page fault as a segment fault, and report
a segment fault as both a page and segment fault.
Fix the SPF_P definition to be correct according to the iommu docs, and
mask before comparing.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Complete workaround for DTLB errata in e300c2/c3/c4 processors.
Due to the bug, the hardware-implemented LRU algorythm always goes to way
1 of the TLB. This fix implements the proposed software workaround in
form of a LRW table for chosing the TLB-way.
Based on patch from David Jander <david@protonic.nl>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Now that r0 is free we can keep the value of I/DMISS in r3 and not reload
it before doing the tlbli/d. This saves us a few cycles in the fast path
case.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Long ago we had some code that actually used the CTR in the SW TLB
miss handlers (603/e300). Since we don't use it no reason to waste
cycles saving it off and restoring it (we actually didn't restore it
in the fast path case).
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Now that we set archdata for of_platform and platform devices via
platform_notify() we no longer need to special case having a NULL device
pointer or NULL archdata. It should be a driver error if this condition
shows up and the driver should be fixed.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Since a number of powerpc chips are SoCs we end up having dma-able
devices that are registered as platform or of_platform devices. We need
to hook the archdata to setup proper dma_ops for these devices.
Rather than having to add a bus_notify to each platform we add a default
one at the highest priority (called first) to set the default dma_ops for
of_platform and platform devices to dma_direct_ops. This allows platform
code to override the ops by providing their own notifier call back.
In the future to enable >4G DMA support on ppc32 we can hook swiotlb ops.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This will allow us to remove the ppc32 specific checks in get_dma_ops()
that defaults to dma_direct_ops if the archdata is NULL. We really
should always have archdata set to something going forward.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Commit bedd30d986 ("genirq: make irqreturn_t
an enum") from the genirq tree in next-20090319 caused this new warning:
arch/powerpc/sysdev/pmi.c: In function 'pmi_of_probe':
arch/powerpc/sysdev/pmi.c:166: warning: passing argument 2 of 'request_irq' from incompatible pointer type
Change the return type of the handler from "int" to "irqreturn_t".
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Makes code futureproof against the impending change to mm->cpu_vm_mask.
It's also a chance to use the new cpumask_ ops which take a pointer
(the older ones are deprecated, but there's no hurry for arch code).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Most of the code enabled by these options is __init, and it's much
more useful to actually run the tests.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
pseries SPLPAR machines are able to retrieve a log of dispatch and
preempt events from the hypervisor. With this information, we can
see when and why each dispatch & preempt is occuring.
This change adds a set of debugfs files allowing userspace to read this
dispatch log.
Based on initial patches from Nishanth Aravamudan <nacc@us.ibm.com>.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
PAPR v2.3 defines fields in the virtual processor area for a dispatch
trace log (DLT). Since we'd like to use the DLT, add the necessary
fields to struct lppaca.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The page_ins member ends at byte 0x3, not 0x4. Also, fix up the
alignment.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Impact: performance improvement
This fixes 'powerpc: avoid cpumask games in arch/powerpc/kernel/sysfs.c'
which talked about using smp_call_function_single, but actually used
work_on_cpu (an older version of the patch).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The return code from invoking the notifier chain when updating the
ibm,dynamic-memory property is not handled properly. In failure
cases (rc == NOTIFY_BAD) we should be restoring the original value
of the property. In success (rc == NOTIFY_OK) we should be returning
zero from the calling routine.
Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Commit e7943fbbfd broke ppc32 using
Open Firmware client interface due to using the wrong relocation
macro when accessing the variable "linux_banner".
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Grant picked up the wrong version of "Respect _PAGE_COHERENT on classic
ppc32 SW" (commit a4bd6a93c3)
It was missing the code to actually deal with the fixup of
_PAGE_COHERENT based on the CPU feature.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Add the new API pci_enable_msi_block() to allow drivers to
request multiple MSI and reimplement pci_enable_msi in terms of
pci_enable_msi_block. Ensure that the architecture back ends don't
have to know about multiple MSI.
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
This updates the 32-bit headers to use the same definitions for the RPN
shift inside the PTE as 64-bit, and thus updates _PAGE_CHG_MASK to
become identical.
This does introduce a runtime visible difference, which is that now,
_PAGE_HASHPTE will be part of _PAGE_CHG_MASK and thus preserved. However
this should have no practical effect as it should have been preserved in
the first place and we got away with not having it there due to our
PTE access functions preserving it anyway.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This patch moves the definition of the PTE format for each MMU type
to separate files instead of all in one file. This improves overall
maintainability and will make it easier to add new types.
On 64-bit, additionally, I've separated the headers relative to the
format of the page table tree (3 vs. 4 levels for 64K vs 4K pages)
from the headers specific to the PTE format for hash based processors,
this will make it easier to add support for Book3 "E" 64-bit
implementations.
There are still some type-related ifdef's in the generic headers,
we might remove them in the long run, but this patch shouldn't result
in any code change, -hopefully- just definitions being moved around.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Weak functions aren't all they're cracked up to be. They lead to
incorrect binaries with some toolchains, they require us to have empty
functions we otherwise wouldn't, and the unused code is not elided
(as of gcc 4.3.2 anyway).
So replace the weak MSI arch hooks with the #define foo foo idiom. We no
longer need empty versions of arch_setup/teardown_msi_irq().
This is less source (by 1 line!), and results in smaller binaries too:
text data bss dec hex filename
9354300 1693916 678424 11726640 b2ef30 build/powerpc/vmlinux-before
9354052 1693852 678424 11726328 b2edf8 build/powerpc/vmlinux-after
Also smaller on x86_64 and arm (iop13xx).
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Impact: build fix for powerpc and sparc
Today's linux-next build (powerpc allyesconfig) failed like this:
> In file included from include/linux/mmzone.h:776,
> from include/linux/gfp.h:5,
> from include/linux/kmod.h:23,
> from include/linux/module.h:14,
> from init/version.c:11:
> arch/powerpc/include/asm/mmzone.h:32: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'numa_cpumask_lookup_table'
Caused by commit 082edb7bf4 ("numa,
cpumask: move numa_node_id default implementation to topology.h") from
the cpus4096 tree which removed the include of linux/topology.h from
linux/mmzone.h.
Same for sparc64 defconfig.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-b: Rusty Russell <rusty@rustcorp.com.au>
Cc: ppc-dev <linuxppc-dev@ozlabs.org>
LKML-Reference: <20090319220322.3baa4613.sfr@canb.auug.org.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Support for the PPC9A VME Single Board Computer from GE Fanuc (PowerPC
MPC8641D).
This is the default config file for GE Fanuc's PPC9A, a 6U single board
computer, based on Freescale's MPC8641D.
Signed-off-by: Martyn Welch <martyn.welch@gefanuc.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Support for the PPC9A VME Single Board Computer from GE Fanuc (PowerPC
MPC8641D).
This is the basic board support for GE Fanuc's PPC9A, a 6U single board
computer, based on Freescale's MPC8641D.
Signed-off-by: Martyn Welch <martyn.welch@gefanuc.com>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Building the fs_enet driver as a modules fails because it cannot
access the global cpm2_immr symbol.
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Patch to limit NEC fixup to SBC310, following similar patch to SBC610 by
Tony Breeds: 368a12117d
Signed-off-by: Martyn Welch <martyn.welch@gefanuc.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Since we now set _PAGE_COHERENT in the Linux PTE we shouldn't be clearing
it out before we setup the SW TLB. Today all the SW TLB machines
(603/e300) that we support are non-SMP, however there are some errata on
some devices that cause us to set _PAGE_COHERENT via CPU_FTR_NEED_COHERENT.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
BestComm, a DMA engine in MPC52xx SoC, requires snooping when
CPU caches are enabled to work properly.
Adding CPU_FTR_NEED_COHERENT fixes NFS problems on MPC52xx machines
introduced by 'powerpc/mm: Fix handling of _PAGE_COHERENT in BAT setup
code' (sha1: 4c456a67f5).
Signed-off-by: Piotr Ziecik <kosmo@semihalf.com>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
This new compiler warning:
arch/powerpc/platforms/cell/interrupt.c: In function 'handle_iic_irq':
arch/powerpc/platforms/cell/interrupt.c:240: warning: unused variable 'cpu'
Triggers because the local variable 'cpu' became unused due to commit:
dee4102: sparseirq: use kstat_irqs_cpu instead
Remove the variable.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: ppc-dev <linuxppc-dev@ozlabs.org>
LKML-Reference: <20090316185256.4a160374.sfr@canb.auug.org.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Convert the PS3 Video RAM Storage Driver from an MTD driver to a plain block
device driver.
The ps3vram driver exposes unused video RAM on the PS3 as a block device
suitable for storage or swap. Fast data transfer is achieved using a local
cache in system RAM and DMA transfers via the GPU.
The new driver is ca. 50% faster for reading, and ca. 10% for writing.
Signed-off-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com>
Acked-by: Geoff Levand <geoffrey.levand@am.sony.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
fixed-head.o must be linked into the bootwrapper for raw-binary images to
work. This patch adds it into the bootwrapper.
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Reported-by: Eddie Dawydiuk <eddie@embeddedarm.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This patch adds the utility function mpc52xx_get_xtal_freq() to get
the frequency of the external oscillator clock connected to the pin
SYS_XTAL_IN. The MSCAN may us it as clock source. Unfortunately, this
value is not available from the FDT blob, but it can be determined
from the IPB frequency.
Signed-off-by: Wolfgang Grandegger <wg@grandegger.com>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Remove poorly designed debug sysfs attribute entry from the GPT driver.
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Acked-by: Wolfram Sang <w.sang@pengutronix.de>
Use device tree to determine if we actually have an MPIC and use
CPU feature to decide if we should use doorbells for IPIs.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
There is no dependece between efp and math-emu. But when disable math-emu
the efp code cannot be built.
Signed-off-by: Liu Yu <yu.liu@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
setup_irq(0, NULL) is broken as setup_irq() dereferences action
unconditionally.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
The PCI irqs for the protected sources where not correct for PCI PHBs
Signed-off-by: Ted Peters <ted.peters@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
CONFIG_PPC_MULTIPLATFORM is a remain of the pre-powerpc days and isn't
really meaningful anymore. It was basically equivalent to PPC64 || 6xx.
This removes it along with the following changes:
- 32-bit platforms that relied on PPC32 && PPC_MULTIPLATFORM now rely
on 6xx which is what they want anyway.
- A new symbol, PPC_BOOK3S, is defined that represent compliance with
the "Server" variant of the architecture. This is set when either 6xx
or PPC64 is set and open the door for future BOOK3E 64-bit.
- 64-bit platforms that relied on PPC64 && PPC_MULTIPLATFORM now use
PPC64 && PPC_BOOK3S
- A separate and selectable CONFIG_PPC_OF_BOOT_TRAMPOLINE option is now
used to control the use of prom_init.c
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
While we did add support for _PAGE_SPECIAL on some 32-bit platforms,
we never actually built get_user_pages_fast() on them. This fixes
it which requires a little bit of ifdef'ing around.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Impact: cleanup
Convert the last remaining users to struct irq_chip.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: linuxppc-dev@ozlabs.org
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Impact: cleanup
Convert the last remaining users.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: linuxppc-dev@ozlabs.org
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
When the console is on a serial port to be driven by serial8250, a
character can be lost from the end of the first line in the two-line
sequence
serial8250.0: ttyS0 at MMIO 0xe0004500 (irq = 42) is a 16550A
console handover: boot [udbg0] -> real [ttyS0]
This happens because udbg_puts or udbg_write stuff the last byte of
the line into the Tx FIFO and return, whereupon the serial8250
initialization code immediately empties that FIFO. The fix: udbg_puts
and udbg_write now wait for the Tx FIFO to clear before returning.
This delays the system by one additional serial frame time for each
line written by udbg, but the effect is not noticeable, a cumulative
17 milliseconds for 200 lines of early printk output at 115200 baud.
Also, the routines in udbg_16550.c now emit CRLF instead of LFCR.
Linux makes a point of emitting CRLF because, when serial output is
captured to a file, LFCR sequences can confuse text editors. See
http://lkml.org/lkml/2006/2/4/50 for some history.
Signed-off-by: Andrew Klossner <andrew@cesa.opbu.xerox.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Fix typo: s/resouces/resources/ in a pr_debug
Signed-off-by: Wolfram Sang <w.sang@pengutronix.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The Axon MSI driver depends on more than just PCI_MSI, so add a
Kconfig fragment for it. Fixes randconfig build failures.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
There's no way for us to express to firmware that we want a
discontiguous, or non-zero based, range of MSI-X entries. So we
must reject such requests.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
So at least you can see what kernel you're booting if you die
before the kernel prints it mid-way through start_kernel().
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This patch enables oprofile for all 3 FX variants and GX variant of the
750 processor.
Signed-off-by: Octavian Purdila <opurdila@ixiacom.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
We need to offset by *pos bytes, not *pos words.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Based on an original patch from Roel Kluin <roel.kluin@gmail.com>.
The write size calculated during regs and fpcr writes may currently
go negative. Because size is unsigned, this will wrap, and our
check for EFBIG will fail.
Instead, do the check for EFBIG before subtracting from size.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Change the ps3av_auto_videomode() mode id argument type from unsigned to
signed so a negative id can be detected and reported as an -EINVAL failure.
Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: Geoff Levand <geoffrey.levand@am.sony.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
To help users diagnose hotpug memory problems, change the
printing of memory hotplug errors from DBG() to pr_err().
Signed-off-by: Geoff Levand <geoffrey.levand@am.sony.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The powerpc 64 bit architecture defines three flags for the
DABR (Data Address Breakpoint Register). Add definitions
for the currently missing DABR_DATA_WRITE and DABR_DATA_READ
flags to the powerpc reg.h file.
Signed-off-by: Geoff Levand <geoffrey.levand@am.sony.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Add macros for the GS (guest state) bit to the list of MSR bit definitions.
On PowerPC cores that support embedded hypervisor mode, GS is cleared if
the system is running in hypervisor state (and MSR[PR] is cleared), and set
if it's running in guest state. See the Power ISA 2.06 specification for
more information.
Signed-off-by: Timur Tabi <timur@freescale.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
For ppc750 processors which use 4 performance counters instead of the
6 G4 uses but otherwise is compatible with G4.
Signed-off-by: Octavian Purdila <opurdila@ixiacom.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
spuctx_switch_state() warns if ktime goes backwards, but it
sometimes compares an uninitialized value, which showed that
the data was unreliable when we actually saw the warning.
Initialize it to the current time in order to get correct data.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
When identify_cpu() is called a second time with a logical PVR, it
only copies a subset of the cpu_spec fields so as to avoid overwriting
the performance monitor fields that were initialized based on the
real PVR.
However some of the other, non performance monitor related fields are
also not copied:
* pvr_mask
* pvr_value
* mmu_features
* machine_check
The fact that pvr_mask is not copied can result in show_cpuinfo()
showing the cpu as "unknown", if we override an unknown PVR with a
logical one - as reported by Shaggy.
So change the logic to copy all fields, and then put back the PMC
related ones in the case that we're overwriting a real PVR with a
logical one.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The for-loop body of identify_cpu() has gotten a little big, so move the
loop body logic into a separate function. No other changes.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This adds the necessary bits and pieces to powerpc implementation of
ioremap to benefit from caller tracking in /proc/vmallocinfo, at least
for ioremap's done after mem init as the older ones aren't tracked.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Setting G5's cpu frequency transition latency to CPUFREQ_ETERNAL stops
ondemand governor from working. I measured the latency using sched_clock
and haven't seen much higher than 11000ns, so I set this to 12000ns for
my configuration. Possibly other configurations will be different?
Ideally the generic code would be able to measure it in case the platform
does not provide it.
But this simple patch at least makes it throttle again.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Impact: __per_cpu_load available on all SMP capable archs
Percpu now requires three symbols to be defined - __per_cpu_load,
__per_cpu_start and __per_cpu_end. There were three archs which
didn't have it. Update them as follows.
* powerpc: can use generic PERCPU() macro. Compile tested for
powerpc32, compile/boot tested for powerpc64.
* ia64: can use generic PERCPU_VADDR() macro. __phys_per_cpu_start is
identical to __per_cpu_load. Compile tested and symbol table looks
identical after the change except for the additional __per_cpu_load.
* arm: added explicit __per_cpu_load definition. Currently uses
unified .init output section so can't use the generic macro. Dunno
whether the unified .init ouput section is required by arch
peculiarity so I left it alone. Please break it up and use PERCPU()
if possible.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Pat Gefre <pfg@sgi.com>
Cc: Russell King <rmk@arm.linux.org.uk>
The registers for the local bus are incorrectly set to 0xf8005000 rather
than there actual location of 0xfef05000.
Signed-off-by: Martyn Welch <martyn.welch@gefanuc.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Defining flash partition table in platform code is deprecated, and due to
recent changes linkstation and storcenter do not compile any more with
their default configurations because of undefined references to
physmap_set_partitions(). Instead of fixing them by using the correct
kernel configuration macro in preprocessor conditional, remove partition
table definitions altogether. Instead add support for partition definition
on the command-line and in device tree to the default configurations.
Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
The Freescale Serial Synchronous Interface (SSI) is an audio device present on
some Freescale SOCs. Various implementations of the SSI have a different
transmit and receive FIFO depth, but are otherwise identical. To support
these variations, add a new property fsl,fifo-depth to the SSI node that
specifies the depth of the FIFOs.
Also update the MPC8610 HPCD device tree with this property.
Signed-off-by: Timur Tabi <timur@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
The e500mc core supports the new tlbilx instructions that do core
local invalidates and also provide us the ability to take down
all TLB entries matching a given PID.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
On MPC837X CPUs Dual-Role USB isn't always available (for example DR
USB pins can be muxed away to eSDHC).
U-Boot adds status = "disabled" property into the DR USB nodes to
indicate that we must not try to configure or probe Dual-Role USB,
otherwise we'll break eSDHC support on targets with MPC837X CPUs.
Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
The interrupt pending register is write 1 clear. If there are more than
one external interrupts pending at the same time, acking the first
interrupt by reading pending register then OR the corresponding bit and
write back to pending register will also clear other interrupt pending
bits. That will cause loss of interrupt.
Signed-off-by: Da Yu <dayu@datangmobile.cn>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
This patch updates the Xilinx ML507 device tree to match the released
ML507 powerpc reference design (ml507_ppc440_emb_ref). This patch is
needed to boot Linux on the ML507 powerpc reference design without
manually generating and tweaking a device tree from the project directory.
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
commit a969e76a71 (powerpc: Correct USB
support for GE Fanuc SBC610) introduced a fixup for NEC usb controllers.
This fixup should only run on GEF SBC610 boards.
Fixes Fedora bug #486511.
(https://bugzilla.redhat.com/show_bug.cgi?id=486511)
Signed-off-by: Tony Breeds <tony@bakeyournoodle.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
On x86-64, a 32-bit process (TIF_IA32) can switch to 64-bit mode with
ljmp, and then use the "syscall" instruction to make a 64-bit system
call. A 64-bit process make a 32-bit system call with int $0x80.
In both these cases under CONFIG_SECCOMP=y, secure_computing() will use
the wrong system call number table. The fix is simple: test TS_COMPAT
instead of TIF_IA32. Here is an example exploit:
/* test case for seccomp circumvention on x86-64
There are two failure modes: compile with -m64 or compile with -m32.
The -m64 case is the worst one, because it does "chmod 777 ." (could
be any chmod call). The -m32 case demonstrates it was able to do
stat(), which can glean information but not harm anything directly.
A buggy kernel will let the test do something, print, and exit 1; a
fixed kernel will make it exit with SIGKILL before it does anything.
*/
#define _GNU_SOURCE
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>
#include <linux/prctl.h>
#include <sys/stat.h>
#include <unistd.h>
#include <asm/unistd.h>
int
main (int argc, char **argv)
{
char buf[100];
static const char dot[] = ".";
long ret;
unsigned st[24];
if (prctl (PR_SET_SECCOMP, 1, 0, 0, 0) != 0)
perror ("prctl(PR_SET_SECCOMP) -- not compiled into kernel?");
#ifdef __x86_64__
assert ((uintptr_t) dot < (1UL << 32));
asm ("int $0x80 # %0 <- %1(%2 %3)"
: "=a" (ret) : "0" (15), "b" (dot), "c" (0777));
ret = snprintf (buf, sizeof buf,
"result %ld (check mode on .!)\n", ret);
#elif defined __i386__
asm (".code32\n"
"pushl %%cs\n"
"pushl $2f\n"
"ljmpl $0x33, $1f\n"
".code64\n"
"1: syscall # %0 <- %1(%2 %3)\n"
"lretl\n"
".code32\n"
"2:"
: "=a" (ret) : "0" (4), "D" (dot), "S" (&st));
if (ret == 0)
ret = snprintf (buf, sizeof buf,
"stat . -> st_uid=%u\n", st[7]);
else
ret = snprintf (buf, sizeof buf, "result %ld\n", ret);
#else
# error "not this one"
#endif
write (1, buf, ret);
syscall (__NR_exit, 1);
return 2;
}
Signed-off-by: Roland McGrath <roland@redhat.com>
[ I don't know if anybody actually uses seccomp, but it's enabled in
at least both Fedora and SuSE kernels, so maybe somebody is. - Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Virtex FPGA designs have two serial port logic cores to choose from; the
simple uartlite, and the full featured uart16550. Both cores are in
common use so the defconfig should support both of them. Currently
only console on uartlite is supported in the defconfig. This patch adds
console support for the 16550 core.
The Virtex reference designs do not work without this patch.
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
To better match the ePAPR specification, device nodes which claim
"simple-bus" compatibility should be probed by default.
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
On digsy MTC PSC4 and PSC5 should be configured as UART, not PSC3 and PSC4.
Signed-off-by: Grzegorz Bernacki <gjb@semihalf.com>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
The following options are enabled to support the digsy-mtc.
- LXT phy
- AT24 eeprom
- RTC (DS1337)
- MTD partitioning based on OF description
Signed-off-by: Grzegorz Bernacki <gjb@semihalf.com>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
The PCI 2.x cells used on some 44x SoCs only let us configure the decode
for the low 32-bit of the incoming PLB addresses. The top 4 bits (this
is a 36-bit bus) are hard wired to different values depending on the
specific SoC in use. Our code used to work "by accident" until I added
support for the ISA memory holes and while at it added more validity
checking of the addresses.
This patch should bring it back to working condition. It still relies
on the device-tree being correct but that's somewhat a pre-requisite
for anything to work anyway.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com>
Acked-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
This fixes a regression introduced by commit
a4e22f02f5 ("powerpc: Update 64bit
__copy_tofrom_user() using CPU_FTR_UNALIGNED_LD_STD").
The same bug that existed in the 64bit memcpy() also exists here so fix
it here too. The fix is the same as that applied to memcpy() with the
addition of fixes for the exception handling code required for
__copy_tofrom_user().
This stops us reading beyond the end of the source region we were told
to copy.
Signed-off-by: Mark Nelson <markn@au1.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This fixes a regression introduced by commit
25d6e2d7c5 ("powerpc: Update 64bit memcpy()
using CPU_FTR_UNALIGNED_LD_STD").
This commit allowed CPUs that have the CPU_FTR_UNALIGNED_LD_STD CPU
feature bit present to do the memcpy() with unaligned load doubles. But,
along with this came a bug where our final load double would read bytes
beyond a page boundary and into the next (unmapped) page. This was caught
by enabling CONFIG_DEBUG_PAGEALLOC,
The fix was to read only the number of bytes that we need to store rather
than reading a full 8-byte doubleword and storing only a portion of that.
In order to minimise the amount of existing code touched we use the
original do_tail for the src_unaligned case.
Below is an example of the regression, as reported by Sachin Sant:
Unable to handle kernel paging request for data at address 0xc00000003f380000
Faulting instruction address: 0xc000000000039574
cpu 0x1: Vector: 300 (Data Access) at [c00000003baf3020]
pc: c000000000039574: .memcpy+0x74/0x244
lr: d00000000244916c: .ext3_xattr_get+0x288/0x2f4 [ext3]
sp: c00000003baf32a0
msr: 8000000000009032
dar: c00000003f380000
dsisr: 40000000
current = 0xc00000003e54b010
paca = 0xc000000000a53680
pid = 1840, comm = readahead
enter ? for help
[link register ] d00000000244916c .ext3_xattr_get+0x288/0x2f4 [ext3]
[c00000003baf32a0] d000000002449104 .ext3_xattr_get+0x220/0x2f4 [ext3]
(unreliab
le)
[c00000003baf3390] d00000000244a6e8 .ext3_xattr_security_get+0x40/0x5c [ext3]
[c00000003baf3400] c000000000148154 .generic_getxattr+0x74/0x9c
[c00000003baf34a0] c000000000333400 .inode_doinit_with_dentry+0x1c4/0x678
[c00000003baf3560] c00000000032c6b0 .security_d_instantiate+0x50/0x68
[c00000003baf35e0] c00000000013c818 .d_instantiate+0x78/0x9c
[c00000003baf3680] c00000000013ced0 .d_splice_alias+0xf0/0x120
[c00000003baf3720] d00000000243e05c .ext3_lookup+0xec/0x134 [ext3]
[c00000003baf37c0] c000000000131e74 .do_lookup+0x110/0x260
[c00000003baf3880] c000000000134ed0 .__link_path_walk+0xa98/0x1010
[c00000003baf3970] c0000000001354a0 .path_walk+0x58/0xc4
[c00000003baf3a20] c000000000135720 .do_path_lookup+0x138/0x1e4
[c00000003baf3ad0] c00000000013645c .path_lookup_open+0x6c/0xc8
[c00000003baf3b70] c000000000136780 .do_filp_open+0xcc/0x874
[c00000003baf3d10] c0000000001251e0 .do_sys_open+0x80/0x140
[c00000003baf3dc0] c00000000016aaec .compat_sys_open+0x24/0x38
[c00000003baf3e30] c00000000000855c syscall_exit+0x0/0x40
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
When we introduced VSX, we changed the way FPRs are stored in the
thread_struct. Unfortunately we missed the load/store float double
alignment handler code when updating how we access FPRs in the
thread_struct.
Below fixes this and merges the little/big endian case.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Randomise ELF_ET_DYN_BASE, which is used when loading position independent
executables.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
On 64bit there is a possibility our stack and mmap randomisation will put
the two close enough such that we can't expand our stack to match the ulimit
specified.
To avoid this, start the upper mmap address at 1GB + 128MB below the top of our
address space, so in the worst case we end up with the same ~128MB hole as in
32bit. This works because we randomise the stack over a 1GB range.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
get_random_int() returns the same value within a 1 jiffy interval. This means
that the mmap and stack regions will almost always end up the same distance
apart, making a relative offset based attack possible.
To fix this, shift the randomness we use for the mmap region by 1 bit.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Randomise the lower bits of the stack address. More randomisation is good for
security but the scatter can also help with SMT threads that share an L1. A
quick test case shows this working:
int main()
{
int sp;
printf("%x\n", (unsigned long)&sp & 4095);
}
before:
80
80
80
80
80
after:
610
490
300
6b0
d80
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
At the moment we randomise the stack by 8MB on 32bit and 64bit tasks. Since we
have a lot more address space to play with on 64bit, lets do what x86 does and
increase that randomisation to 1GB:
before:
# for i in seq `1 10` ; do sleep 1 & cat /proc/${!}/maps | grep stack; done
fffffebc000-fffffed1000 rw-p ffffffeb000 00:00 0 [stack]
ffffff5a000-ffffff6f000 rw-p ffffffeb000 00:00 0 [stack]
fffffdb2000-fffffdc7000 rw-p ffffffeb000 00:00 0 [stack]
fffffd3e000-fffffd53000 rw-p ffffffeb000 00:00 0 [stack]
fffffad9000-fffffaee000 rw-p ffffffeb000 00:00 0 [stack]
after:
# for i in seq `1 10` ; do sleep 1 & cat /proc/${!}/maps | grep stack; done
ffff5c27000-ffff5c3c000 rw-p ffffffeb000 00:00 0 [stack]
fffebe5e000-fffebe73000 rw-p ffffffeb000 00:00 0 [stack]
fffcb298000-fffcb2ad000 rw-p ffffffeb000 00:00 0 [stack]
fffc719d000-fffc71b2000 rw-p ffffffeb000 00:00 0 [stack]
fffe01af000-fffe01c4000 rw-p ffffffeb000 00:00 0 [stack]
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Rearrange mmap.c to better match the x86 version.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Move is_32bit_task into asm/thread_info.h, that allows us to test for
32/64bit tasks without an ugly CONFIG_PPC64 ifdef.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
On Wed, 18 Feb 2009 22:18:21 +0100
Giuliano Pochini <pochini@shiny.it> wrote:
Since 2.6.28, /sys/devices/system/cpu/cpu*/online don't exist anymore
on 32-bit PowerMacs due to change in the generic powerpc code.
Signed-off-by: Giuliano Pochini <pochini@shiny.it>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The new firmware release exports further RTC calls. This
patch adds these calls to the QPACE platform setup file.
Signed-off-by: Benjamin Krill <ben@codiert.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
When we introduced VSX, we changed the way FPRs are stored in the
thread_struct. Unfortunately we missed the load/store float double
alignment handler code when updating how we access FPRs in the
thread_struct.
Below fixes this and merges the little/big endian case.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
lfiwzx is a new floating point load instruction in 2.06 that needs an
alignment handler for Linux.
Turns out to be the worlds easiest handler to add.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This patch reworks the hot_add_scn_to_nid and its supporting functions
to make them easier to understand. There are no functional changes in
this patch and has been tested on machine with memory represented in the
device tree as memory nodes and in the ibm,dynamic-memory property.
My previous patch that introduced support for hotplug memory add on
systems whose memory was represented by the ibm,dynamic-memory property
of the device tree only left the code more unintelligible. This
will hopefully makes things easier to understand.
Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
While testing partition migration with heavy CPU load using
shared processors, it was observed that sometimes the migration
would never complete and would appear to hang. Currently, the
migration code assumes that if H_SUCCESS is returned from the H_JOIN
then the migration is complete and the processor is waking up on
the target system. If there was an outstanding PROD to the processor
when the H_JOIN is called, however, it will return H_SUCCESS on the source
system, causing the migration to hang, or in some scenarios cause
the kernel to crash on the complete call waking the caller
of rtas_percpu_suspend_me. Fix this by calling H_JOIN multiple times
if necessary during the migration.
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
There are hardware limitations on the number of available MSIs,
which firmware expresses using a property named "ibm,pe-total-#msi".
This property tells us how many MSIs are available for devices below
the point in the PCI tree where we find the property.
For old firmwares which don't have the property, we assume there are
8 MSIs available per "partitionable endpoint" (PE). The PE can be
found using existing EEH code, which uses the methods described in
PAPR. For our purposes we want the parent of the node that's
identified using this method.
When a driver requests n MSIs for a device, we first establish where
the "ibm,pe-total-#msi" property above that device is, or we find the
PE if the property is not found. In both cases we call this node
the "pe_dn".
We then count all non-bridge devices below the pe_dn, to establish
how many devices in total may need MSIs. The quota is then simply the
total available divided by the number of devices, if the request is
less than or equal to the quota, the request is fine and we're done.
If the request is greater than the quota, we try to determine if there
are any "spare" MSIs which we can give to this device. Spare MSIs are
found by looking for other devices which can never use their full
quota, because their "req#msi(-x)" property is less than the quota.
If we find any spare, we divide the spares by the number of devices
that could request more than their quota. This ensures the spare
MSIs are spread evenly amongst all over-quota requestors.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
If a driver asks for more MSIs than the devices "req#msi(-x)" property,
we currently return -ENOSPC. This doesn't give the driver any chance to
make a new request with a number that might work.
So if "req#msi(-x)" is less than the request, return its value. To be
100% safe, make sure we return an error if req_msi == 0.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The e500mc supports the new msgsnd/doorbell mechanisms that were added in
the Power ISA 2.05 architecture. We use the normal level doorbell for
doing SMP IPIs at this point.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
cbe_cpufreq has a partial dependency on cbe_cpufreq_pmi, which cannot
be easily expressed in Kconfig. This fixes it by introducing an
extra Kconfig symbol CBE_CPUFREQ_PMI_ENABLE. To make the dependency
clearer, turn PPC_PMI into an automatic symbol.
Reported-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The spufs context directory contents definitions are not changed after
initialisation, so we can declare them as const. We can do the same
with the spu coredump reader callbacks too.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Currently, we may setup the MFC for isolated mode initilaisation with
the purge still active. This means that DMAs required to perform the
init do not happen.
This change clears the purge status after doing the purge, so that
the isolated init can proceed.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Currently, spu_handle_mm_fault disregards the 'ret' variable and always
returns -EFAULT on error.
This change refactos spu_handle_mm_fault a little, to return the
ret variable as appropriate. This allows us to combine the error and
sucess paths.
Also, remove the #if-0-ed IS_VALID_EA() check, it has never been
used.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
At the moment we size the hashtable based on 4kB pages / 2, even on a
64kB kernel. This results in a hashtable that is much larger than it
needs to be.
Grab the real page size and size the hashtable based on that
Note: This only has effect on non hypervisor machines.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This patch rewrites consistent dma allocations support to use vmalloc
layer to allocate virtual memory space from vmalloc pool and get rid
of CONFIG_CONSISTENT_{START,SIZE}.
This greatly simplifies the code by effectively removing a custom
allocator we had for virtual space.
Signed-off-by: Ilya Yanok <yanok@emcraft.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
include/asm/bootx.h:12: include of <linux/types.h> is preferred over <asm/types.h>
include/asm/bootx.h:57: found __[us]{8,16,32,64} type without #include <linux/types.h>
include/asm/elf.h:5: include of <linux/types.h> is preferred over <asm/types.h>
include/asm/kvm.h:23: include of <linux/types.h> is preferred over <asm/types.h>
include/asm/kvm.h:26: found __[us]{8,16,32,64} type without #include <linux/types.h>
include/asm/ps3fb.h:33: found __[us]{8,16,32,64} type without #include <linux/types.h>
include/asm/spu_info.h:27: found __[us]{8,16,32,64} type without #include <linux/types.h>
include/asm/swab.h:11: include of <linux/types.h> is preferred over <asm/types.h>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Old OF variants used to create a 'dummy' parent node "multifunc-device"
for devices with more than one PCI function. Our code that matches OF
nodes to PCI devices dealt with that in one place but not in another,
this fixes it.
This has the practical effect of fixing interrupt routing of multifunction
PCI cards on some older PowerMac machines.
Signed-off-by: Tom Arbuckle <tom.d.arbuckle@gmail.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Create a new header that becomes a single location for defining PowerPC
opcodes used by code that is either generationg instructions
at runtime (fixups, debug, etc.), emulating instructions, or just
compiling instructions old assemblers don't know about.
We currently don't handle the floating point emulation or alignment decode
as both are better handled by the specific decode support they already
have.
Added support for the new dcbzl, dcbal, msgsnd, tlbilx, & wait instructions
since older assemblers don't know about them.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Impact: clean up, remove duplicate code
When ftrace was first ported to PowerPC, there existed a
create_function_call that would create the instruction to make a call
to a given address. Unfortunately, this call expected to write to
the address it was given, and since it used the address to calculate
the offset, it could not be faked.
ftrace needed a way to create the instruction without actually writing
that instruction to the text section. So ftrace had to implement its
own code.
Now we have create_branch in the code patching library, which does
exactly what ftrace needs. This patch replaces ftrace's implementation
with the library function.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The original port of ftrace to PowerPC kept a lot of the code used
by x86. Some of this code was to handle x86's 5 byte instruction.
This was handled by using character arrays to manipulate the
code.
PowerPC has a consistent 4 byte instruction. Using unsigned ints
makes the code more efficient as well as more readable.
By converting to use unsigned ints to represent instructions,
I was able to remove the side effects that were needed for
manipulating character strings.
i.e. memcpy and memcmp
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This patch gets function graph tracing working with dynamic function
tracer on PowerPC32.
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This patch ports the function graph tracer for PowerPC, but only
for static function tracing.
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Impact: clean up
Use a macro to save and restore the registers for PowerPC32,
since that code is duplicated.
This is similar to the work done by Cyrill Gorcunov for the
mcount code in x86_64.
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The TOCS used by modules are different than the one used by
the core kernel code. The function graph tracer must save and
restore the TOC whenever it traces a module call. But this
is an added overhead to burden the majority of core kernel
code being traced.
Benjamin Herrenschmidt suggested in testing the entry of
the call to tell if it is a core kernel function or a module.
He recommended using the REGION_ID() macro to perform this test.
This patch implements Benjamin's idea, and uses a different
return_to_handler routine dependent on if the entry is a core
kernel function or not. The module version saves the TOC, where as
the core kernel version does not.
Geoff Lavand tested on PS3.
Tested-by: Geoff Levand <geoffrey.levand@am.sony.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This is the port of the function graph tracer to PowerPC with
dynamic tracing.
Geoff Lavand tested on PS3.
Tested-by: Geoff Levand <geoffrey.levand@am.sony.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This is a port of the function graph tracer that was written by
Frederic Weisbecker for the x86.
This only works for PPC64 at the moment and only for static tracing.
PPC32 and dynamic function graph tracing support will come later.
The trace produces a visual calling of functions:
# tracer: function_graph
#
# CPU DURATION FUNCTION CALLS
# | | | | | | |
0) 2.224 us | }
0) ! 271.024 us | }
0) ! 320.080 us | }
0) ! 324.656 us | }
0) ! 329.136 us | }
0) | .put_prev_task_fair() {
0) | .update_curr() {
0) 2.240 us | .update_min_vruntime();
0) 6.512 us | }
0) 2.528 us | .__enqueue_entity();
0) + 15.536 us | }
0) | .pick_next_task_fair() {
0) 2.032 us | .__pick_next_entity();
0) 2.064 us | .__clear_buddies();
0) | .set_next_entity() {
0) 2.672 us | .__dequeue_entity();
0) 6.864 us | }
Geoff Lavand tested on PS3.
Tested-by: Geoff Levand <geoffrey.levand@am.sony.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Michael Neuling reported a compile bug when dynamic ftrace was
configured in and modules were not. This was due to the ftrace
code referencing module specific structures.
Reported-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Impact: cleanup
The PowerPC ftrace code uses a hacked up DEBUGP macro for prints.
This patch converts it to the standard pr_debug.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Impact: new timer API
Based on an idea from Martin Josefsson with the help of
Patrick McHardy and Stephen Hemminger:
introduce the mod_timer_pending() API which is a mod_timer()
offspring that is an invariant on already removed timers.
(regular mod_timer() re-activates non-pending timers.)
This is useful for the networking code in that it can
allow unserialized mod_timer_pending() timer-forwarding
calls, but a single del_timer*() will stop the timer
from being reactivated again.
Also while at it:
- optimize the regular mod_timer() path some more, the
timer-stat and a debug check was needlessly duplicated
in __mod_timer().
- make the exports come straight after the function, as
most other exports in timer.c already did.
- eliminate __mod_timer() as an external API, change the
users to mod_timer().
The regular mod_timer() code path is not impacted
significantly, due to inlining optimizations and due to
the simplifications.
Based-on-patch-from: Stephen Hemminger <shemminger@vyatta.com>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Patrick McHardy <kaber@trash.net>
Cc: netdev@vger.kernel.org
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch adds support for AMCC ppc4xx security device driver. This is the
initial release that includes the driver framework with AES and SHA1 algorithms
support.
The remaining algorithms will be released in the near future.
Signed-off-by: James Hsiao <jhsiao@amcc.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
User space can request hardware and/or software time stamping.
Reporting of the result(s) via a new control message is enabled
separately for each field in the message because some of the
fields may require additional computation and thus cause overhead.
User space can tell the different kinds of time stamps apart
and choose what suits its needs.
When a TX timestamp operation is requested, the TX skb will be cloned
and the clone will be time stamped (in hardware or software) and added
to the socket error queue of the skb, if the skb has a socket
associated with it.
The actual TX timestamp will reach userspace as a RX timestamp on the
cloned packet. If timestamping is requested and no timestamping is
done in the device driver (potentially this may use hardware
timestamping), it will be done in software after the device's
start_hard_xmit routine.
Signed-off-by: Patrick Ohly <patrick.ohly@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
kvm_arch_sync_events is introduced to quiet down all other events may happen
contemporary with VM destroy process, like IRQ handler and work struct for
assigned device.
For kvm_arch_sync_events is called at the very beginning of kvm_destroy_vm(), so
the state of KVM here is legal and can provide a environment to quiet down other
events.
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
4xx chips commonly now have multiple PHBs, there is no reason to not
enable PCI domains on them. The main issue with PCI domains is X but
currently its already somewhat busted for other reasons such as the
36-bit physical address space, which I'm fixing separately.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
This adds the device-tree entries for a handful of devices on the
Canyonlands board, such as the EHCI and OHCI controllers, the real
time clock and the AD7414 thermal monitor.
I also updated the defconfig to enable various options related to
these devices.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
This patch adds support for 256KB pages on ppc44x-based boards.
For simplification of implementation with 256KB pages we still assume
2-level paging. As a side effect this leads to wasting extra memory space
reserved for PTE tables: only 1/4 of pages allocated for PTEs are
actually used. But this may be an acceptable trade-off to achieve the
high performance we have with big PAGE_SIZEs in some applications (e.g.
RAID).
Also with 256KB PAGE_SIZE we increase THREAD_SIZE up to 32KB to minimize
the risk of stack overflows in the cases of on-stack arrays, which size
depends on the page size (e.g. multipage BIOs, NTFS, etc.).
With 256KB PAGE_SIZE we need to decrease the PKMAP_ORDER at least down
to 9, otherwise all high memory (2 ^ 10 * PAGE_SIZE == 256MB) we'll be
occupied by PKMAP addresses leaving no place for vmalloc. We do not
separate PKMAP_ORDER for 256K from 16K/64K PAGE_SIZE here; actually that
value of 10 in support for 16K/64K had been selected rather intuitively.
Thus now for all cases of PAGE_SIZE on ppc44x (including the default, 4KB,
one) we have 512 pages for PKMAP.
Because ELF standard supports only page sizes up to 64K, then you should
use binutils later than 2.17.50.0.3 with '-zmax-page-size' set to 256K
for building applications, which are to be run with the 256KB-page sized
kernel. If using the older binutils, then you should patch them like follows:
--- binutils/bfd/elf32-ppc.c.orig
+++ binutils/bfd/elf32-ppc.c
-#define ELF_MAXPAGESIZE 0x10000
+#define ELF_MAXPAGESIZE 0x40000
One more restriction we currently have with 256KB page sizes is inability
to use shmem safely, so, for now, the 256KB is available only if you turn
the CONFIG_SHMEM option off (another variant is to use BROKEN).
Though, if you need shmem with 256KB pages, you can always remove the !SHMEM
dependency in 'config PPC_256K_PAGES', and use the workaround available here:
http://lkml.org/lkml/2008/12/19/20
Signed-off-by: Yuri Tikhonov <yur@emcraft.com>
Signed-off-by: Ilya Yanok <yanok@emcraft.com>
Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
Fix the VSX alignment handler for VSX registers > 32. 32-63 are stored
in the VMX part of the thread_struct not the FPR part.
Signed-off-by: Michael Neuling <mikey@neuling.org>
CC: stable@kernel.org (2.6.27 & .28 please)
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Change the PS3 hotplug memory routine ps3_mm_add_memory() from
a core_initcall to a device_initcall.
core_initcall routines run before the powerpc topology_init()
startup routine, which is a subsys_initcall, resulting in
failure of ps3_mm_add_memory() when CONFIG_NUMA=y. When
ps3_mm_add_memory() fails the system will boot with just the
128 MiB of boot memory
Signed-off-by: Geoff Levand <geoffrey.levand@am.sony.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Fix the powerpc NUMA reserve bootmem page selection logic.
commit 8f64e1f2d1 (powerpc: Reserve
in bootmem lmb reserved regions that cross NUMA nodes) changed
the logic for how the powerpc LMB reserved regions were converted
to bootmen reserved regions. As the folowing discussion reports,
the new logic was not correct.
mark_reserved_regions_for_nid() goes through each LMB on the
system that specifies a reserved area. It searches for
active regions that intersect with that LMB and are on the
specified node. It attempts to bootmem-reserve only the area
where the active region and the reserved LMB intersect. We
can not reserve things on other nodes as they may not have
bootmem structures allocated, yet.
We base the size of the bootmem reservation on two possible
things. Normally, we just make the reservation start and
stop exactly at the start and end of the LMB.
However, the LMB reservations are not aware of NUMA nodes and
on occasion a single LMB may cross into several adjacent
active regions. Those may even be on different NUMA nodes
and will require separate calls to the bootmem reserve
functions. So, the bootmem reservation must be trimmed to
fit inside the current active region.
That's all fine and dandy, but we trim the reservation
in a page-aligned fashion. That's bad because we start the
reservation at a non-page-aligned address: physbase.
The reservation may only span 2 bytes, but that those bytes
may span two pfns and cause a reserve_size of 2*PAGE_SIZE.
Take the case where you reserve 0x2 bytes at 0x0fff and
where the active region ends at 0x1000. You'll jump into
that if() statment, but node_ar.end_pfn=0x1 and
start_pfn=0x0. You'll end up with a reserve_size=0x1000,
and then call
reserve_bootmem_node(node, physbase=0xfff, size=0x1000);
0x1000 may not be on the same node as 0xfff. Oops.
In almost all the vm code, end_<anything> is not inclusive.
If you have an end_pfn of 0x1234, page 0x1234 is not
included in the range. Using PFN_UP instead of the
(>> >> PAGE_SHIFT) will make this consistent with the other VM
code.
We also need to do math for the reserved size with physbase
instead of start_pfn. node_ar.end_pfn << PAGE_SHIFT is
*precisely* the end of the node. However,
(start_pfn << PAGE_SHIFT) is *NOT* precisely the beginning
of the reserved area. That is, of course, physbase.
If we don't use physbase here, the reserve_size can be
made too large.
From: Dave Hansen <dave@linux.vnet.ibm.com>
Tested-by: Geoff Levand <geoffrey.levand@am.sony.com> Tested on PS3.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Fix _PAGE_CHG_MASK so that pte_modify() does not affect the _PAGE_SPECIAL bit.
Signed-off-by: Philippe Gerum <rpm@xenomai.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
arch/powerpc/mm/fsl_booke_mmu.c: In function 'adjust_total_lowmem':
arch/powerpc/mm/fsl_booke_mmu.c:221: warning: format '%ld' expects type 'long int', but argument 3 has type 'phys_addr_t'
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
The Power ISA 2.06 spec introduces a standard MMU programming model that
is based on the Freescale Book-E MMU programing model. The Freescale
version is pretty backwards compatiable with the ISA 2.06 definition so
we are starting to refactor some of the Freescale code so it can be
easily shared.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
The Power ISA 2.06 added power of two page sizes to the embedded MMU
architecture. Its done it such a way to be code compatiable with the
existing HW. Made the minor code changes to support both power of two
and power of four page sizes. Also added some new MAS bits and macros
that are defined as part of the 2.06 ISA. Renamed some things to use
the 'Book-3e' concept to convey the new MMU that is based on the
Freescale Book-E MMU programming model.
Note, its still invalid to try and use a page size that isn't supported
by cpu.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Added a device tree that should be identical to mpc8572ds.dtb except
the physical addresses for all IO are above the 4G boundary.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
The following commit:
commit 64b3d0e812
Author: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Date: Thu Dec 18 19:13:51 2008 +0000
powerpc/mm: Rework usage of _PAGE_COHERENT/NO_CACHE/GUARDED
broke setting of the _PAGE_COHERENT bit in the PPC HW PTE. Since we now
actually set _PAGE_COHERENT in the Linux PTE we shouldn't be clearing it
out before we propogate it to the PPC HW PTE.
Reported-by: Martyn Welch <martyn.welch@gefanuc.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This patch reworks the way we do I and D cache coherency on PowerPC.
The "old" way was split in 3 different parts depending on the processor type:
- Hash with per-page exec support (64-bit and >= POWER4 only) does it
at hashing time, by preventing exec on unclean pages and cleaning pages
on exec faults.
- Everything without per-page exec support (32-bit hash, 8xx, and
64-bit < POWER4) does it for all page going to user space in update_mmu_cache().
- Embedded with per-page exec support does it from do_page_fault() on
exec faults, in a way similar to what the hash code does.
That leads to confusion, and bugs. For example, the method using update_mmu_cache()
is racy on SMP where another processor can see the new PTE and hash it in before
we have cleaned the cache, and then blow trying to execute. This is hard to hit but
I think it has bitten us in the past.
Also, it's inefficient for embedded where we always end up having to do at least
one more page fault.
This reworks the whole thing by moving the cache sync into two main call sites,
though we keep different behaviours depending on the HW capability. The call
sites are set_pte_at() which is now made out of line, and ptep_set_access_flags()
which joins the former in pgtable.c
The base idea for Embedded with per-page exec support, is that we now do the
flush at set_pte_at() time when coming from an exec fault, which allows us
to avoid the double fault problem completely (we can even improve the situation
more by implementing TLB preload in update_mmu_cache() but that's for later).
If for some reason we didn't do it there and we try to execute, we'll hit
the page fault, which will do a minor fault, which will hit ptep_set_access_flags()
to do things like update _PAGE_ACCESSED or _PAGE_DIRTY if needed, we just make
this guys also perform the I/D cache sync for exec faults now. This second path
is the catch all for things that weren't cleaned at set_pte_at() time.
For cpus without per-pag exec support, we always do the sync at set_pte_at(),
thus guaranteeing that when the PTE is visible to other processors, the cache
is clean.
For the 64-bit hash with per-page exec support case, we keep the old mechanism
for now. I'll look into changing it later, once I've reworked a bit how we
use _PAGE_EXEC.
This is also a first step for adding _PAGE_EXEC support for embedded platforms
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>