Add macros for the GS (guest state) bit to the list of MSR bit definitions.
On PowerPC cores that support embedded hypervisor mode, GS is cleared if
the system is running in hypervisor state (and MSR[PR] is cleared), and set
if it's running in guest state. See the Power ISA 2.06 specification for
more information.
Signed-off-by: Timur Tabi <timur@freescale.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
For ppc750 processors which use 4 performance counters instead of the
6 G4 uses but otherwise is compatible with G4.
Signed-off-by: Octavian Purdila <opurdila@ixiacom.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
spuctx_switch_state() warns if ktime goes backwards, but it
sometimes compares an uninitialized value, which showed that
the data was unreliable when we actually saw the warning.
Initialize it to the current time in order to get correct data.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
When identify_cpu() is called a second time with a logical PVR, it
only copies a subset of the cpu_spec fields so as to avoid overwriting
the performance monitor fields that were initialized based on the
real PVR.
However some of the other, non performance monitor related fields are
also not copied:
* pvr_mask
* pvr_value
* mmu_features
* machine_check
The fact that pvr_mask is not copied can result in show_cpuinfo()
showing the cpu as "unknown", if we override an unknown PVR with a
logical one - as reported by Shaggy.
So change the logic to copy all fields, and then put back the PMC
related ones in the case that we're overwriting a real PVR with a
logical one.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The for-loop body of identify_cpu() has gotten a little big, so move the
loop body logic into a separate function. No other changes.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This adds the necessary bits and pieces to powerpc implementation of
ioremap to benefit from caller tracking in /proc/vmallocinfo, at least
for ioremap's done after mem init as the older ones aren't tracked.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Setting G5's cpu frequency transition latency to CPUFREQ_ETERNAL stops
ondemand governor from working. I measured the latency using sched_clock
and haven't seen much higher than 11000ns, so I set this to 12000ns for
my configuration. Possibly other configurations will be different?
Ideally the generic code would be able to measure it in case the platform
does not provide it.
But this simple patch at least makes it throttle again.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Impact: __per_cpu_load available on all SMP capable archs
Percpu now requires three symbols to be defined - __per_cpu_load,
__per_cpu_start and __per_cpu_end. There were three archs which
didn't have it. Update them as follows.
* powerpc: can use generic PERCPU() macro. Compile tested for
powerpc32, compile/boot tested for powerpc64.
* ia64: can use generic PERCPU_VADDR() macro. __phys_per_cpu_start is
identical to __per_cpu_load. Compile tested and symbol table looks
identical after the change except for the additional __per_cpu_load.
* arm: added explicit __per_cpu_load definition. Currently uses
unified .init output section so can't use the generic macro. Dunno
whether the unified .init ouput section is required by arch
peculiarity so I left it alone. Please break it up and use PERCPU()
if possible.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Pat Gefre <pfg@sgi.com>
Cc: Russell King <rmk@arm.linux.org.uk>
The registers for the local bus are incorrectly set to 0xf8005000 rather
than there actual location of 0xfef05000.
Signed-off-by: Martyn Welch <martyn.welch@gefanuc.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Defining flash partition table in platform code is deprecated, and due to
recent changes linkstation and storcenter do not compile any more with
their default configurations because of undefined references to
physmap_set_partitions(). Instead of fixing them by using the correct
kernel configuration macro in preprocessor conditional, remove partition
table definitions altogether. Instead add support for partition definition
on the command-line and in device tree to the default configurations.
Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
The Freescale Serial Synchronous Interface (SSI) is an audio device present on
some Freescale SOCs. Various implementations of the SSI have a different
transmit and receive FIFO depth, but are otherwise identical. To support
these variations, add a new property fsl,fifo-depth to the SSI node that
specifies the depth of the FIFOs.
Also update the MPC8610 HPCD device tree with this property.
Signed-off-by: Timur Tabi <timur@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
The e500mc core supports the new tlbilx instructions that do core
local invalidates and also provide us the ability to take down
all TLB entries matching a given PID.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
On MPC837X CPUs Dual-Role USB isn't always available (for example DR
USB pins can be muxed away to eSDHC).
U-Boot adds status = "disabled" property into the DR USB nodes to
indicate that we must not try to configure or probe Dual-Role USB,
otherwise we'll break eSDHC support on targets with MPC837X CPUs.
Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
The interrupt pending register is write 1 clear. If there are more than
one external interrupts pending at the same time, acking the first
interrupt by reading pending register then OR the corresponding bit and
write back to pending register will also clear other interrupt pending
bits. That will cause loss of interrupt.
Signed-off-by: Da Yu <dayu@datangmobile.cn>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
This patch updates the Xilinx ML507 device tree to match the released
ML507 powerpc reference design (ml507_ppc440_emb_ref). This patch is
needed to boot Linux on the ML507 powerpc reference design without
manually generating and tweaking a device tree from the project directory.
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
commit a969e76a71 (powerpc: Correct USB
support for GE Fanuc SBC610) introduced a fixup for NEC usb controllers.
This fixup should only run on GEF SBC610 boards.
Fixes Fedora bug #486511.
(https://bugzilla.redhat.com/show_bug.cgi?id=486511)
Signed-off-by: Tony Breeds <tony@bakeyournoodle.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
On x86-64, a 32-bit process (TIF_IA32) can switch to 64-bit mode with
ljmp, and then use the "syscall" instruction to make a 64-bit system
call. A 64-bit process make a 32-bit system call with int $0x80.
In both these cases under CONFIG_SECCOMP=y, secure_computing() will use
the wrong system call number table. The fix is simple: test TS_COMPAT
instead of TIF_IA32. Here is an example exploit:
/* test case for seccomp circumvention on x86-64
There are two failure modes: compile with -m64 or compile with -m32.
The -m64 case is the worst one, because it does "chmod 777 ." (could
be any chmod call). The -m32 case demonstrates it was able to do
stat(), which can glean information but not harm anything directly.
A buggy kernel will let the test do something, print, and exit 1; a
fixed kernel will make it exit with SIGKILL before it does anything.
*/
#define _GNU_SOURCE
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>
#include <linux/prctl.h>
#include <sys/stat.h>
#include <unistd.h>
#include <asm/unistd.h>
int
main (int argc, char **argv)
{
char buf[100];
static const char dot[] = ".";
long ret;
unsigned st[24];
if (prctl (PR_SET_SECCOMP, 1, 0, 0, 0) != 0)
perror ("prctl(PR_SET_SECCOMP) -- not compiled into kernel?");
#ifdef __x86_64__
assert ((uintptr_t) dot < (1UL << 32));
asm ("int $0x80 # %0 <- %1(%2 %3)"
: "=a" (ret) : "0" (15), "b" (dot), "c" (0777));
ret = snprintf (buf, sizeof buf,
"result %ld (check mode on .!)\n", ret);
#elif defined __i386__
asm (".code32\n"
"pushl %%cs\n"
"pushl $2f\n"
"ljmpl $0x33, $1f\n"
".code64\n"
"1: syscall # %0 <- %1(%2 %3)\n"
"lretl\n"
".code32\n"
"2:"
: "=a" (ret) : "0" (4), "D" (dot), "S" (&st));
if (ret == 0)
ret = snprintf (buf, sizeof buf,
"stat . -> st_uid=%u\n", st[7]);
else
ret = snprintf (buf, sizeof buf, "result %ld\n", ret);
#else
# error "not this one"
#endif
write (1, buf, ret);
syscall (__NR_exit, 1);
return 2;
}
Signed-off-by: Roland McGrath <roland@redhat.com>
[ I don't know if anybody actually uses seccomp, but it's enabled in
at least both Fedora and SuSE kernels, so maybe somebody is. - Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Virtex FPGA designs have two serial port logic cores to choose from; the
simple uartlite, and the full featured uart16550. Both cores are in
common use so the defconfig should support both of them. Currently
only console on uartlite is supported in the defconfig. This patch adds
console support for the 16550 core.
The Virtex reference designs do not work without this patch.
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
To better match the ePAPR specification, device nodes which claim
"simple-bus" compatibility should be probed by default.
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
On digsy MTC PSC4 and PSC5 should be configured as UART, not PSC3 and PSC4.
Signed-off-by: Grzegorz Bernacki <gjb@semihalf.com>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
The following options are enabled to support the digsy-mtc.
- LXT phy
- AT24 eeprom
- RTC (DS1337)
- MTD partitioning based on OF description
Signed-off-by: Grzegorz Bernacki <gjb@semihalf.com>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
The PCI 2.x cells used on some 44x SoCs only let us configure the decode
for the low 32-bit of the incoming PLB addresses. The top 4 bits (this
is a 36-bit bus) are hard wired to different values depending on the
specific SoC in use. Our code used to work "by accident" until I added
support for the ISA memory holes and while at it added more validity
checking of the addresses.
This patch should bring it back to working condition. It still relies
on the device-tree being correct but that's somewhat a pre-requisite
for anything to work anyway.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com>
Acked-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
This fixes a regression introduced by commit
a4e22f02f5 ("powerpc: Update 64bit
__copy_tofrom_user() using CPU_FTR_UNALIGNED_LD_STD").
The same bug that existed in the 64bit memcpy() also exists here so fix
it here too. The fix is the same as that applied to memcpy() with the
addition of fixes for the exception handling code required for
__copy_tofrom_user().
This stops us reading beyond the end of the source region we were told
to copy.
Signed-off-by: Mark Nelson <markn@au1.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This fixes a regression introduced by commit
25d6e2d7c5 ("powerpc: Update 64bit memcpy()
using CPU_FTR_UNALIGNED_LD_STD").
This commit allowed CPUs that have the CPU_FTR_UNALIGNED_LD_STD CPU
feature bit present to do the memcpy() with unaligned load doubles. But,
along with this came a bug where our final load double would read bytes
beyond a page boundary and into the next (unmapped) page. This was caught
by enabling CONFIG_DEBUG_PAGEALLOC,
The fix was to read only the number of bytes that we need to store rather
than reading a full 8-byte doubleword and storing only a portion of that.
In order to minimise the amount of existing code touched we use the
original do_tail for the src_unaligned case.
Below is an example of the regression, as reported by Sachin Sant:
Unable to handle kernel paging request for data at address 0xc00000003f380000
Faulting instruction address: 0xc000000000039574
cpu 0x1: Vector: 300 (Data Access) at [c00000003baf3020]
pc: c000000000039574: .memcpy+0x74/0x244
lr: d00000000244916c: .ext3_xattr_get+0x288/0x2f4 [ext3]
sp: c00000003baf32a0
msr: 8000000000009032
dar: c00000003f380000
dsisr: 40000000
current = 0xc00000003e54b010
paca = 0xc000000000a53680
pid = 1840, comm = readahead
enter ? for help
[link register ] d00000000244916c .ext3_xattr_get+0x288/0x2f4 [ext3]
[c00000003baf32a0] d000000002449104 .ext3_xattr_get+0x220/0x2f4 [ext3]
(unreliab
le)
[c00000003baf3390] d00000000244a6e8 .ext3_xattr_security_get+0x40/0x5c [ext3]
[c00000003baf3400] c000000000148154 .generic_getxattr+0x74/0x9c
[c00000003baf34a0] c000000000333400 .inode_doinit_with_dentry+0x1c4/0x678
[c00000003baf3560] c00000000032c6b0 .security_d_instantiate+0x50/0x68
[c00000003baf35e0] c00000000013c818 .d_instantiate+0x78/0x9c
[c00000003baf3680] c00000000013ced0 .d_splice_alias+0xf0/0x120
[c00000003baf3720] d00000000243e05c .ext3_lookup+0xec/0x134 [ext3]
[c00000003baf37c0] c000000000131e74 .do_lookup+0x110/0x260
[c00000003baf3880] c000000000134ed0 .__link_path_walk+0xa98/0x1010
[c00000003baf3970] c0000000001354a0 .path_walk+0x58/0xc4
[c00000003baf3a20] c000000000135720 .do_path_lookup+0x138/0x1e4
[c00000003baf3ad0] c00000000013645c .path_lookup_open+0x6c/0xc8
[c00000003baf3b70] c000000000136780 .do_filp_open+0xcc/0x874
[c00000003baf3d10] c0000000001251e0 .do_sys_open+0x80/0x140
[c00000003baf3dc0] c00000000016aaec .compat_sys_open+0x24/0x38
[c00000003baf3e30] c00000000000855c syscall_exit+0x0/0x40
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
When we introduced VSX, we changed the way FPRs are stored in the
thread_struct. Unfortunately we missed the load/store float double
alignment handler code when updating how we access FPRs in the
thread_struct.
Below fixes this and merges the little/big endian case.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Randomise ELF_ET_DYN_BASE, which is used when loading position independent
executables.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
On 64bit there is a possibility our stack and mmap randomisation will put
the two close enough such that we can't expand our stack to match the ulimit
specified.
To avoid this, start the upper mmap address at 1GB + 128MB below the top of our
address space, so in the worst case we end up with the same ~128MB hole as in
32bit. This works because we randomise the stack over a 1GB range.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
get_random_int() returns the same value within a 1 jiffy interval. This means
that the mmap and stack regions will almost always end up the same distance
apart, making a relative offset based attack possible.
To fix this, shift the randomness we use for the mmap region by 1 bit.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Randomise the lower bits of the stack address. More randomisation is good for
security but the scatter can also help with SMT threads that share an L1. A
quick test case shows this working:
int main()
{
int sp;
printf("%x\n", (unsigned long)&sp & 4095);
}
before:
80
80
80
80
80
after:
610
490
300
6b0
d80
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
At the moment we randomise the stack by 8MB on 32bit and 64bit tasks. Since we
have a lot more address space to play with on 64bit, lets do what x86 does and
increase that randomisation to 1GB:
before:
# for i in seq `1 10` ; do sleep 1 & cat /proc/${!}/maps | grep stack; done
fffffebc000-fffffed1000 rw-p ffffffeb000 00:00 0 [stack]
ffffff5a000-ffffff6f000 rw-p ffffffeb000 00:00 0 [stack]
fffffdb2000-fffffdc7000 rw-p ffffffeb000 00:00 0 [stack]
fffffd3e000-fffffd53000 rw-p ffffffeb000 00:00 0 [stack]
fffffad9000-fffffaee000 rw-p ffffffeb000 00:00 0 [stack]
after:
# for i in seq `1 10` ; do sleep 1 & cat /proc/${!}/maps | grep stack; done
ffff5c27000-ffff5c3c000 rw-p ffffffeb000 00:00 0 [stack]
fffebe5e000-fffebe73000 rw-p ffffffeb000 00:00 0 [stack]
fffcb298000-fffcb2ad000 rw-p ffffffeb000 00:00 0 [stack]
fffc719d000-fffc71b2000 rw-p ffffffeb000 00:00 0 [stack]
fffe01af000-fffe01c4000 rw-p ffffffeb000 00:00 0 [stack]
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Rearrange mmap.c to better match the x86 version.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Move is_32bit_task into asm/thread_info.h, that allows us to test for
32/64bit tasks without an ugly CONFIG_PPC64 ifdef.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
On Wed, 18 Feb 2009 22:18:21 +0100
Giuliano Pochini <pochini@shiny.it> wrote:
Since 2.6.28, /sys/devices/system/cpu/cpu*/online don't exist anymore
on 32-bit PowerMacs due to change in the generic powerpc code.
Signed-off-by: Giuliano Pochini <pochini@shiny.it>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The new firmware release exports further RTC calls. This
patch adds these calls to the QPACE platform setup file.
Signed-off-by: Benjamin Krill <ben@codiert.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
When we introduced VSX, we changed the way FPRs are stored in the
thread_struct. Unfortunately we missed the load/store float double
alignment handler code when updating how we access FPRs in the
thread_struct.
Below fixes this and merges the little/big endian case.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
lfiwzx is a new floating point load instruction in 2.06 that needs an
alignment handler for Linux.
Turns out to be the worlds easiest handler to add.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This patch reworks the hot_add_scn_to_nid and its supporting functions
to make them easier to understand. There are no functional changes in
this patch and has been tested on machine with memory represented in the
device tree as memory nodes and in the ibm,dynamic-memory property.
My previous patch that introduced support for hotplug memory add on
systems whose memory was represented by the ibm,dynamic-memory property
of the device tree only left the code more unintelligible. This
will hopefully makes things easier to understand.
Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
While testing partition migration with heavy CPU load using
shared processors, it was observed that sometimes the migration
would never complete and would appear to hang. Currently, the
migration code assumes that if H_SUCCESS is returned from the H_JOIN
then the migration is complete and the processor is waking up on
the target system. If there was an outstanding PROD to the processor
when the H_JOIN is called, however, it will return H_SUCCESS on the source
system, causing the migration to hang, or in some scenarios cause
the kernel to crash on the complete call waking the caller
of rtas_percpu_suspend_me. Fix this by calling H_JOIN multiple times
if necessary during the migration.
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
There are hardware limitations on the number of available MSIs,
which firmware expresses using a property named "ibm,pe-total-#msi".
This property tells us how many MSIs are available for devices below
the point in the PCI tree where we find the property.
For old firmwares which don't have the property, we assume there are
8 MSIs available per "partitionable endpoint" (PE). The PE can be
found using existing EEH code, which uses the methods described in
PAPR. For our purposes we want the parent of the node that's
identified using this method.
When a driver requests n MSIs for a device, we first establish where
the "ibm,pe-total-#msi" property above that device is, or we find the
PE if the property is not found. In both cases we call this node
the "pe_dn".
We then count all non-bridge devices below the pe_dn, to establish
how many devices in total may need MSIs. The quota is then simply the
total available divided by the number of devices, if the request is
less than or equal to the quota, the request is fine and we're done.
If the request is greater than the quota, we try to determine if there
are any "spare" MSIs which we can give to this device. Spare MSIs are
found by looking for other devices which can never use their full
quota, because their "req#msi(-x)" property is less than the quota.
If we find any spare, we divide the spares by the number of devices
that could request more than their quota. This ensures the spare
MSIs are spread evenly amongst all over-quota requestors.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
If a driver asks for more MSIs than the devices "req#msi(-x)" property,
we currently return -ENOSPC. This doesn't give the driver any chance to
make a new request with a number that might work.
So if "req#msi(-x)" is less than the request, return its value. To be
100% safe, make sure we return an error if req_msi == 0.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The e500mc supports the new msgsnd/doorbell mechanisms that were added in
the Power ISA 2.05 architecture. We use the normal level doorbell for
doing SMP IPIs at this point.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
cbe_cpufreq has a partial dependency on cbe_cpufreq_pmi, which cannot
be easily expressed in Kconfig. This fixes it by introducing an
extra Kconfig symbol CBE_CPUFREQ_PMI_ENABLE. To make the dependency
clearer, turn PPC_PMI into an automatic symbol.
Reported-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The spufs context directory contents definitions are not changed after
initialisation, so we can declare them as const. We can do the same
with the spu coredump reader callbacks too.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Currently, we may setup the MFC for isolated mode initilaisation with
the purge still active. This means that DMAs required to perform the
init do not happen.
This change clears the purge status after doing the purge, so that
the isolated init can proceed.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Currently, spu_handle_mm_fault disregards the 'ret' variable and always
returns -EFAULT on error.
This change refactos spu_handle_mm_fault a little, to return the
ret variable as appropriate. This allows us to combine the error and
sucess paths.
Also, remove the #if-0-ed IS_VALID_EA() check, it has never been
used.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
At the moment we size the hashtable based on 4kB pages / 2, even on a
64kB kernel. This results in a hashtable that is much larger than it
needs to be.
Grab the real page size and size the hashtable based on that
Note: This only has effect on non hypervisor machines.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This patch rewrites consistent dma allocations support to use vmalloc
layer to allocate virtual memory space from vmalloc pool and get rid
of CONFIG_CONSISTENT_{START,SIZE}.
This greatly simplifies the code by effectively removing a custom
allocator we had for virtual space.
Signed-off-by: Ilya Yanok <yanok@emcraft.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
include/asm/bootx.h:12: include of <linux/types.h> is preferred over <asm/types.h>
include/asm/bootx.h:57: found __[us]{8,16,32,64} type without #include <linux/types.h>
include/asm/elf.h:5: include of <linux/types.h> is preferred over <asm/types.h>
include/asm/kvm.h:23: include of <linux/types.h> is preferred over <asm/types.h>
include/asm/kvm.h:26: found __[us]{8,16,32,64} type without #include <linux/types.h>
include/asm/ps3fb.h:33: found __[us]{8,16,32,64} type without #include <linux/types.h>
include/asm/spu_info.h:27: found __[us]{8,16,32,64} type without #include <linux/types.h>
include/asm/swab.h:11: include of <linux/types.h> is preferred over <asm/types.h>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Old OF variants used to create a 'dummy' parent node "multifunc-device"
for devices with more than one PCI function. Our code that matches OF
nodes to PCI devices dealt with that in one place but not in another,
this fixes it.
This has the practical effect of fixing interrupt routing of multifunction
PCI cards on some older PowerMac machines.
Signed-off-by: Tom Arbuckle <tom.d.arbuckle@gmail.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Create a new header that becomes a single location for defining PowerPC
opcodes used by code that is either generationg instructions
at runtime (fixups, debug, etc.), emulating instructions, or just
compiling instructions old assemblers don't know about.
We currently don't handle the floating point emulation or alignment decode
as both are better handled by the specific decode support they already
have.
Added support for the new dcbzl, dcbal, msgsnd, tlbilx, & wait instructions
since older assemblers don't know about them.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Impact: clean up, remove duplicate code
When ftrace was first ported to PowerPC, there existed a
create_function_call that would create the instruction to make a call
to a given address. Unfortunately, this call expected to write to
the address it was given, and since it used the address to calculate
the offset, it could not be faked.
ftrace needed a way to create the instruction without actually writing
that instruction to the text section. So ftrace had to implement its
own code.
Now we have create_branch in the code patching library, which does
exactly what ftrace needs. This patch replaces ftrace's implementation
with the library function.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The original port of ftrace to PowerPC kept a lot of the code used
by x86. Some of this code was to handle x86's 5 byte instruction.
This was handled by using character arrays to manipulate the
code.
PowerPC has a consistent 4 byte instruction. Using unsigned ints
makes the code more efficient as well as more readable.
By converting to use unsigned ints to represent instructions,
I was able to remove the side effects that were needed for
manipulating character strings.
i.e. memcpy and memcmp
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This patch gets function graph tracing working with dynamic function
tracer on PowerPC32.
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This patch ports the function graph tracer for PowerPC, but only
for static function tracing.
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Impact: clean up
Use a macro to save and restore the registers for PowerPC32,
since that code is duplicated.
This is similar to the work done by Cyrill Gorcunov for the
mcount code in x86_64.
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The TOCS used by modules are different than the one used by
the core kernel code. The function graph tracer must save and
restore the TOC whenever it traces a module call. But this
is an added overhead to burden the majority of core kernel
code being traced.
Benjamin Herrenschmidt suggested in testing the entry of
the call to tell if it is a core kernel function or a module.
He recommended using the REGION_ID() macro to perform this test.
This patch implements Benjamin's idea, and uses a different
return_to_handler routine dependent on if the entry is a core
kernel function or not. The module version saves the TOC, where as
the core kernel version does not.
Geoff Lavand tested on PS3.
Tested-by: Geoff Levand <geoffrey.levand@am.sony.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This is the port of the function graph tracer to PowerPC with
dynamic tracing.
Geoff Lavand tested on PS3.
Tested-by: Geoff Levand <geoffrey.levand@am.sony.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This is a port of the function graph tracer that was written by
Frederic Weisbecker for the x86.
This only works for PPC64 at the moment and only for static tracing.
PPC32 and dynamic function graph tracing support will come later.
The trace produces a visual calling of functions:
# tracer: function_graph
#
# CPU DURATION FUNCTION CALLS
# | | | | | | |
0) 2.224 us | }
0) ! 271.024 us | }
0) ! 320.080 us | }
0) ! 324.656 us | }
0) ! 329.136 us | }
0) | .put_prev_task_fair() {
0) | .update_curr() {
0) 2.240 us | .update_min_vruntime();
0) 6.512 us | }
0) 2.528 us | .__enqueue_entity();
0) + 15.536 us | }
0) | .pick_next_task_fair() {
0) 2.032 us | .__pick_next_entity();
0) 2.064 us | .__clear_buddies();
0) | .set_next_entity() {
0) 2.672 us | .__dequeue_entity();
0) 6.864 us | }
Geoff Lavand tested on PS3.
Tested-by: Geoff Levand <geoffrey.levand@am.sony.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Michael Neuling reported a compile bug when dynamic ftrace was
configured in and modules were not. This was due to the ftrace
code referencing module specific structures.
Reported-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Impact: cleanup
The PowerPC ftrace code uses a hacked up DEBUGP macro for prints.
This patch converts it to the standard pr_debug.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Impact: new timer API
Based on an idea from Martin Josefsson with the help of
Patrick McHardy and Stephen Hemminger:
introduce the mod_timer_pending() API which is a mod_timer()
offspring that is an invariant on already removed timers.
(regular mod_timer() re-activates non-pending timers.)
This is useful for the networking code in that it can
allow unserialized mod_timer_pending() timer-forwarding
calls, but a single del_timer*() will stop the timer
from being reactivated again.
Also while at it:
- optimize the regular mod_timer() path some more, the
timer-stat and a debug check was needlessly duplicated
in __mod_timer().
- make the exports come straight after the function, as
most other exports in timer.c already did.
- eliminate __mod_timer() as an external API, change the
users to mod_timer().
The regular mod_timer() code path is not impacted
significantly, due to inlining optimizations and due to
the simplifications.
Based-on-patch-from: Stephen Hemminger <shemminger@vyatta.com>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Patrick McHardy <kaber@trash.net>
Cc: netdev@vger.kernel.org
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch adds support for AMCC ppc4xx security device driver. This is the
initial release that includes the driver framework with AES and SHA1 algorithms
support.
The remaining algorithms will be released in the near future.
Signed-off-by: James Hsiao <jhsiao@amcc.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
User space can request hardware and/or software time stamping.
Reporting of the result(s) via a new control message is enabled
separately for each field in the message because some of the
fields may require additional computation and thus cause overhead.
User space can tell the different kinds of time stamps apart
and choose what suits its needs.
When a TX timestamp operation is requested, the TX skb will be cloned
and the clone will be time stamped (in hardware or software) and added
to the socket error queue of the skb, if the skb has a socket
associated with it.
The actual TX timestamp will reach userspace as a RX timestamp on the
cloned packet. If timestamping is requested and no timestamping is
done in the device driver (potentially this may use hardware
timestamping), it will be done in software after the device's
start_hard_xmit routine.
Signed-off-by: Patrick Ohly <patrick.ohly@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
kvm_arch_sync_events is introduced to quiet down all other events may happen
contemporary with VM destroy process, like IRQ handler and work struct for
assigned device.
For kvm_arch_sync_events is called at the very beginning of kvm_destroy_vm(), so
the state of KVM here is legal and can provide a environment to quiet down other
events.
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
4xx chips commonly now have multiple PHBs, there is no reason to not
enable PCI domains on them. The main issue with PCI domains is X but
currently its already somewhat busted for other reasons such as the
36-bit physical address space, which I'm fixing separately.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
This adds the device-tree entries for a handful of devices on the
Canyonlands board, such as the EHCI and OHCI controllers, the real
time clock and the AD7414 thermal monitor.
I also updated the defconfig to enable various options related to
these devices.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
This patch adds support for 256KB pages on ppc44x-based boards.
For simplification of implementation with 256KB pages we still assume
2-level paging. As a side effect this leads to wasting extra memory space
reserved for PTE tables: only 1/4 of pages allocated for PTEs are
actually used. But this may be an acceptable trade-off to achieve the
high performance we have with big PAGE_SIZEs in some applications (e.g.
RAID).
Also with 256KB PAGE_SIZE we increase THREAD_SIZE up to 32KB to minimize
the risk of stack overflows in the cases of on-stack arrays, which size
depends on the page size (e.g. multipage BIOs, NTFS, etc.).
With 256KB PAGE_SIZE we need to decrease the PKMAP_ORDER at least down
to 9, otherwise all high memory (2 ^ 10 * PAGE_SIZE == 256MB) we'll be
occupied by PKMAP addresses leaving no place for vmalloc. We do not
separate PKMAP_ORDER for 256K from 16K/64K PAGE_SIZE here; actually that
value of 10 in support for 16K/64K had been selected rather intuitively.
Thus now for all cases of PAGE_SIZE on ppc44x (including the default, 4KB,
one) we have 512 pages for PKMAP.
Because ELF standard supports only page sizes up to 64K, then you should
use binutils later than 2.17.50.0.3 with '-zmax-page-size' set to 256K
for building applications, which are to be run with the 256KB-page sized
kernel. If using the older binutils, then you should patch them like follows:
--- binutils/bfd/elf32-ppc.c.orig
+++ binutils/bfd/elf32-ppc.c
-#define ELF_MAXPAGESIZE 0x10000
+#define ELF_MAXPAGESIZE 0x40000
One more restriction we currently have with 256KB page sizes is inability
to use shmem safely, so, for now, the 256KB is available only if you turn
the CONFIG_SHMEM option off (another variant is to use BROKEN).
Though, if you need shmem with 256KB pages, you can always remove the !SHMEM
dependency in 'config PPC_256K_PAGES', and use the workaround available here:
http://lkml.org/lkml/2008/12/19/20
Signed-off-by: Yuri Tikhonov <yur@emcraft.com>
Signed-off-by: Ilya Yanok <yanok@emcraft.com>
Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
Fix the VSX alignment handler for VSX registers > 32. 32-63 are stored
in the VMX part of the thread_struct not the FPR part.
Signed-off-by: Michael Neuling <mikey@neuling.org>
CC: stable@kernel.org (2.6.27 & .28 please)
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Change the PS3 hotplug memory routine ps3_mm_add_memory() from
a core_initcall to a device_initcall.
core_initcall routines run before the powerpc topology_init()
startup routine, which is a subsys_initcall, resulting in
failure of ps3_mm_add_memory() when CONFIG_NUMA=y. When
ps3_mm_add_memory() fails the system will boot with just the
128 MiB of boot memory
Signed-off-by: Geoff Levand <geoffrey.levand@am.sony.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Fix the powerpc NUMA reserve bootmem page selection logic.
commit 8f64e1f2d1 (powerpc: Reserve
in bootmem lmb reserved regions that cross NUMA nodes) changed
the logic for how the powerpc LMB reserved regions were converted
to bootmen reserved regions. As the folowing discussion reports,
the new logic was not correct.
mark_reserved_regions_for_nid() goes through each LMB on the
system that specifies a reserved area. It searches for
active regions that intersect with that LMB and are on the
specified node. It attempts to bootmem-reserve only the area
where the active region and the reserved LMB intersect. We
can not reserve things on other nodes as they may not have
bootmem structures allocated, yet.
We base the size of the bootmem reservation on two possible
things. Normally, we just make the reservation start and
stop exactly at the start and end of the LMB.
However, the LMB reservations are not aware of NUMA nodes and
on occasion a single LMB may cross into several adjacent
active regions. Those may even be on different NUMA nodes
and will require separate calls to the bootmem reserve
functions. So, the bootmem reservation must be trimmed to
fit inside the current active region.
That's all fine and dandy, but we trim the reservation
in a page-aligned fashion. That's bad because we start the
reservation at a non-page-aligned address: physbase.
The reservation may only span 2 bytes, but that those bytes
may span two pfns and cause a reserve_size of 2*PAGE_SIZE.
Take the case where you reserve 0x2 bytes at 0x0fff and
where the active region ends at 0x1000. You'll jump into
that if() statment, but node_ar.end_pfn=0x1 and
start_pfn=0x0. You'll end up with a reserve_size=0x1000,
and then call
reserve_bootmem_node(node, physbase=0xfff, size=0x1000);
0x1000 may not be on the same node as 0xfff. Oops.
In almost all the vm code, end_<anything> is not inclusive.
If you have an end_pfn of 0x1234, page 0x1234 is not
included in the range. Using PFN_UP instead of the
(>> >> PAGE_SHIFT) will make this consistent with the other VM
code.
We also need to do math for the reserved size with physbase
instead of start_pfn. node_ar.end_pfn << PAGE_SHIFT is
*precisely* the end of the node. However,
(start_pfn << PAGE_SHIFT) is *NOT* precisely the beginning
of the reserved area. That is, of course, physbase.
If we don't use physbase here, the reserve_size can be
made too large.
From: Dave Hansen <dave@linux.vnet.ibm.com>
Tested-by: Geoff Levand <geoffrey.levand@am.sony.com> Tested on PS3.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Fix _PAGE_CHG_MASK so that pte_modify() does not affect the _PAGE_SPECIAL bit.
Signed-off-by: Philippe Gerum <rpm@xenomai.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
arch/powerpc/mm/fsl_booke_mmu.c: In function 'adjust_total_lowmem':
arch/powerpc/mm/fsl_booke_mmu.c:221: warning: format '%ld' expects type 'long int', but argument 3 has type 'phys_addr_t'
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
The Power ISA 2.06 spec introduces a standard MMU programming model that
is based on the Freescale Book-E MMU programing model. The Freescale
version is pretty backwards compatiable with the ISA 2.06 definition so
we are starting to refactor some of the Freescale code so it can be
easily shared.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
The Power ISA 2.06 added power of two page sizes to the embedded MMU
architecture. Its done it such a way to be code compatiable with the
existing HW. Made the minor code changes to support both power of two
and power of four page sizes. Also added some new MAS bits and macros
that are defined as part of the 2.06 ISA. Renamed some things to use
the 'Book-3e' concept to convey the new MMU that is based on the
Freescale Book-E MMU programming model.
Note, its still invalid to try and use a page size that isn't supported
by cpu.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Added a device tree that should be identical to mpc8572ds.dtb except
the physical addresses for all IO are above the 4G boundary.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
The following commit:
commit 64b3d0e812
Author: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Date: Thu Dec 18 19:13:51 2008 +0000
powerpc/mm: Rework usage of _PAGE_COHERENT/NO_CACHE/GUARDED
broke setting of the _PAGE_COHERENT bit in the PPC HW PTE. Since we now
actually set _PAGE_COHERENT in the Linux PTE we shouldn't be clearing it
out before we propogate it to the PPC HW PTE.
Reported-by: Martyn Welch <martyn.welch@gefanuc.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This patch reworks the way we do I and D cache coherency on PowerPC.
The "old" way was split in 3 different parts depending on the processor type:
- Hash with per-page exec support (64-bit and >= POWER4 only) does it
at hashing time, by preventing exec on unclean pages and cleaning pages
on exec faults.
- Everything without per-page exec support (32-bit hash, 8xx, and
64-bit < POWER4) does it for all page going to user space in update_mmu_cache().
- Embedded with per-page exec support does it from do_page_fault() on
exec faults, in a way similar to what the hash code does.
That leads to confusion, and bugs. For example, the method using update_mmu_cache()
is racy on SMP where another processor can see the new PTE and hash it in before
we have cleaned the cache, and then blow trying to execute. This is hard to hit but
I think it has bitten us in the past.
Also, it's inefficient for embedded where we always end up having to do at least
one more page fault.
This reworks the whole thing by moving the cache sync into two main call sites,
though we keep different behaviours depending on the HW capability. The call
sites are set_pte_at() which is now made out of line, and ptep_set_access_flags()
which joins the former in pgtable.c
The base idea for Embedded with per-page exec support, is that we now do the
flush at set_pte_at() time when coming from an exec fault, which allows us
to avoid the double fault problem completely (we can even improve the situation
more by implementing TLB preload in update_mmu_cache() but that's for later).
If for some reason we didn't do it there and we try to execute, we'll hit
the page fault, which will do a minor fault, which will hit ptep_set_access_flags()
to do things like update _PAGE_ACCESSED or _PAGE_DIRTY if needed, we just make
this guys also perform the I/D cache sync for exec faults now. This second path
is the catch all for things that weren't cleaned at set_pte_at() time.
For cpus without per-pag exec support, we always do the sync at set_pte_at(),
thus guaranteeing that when the PTE is visible to other processors, the cache
is clean.
For the 64-bit hash with per-page exec support case, we keep the old mechanism
for now. I'll look into changing it later, once I've reworked a bit how we
use _PAGE_EXEC.
This is also a first step for adding _PAGE_EXEC support for embedded platforms
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CONFIG_CC_OPTIMIZE_FOR_SIZE is selected, because otherwise the kernel
wouldn't boot. The AmigaOne's U-boot firmware seems to have a problem
loading uImages bigger than 1.8 MB.
Signed-off-by: Gerhard Pircher <gerhard_pircher@gmx.net>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This adds the bootwrapper for the cuImage target and a compatible property
check for "pnpPNP,501" to the generic serial console support code.
The default link address for the cuImage target is set to 0x800000. This
allows to boot the kernel with AmigaOS4's second level bootloader, which
always loads a uImage at 0x500000.
Signed-off-by: Gerhard Pircher <gerhard_pircher@gmx.net>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This device tree does not provide the correct CPU name, as various CPU
models and revisions are used in AmigaOnes. Also the PCI root node does
not contain a interrupt mapping property, as all boards have different
interrupt routing. However the kernel can do a 1:1 mapping of all PCI
interrupts, as only i8259 legacy interrupts are used.
Signed-off-by: Gerhard Pircher <gerhard_pircher@gmx.net>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This commit adds the setup code for booting Linux on AmigaOne G3SE (G3
only), AmigaOne XE and uA1 (G3/G4) desktop computers. These boards were
sold by Eyetech and are based on MAI Logic's Teron boards and its
Articia S northbridge.
The AmigaOne uses U-boot as firmware, which doesn't support a flattened
device tree yet. The northbridge has some design flaws, which makes it
necessary to use non cacheable memory for DMA operations
(CONFIG_NOT_COHERENT_CACHE) and to avoid setting the coherence (M) flag
for memory pages.
Signed-off-by: Gerhard Pircher <gerhard_pircher@gmx.net>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The EEH code disables and enables interrupts during the
device recovery process. This is unnecessary for MSI
and MSI-X interrupts because they are effectively disabled
by the DMA Stopped state when an EEH error occurs. The
current code is also incorrect for MSI-X interrupts. It
doesn't take into account that MSI-X interrupts are tracked
in a different way than LSI/MSI interrupts. This patch
ensures only LSI interrupts are disabled/enabled.
Signed-off-by: Mike Mason <mmlnx@us.ibm.com>
Acked-by: Linas Vepstas <linasvepstas@gmail.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
After the last changes, the mv643xx_eth driver now detects
a spurious interface on port 0. Since only port 1 is actually
connected to a PHY, remove its description.
Signed-off-by: Gabriel Paubert <paubert@iram.es>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
move the definition of hose_list next to its hotplug spinlock.
create pcibios_io_size to encapsulate ifdef in existing pci-common
function pcibios_vaddr_is_ioport
move pci_address_to_pio to pci-common, using new pcibios_io_size, and
protect this GPL exported function against concurrent hotplug removal
Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
If we can't allocate the requested number of MSIs, we can still tell the
generic code how many we were able to allocate. That can then be passed
onto the driver, allowing it to request that many in future, and
probably succeeed.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
We also need to check that the device isn't using MSI-X in the irq fixup
routine, otherwise we might leave MSI-Xs configured at boot.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Firmware encodes the number of MSI-X requested by a device in a
different property than for MSI. Pull the property name out as a
parameter and share the logic for both cases.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
We need to increment i in the loop that queries what interrupts firmware
gave us, otherwise we'll incorrectly use the first value over and over.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The lmb debugging can be turned on at boottime with lmb=debug on the
command line. However on powerpc that doesn't work, because we don't
necessarily call lmb_dump_all().
So always call lmb_dump_all() after lmb_analyze(), no output is
generated unless lmb=debug is found on the command line.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Since we never hotplug add an isa bus, we never need to set primary.
Delete this write-only variable.
Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Use of_get_cpu_node, which is a superset of numa.c's find_cpu_node in
a less restrictive section (text vs cpuinit).
Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
find_min_common_depth() was checking the property length incorrectly.
The value is in bytes not cells, and it is using the second entry.
Signed-off-By: Milton Miller <miltonm@bga.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
arch/powerpc/platforms/pseries/hotplug-memory.c uses
remove_section_mapping() but doesn't include sparsemem.h which defines
it. This can cause compilation fails for some configs.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The new legacy_mem file in sysfs is causing problems with X on machines
that don't support legacy memory access. The way I initially implemented
it, we would fail with -ENXIO when trying to mmap it, thus exposing to
X that we do support the API but there is no legacy memory.
Unfortunately, X poor error handling is causing it to fail to start when
it gets this error.
This implements a workaround hack that instead maps anonymous memory
instead (using shmem if VM_SHARED is set, just like /dev/zero does).
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
arch/powerpc/oprofile/cell/spu_profiler.c is missing a asm/time.h
include which is required for ppc_proc_freq. This can cause compile
failures for some config combinations.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Impact: fix dynamic ftrace with large modules in PPC64
The math to calculate the offset into the TOC that is taken from reading
the trampoline is incorrect. The bottom half of the offset is a signed
extended short. The current code was using an OR to create the offset
when it should have been using an addition.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Acked-by: Geoff Levand <geoffrey.levand@am.sony.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Currently emulate_step() emulates mr. instructions without updating cr0
and this can be disastrous. Don't emulate mr.
This bug has been around for a while, but I am not sure if its a worthy
-stable candidate. I'll leave it to Ben do decide.
Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Fixed v_mapped_by_tlbcam() and p_mapped_by_tlbcam() to use phys_addr_t
instead of unsigned long. In 36-bit physical mode we really need these
functions to deal with phys_addr_t when trying to match a physical
address or when returning one.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Simply add appropriate sdhci nodes.
Note that U-Boot should configure pin multiplexing for eSDHC prior
to Linux could use it. U-Boot should also fill-in the clock-frequency
property (eSDHC clock depends on board-specific SCCR[ESDHCCM] bits).
Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
- sdhc node renamed to sdhci ("sdhc" name is confusing since SDHC is
used to name Secure Digital High Capacity cards, while SDHCI is an
interface).
- Get rid of "fsl,esdhc" compatible entry, it's replaced by the
"fsl,<chip>-esdhc" scheme;
- Get rid of `model' property.
Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Taking sizeof the result of sizeof is quite strange and does not seem to be
what is wanted here.
This was fixed using the following semantic patch.
(http://www.emn.fr/x-info/coccinelle/)
// <smpl>
@@
expression E;
@@
- sizeof (
sizeof (E)
- )
@@
type T;
@@
- sizeof (
sizeof (T)
- )
// </smpl>
Signed-off-by: Julia Lawall <julia@diku.dk>
Acked-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
This is a simple change to correct problems when using set_irq_type
on platforms using CPM2. This code corrects the problem on most platform
but may have issues on 8272 derived platforms for some interrupts.
On 8272 PC2 & 3 are missing and PC 23 & 29 are added, which this patch
does not address.
Signed-off-by: Paul Bilke <paul@conspiracy.net>
Reviewed-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
TSEC0 is connected to Vitesse 7385 5-port switch. The switch
isn't connected to any mdio bus, the link to the switch is fixed
to Full-duplex 1000 Mb/s (no pause).
This patch fixes following failure during bootup:
mdio@24520:01 not found
eth0: Could not attach to PHY
IP-Config: Failed to open eth0
Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
commit b31a1d8b41 ("gianfar: Convert
gianfar to an of_platform_driver") introduced a child node for
the ethernet@25000 controller, but no address and size cells
specifiers were added, and that makes dtc unhappy:
DTC: dts->dtb on file "arch/powerpc/boot/dts/mpc8313erdb.dts"
Warning (reg_format): "reg" property in /soc8313@e0000000/ethernet@25000/mdio@25520 has invalid length (8 bytes) (#address-cells == 2, #size-cells == 1)
Warning (avoid_default_addr_size): Relying on default #address-cells value for /soc8313@e0000000/ethernet@25000/mdio@25520
Warning (avoid_default_addr_size): Relying on default #size-cells value for /soc8313@e0000000/ethernet@25000/mdio@25520
Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
I noticed this doing some randconfig testing (.config below). I have
CONFIG_PM but no CONFIG_SUSPEND. Bug is against mainline.
arch/powerpc/sysdev/built-in.o: In function `ipic_suspend':
ipic.c:(.text+0x6b34): undefined reference to `fsl_deep_sleep'
make[1]: *** [.tmp_vmlinux1] Error 1
make: *** [sub-make] Error 2
Looks like #ifdef CONFIG_PM in arch/powerpc/sysdev/ipic.c should be
CONFIG_SUSPEND. d49747bdfb introduced
this.
Fix build when we have CONFIG_PM but no CONFIG_SUSPEND.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Acked-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Board support for the InterControl Digsy-MTC device based on the MPC5200B SoC.
Signed-off-by: Grzegorz Bernacki <gjb@semihalf.com>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
This patch adds board support for the Media5200 platform. Changes are:
- add the media5200 device tree
- add the media5200 platform support code and cascaded interrupt controller
- add media5200 to the build targets.
Note: this patch also includes a minor tweak to the lite5200(b) target
images list to add the .dtb files to the image list.
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
This patch adds IRQ controller support to the MPC5200 General
Purpose Timer (GPT) device driver. With this patch the mpc5200-gpt
driver supports both GPIO and IRQ functions.
The GPT driver was contained within the mpc52xx_gpio.c file, but this
patch moves it out into a new file (mpc52xx_gpt.c) since it has more
than just GPIO functionality now and it was only grouped with the
mpc52xx-gpio drivers as a matter of convenience before. Also, this
driver will most likely get extended again to also provide support
for the timer function.
Implementation note: Alternately, I could have tried to implement
the IRQ support as a separate driver and left the GPIO portion alone.
However, multiple functions of this device (ie. GPIO input+interrupt
controller, or timer+GPIO) can be active at the same time and the
registers are shared so it is safer to contain all functionality
within a single driver.
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Reviewed-by: Wolfram Sang <w.sang@pengutronix.de>
Rework the mpc5200-pic driver to simplify it and fix up the setting
of desc->status when set_type is called for internal IRQs (so they
are reported as level, not edge). The simplification is due to
splitting off the handling of external IRQs into a separate block
so they don't need to be handled as exceptions in the normal
CRIT, MAIN and PERP paths.
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
pr_debug() calls in the 'hot' *_mask(), *_unmask(), *_ack() and
get_irq() makes adding #define DEBUG pretty much useless. Remove
these calls because they completely swamp the output.
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Reviewed-by: Wolfram Sang <w.sang@pengutronix.de>
Trim out obsolete/extraneous properties and tighten up some usage
conventions. Changes include:
- removal of device_type properties
- removal of cell-index properties
- Addition of gpio-controller and #gpio-cells properties to gpio
nodes
- Move common interrupt-parent property out of device nodes and
into top level parent node.
This patch also include what looks to be just trivial editorial
whitespace/format changes, but there is real method in this
madness. Editorial changes were made to keep the all the
mpc5200 board device trees as similar as possible so that diffs
between them only show the real differences between the boards.
The pcm030 device tree was most affected by this because many
of the comments had been changed from // to /* */ style and
some cell values where changed from decimal to hex format when
it was cloned from one of the other 5200 device trees.
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Reviewed-by: Wolfram Sang <w.sang@pengutronix.de>
Automatic I2C device probing is not done any more. Therefore we need
proper DTS device node definitions for the I2C LM75 thermal sensor on
the TQM85xx modules.
Signed-off-by: Wolfgang Grandegger <wg@grandegger.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Recently, a patch left DEBUG enabled in the powerpc common PCI code,
resulting in an old bug in a pr_debug() statement to show up and cause
a NULL dereference on some machines.
This fixes the pr_debug() statement and reverts to DEBUG not being
force-enabled in that file.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
fix the following 'make headers_check' warning:
usr/include/asm-powerpc/swab.h:11: include of <linux/types.h> is preferred over <asm/types.h>
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
fix the following 'make headers_check' warning:
usr/include/asm-powerpc/spu_info.h:27: found __[us]{8,16,32,64} type without #include <linux/types.h>
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
fix the following 'make headers_check' warning:
usr/include/asm-powerpc/ps3fb.h:33: found __[us]{8,16,32,64} type without #include <linux/types.h>
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
fix the following 'make headers_check' warnings:
usr/include/asm-powerpc/kvm.h:23: include of <linux/types.h> is preferred over <asm/types.h>
usr/include/asm-powerpc/kvm.h:26: found __[us]{8,16,32,64} type without #include <linux/types.h>
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
fix the following 'make headers_check' warning:
usr/include/asm-powerpc/elf.h:5: include of <linux/types.h> is preferred over <asm/types.h>
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
fix the following 'make headers_check' warnings:
usr/include/asm-powerpc/bootx.h:12: include of <linux/types.h> is preferred over <asm/types.h>
usr/include/asm-powerpc/bootx.h:57: found __[us]{8,16,32,64} type without #include <linux/types.h>
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
This patch ensures that memory gets properly mapped into the PCI
address space. Without this patch, the memory window BAR is left
at whatever value happened to be loaded into the BAR when Linux
was booted. Without this patch, memory could end up getting mapped
at any of the 1G address boundaries instead of at '0' where Linux
expects it.
Similarly, this patch also ensures that the internally memory mapped
registers (IMMR) are mapped to the correct PCI address range.
Without this patch, PCI appears to work correctly until a PCI
device is inserted which DMAs into memory.
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Tested-by: Wolfram Sang <w.sang@pengutronix.de>
This patch adds basic support for the 6 GPIO lines found on GE Fanucs SBC310 to the GE Fanuc GPIO driver.
Signed-off-by: Martyn Welch <martyn.welch@gefanuc.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Support for the SBC310 VPX Single Board Computer from GE Fanuc (PowerPC
MPC8641D).
This is the default config file for GE Fanuc's SBC310, a 3U single board
computer, based on Freescale's MPC8641D.
Signed-off-by: Martyn Welch <martyn.welch@gefanuc.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Support for the SBC310 VPX Single Board Computer from GE Fanuc (PowerPC
MPC8641D).
This is the basic board support for GE Fanuc's SBC310, a 3U single board
computer, based on Freescale's MPC8641D.
Signed-off-by: Martyn Welch <martyn.welch@gefanuc.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
The FSL PCI code depends on PCI quirks being enabled to function
properly. We can ensure this by doing a select in Kconfig of
PCI_QUIRKS.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Instead of rounding the divider down, improve the baud-rate generators
accuracy by rounding to the nearest integer.
Signed-off-by: Laurent Pinchart <laurentp@cse-semaphore.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
On booke processors, the code that maps low memory only uses up to three
CAM entries, even though there are sixteen and nothing else uses them.
Make this number configurable in the advanced options menu along with max
low memory size. If one wants 1 GB of lowmem, then it's typically
necessary to have four CAM entries.
Signed-off-by: Trent Piepho <tpiepho@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
The code that maps kernel low memory would only use page sizes up to 256
MB. On E500v2 pages up to 4 GB are supported.
However, a page must be aligned to a multiple of the page's size. I.e.
256 MB pages must aligned to a 256 MB boundary. This was enforced by a
requirement that the physical and virtual addresses of the start of lowmem
be aligned to 256 MB. Clearly requiring 1GB or 4GB alignment to allow
pages of that size isn't acceptable.
To solve this, I simply have adjust_total_lowmem() take alignment into
account when it decides what size pages to use. Give it PAGE_OFFSET =
0x7000_0000, PHYSICAL_START = 0x3000_0000, and 2GB of RAM, and it will map
pages like this:
PA 0x3000_0000 VA 0x7000_0000 Size 256 MB
PA 0x4000_0000 VA 0x8000_0000 Size 1 GB
PA 0x8000_0000 VA 0xC000_0000 Size 256 MB
PA 0x9000_0000 VA 0xD000_0000 Size 256 MB
PA 0xA000_0000 VA 0xE000_0000 Size 256 MB
Because the lowmem mapping code now takes alignment into account,
PHYSICAL_ALIGN can be lowered from 256 MB to 64 MB. Even lower might be
possible. The lowmem code will work down to 4 kB but it's possible some of
the boot code will fail before then. Poor alignment will force small pages
to be used, which combined with the limited number of TLB1 pages available,
will result in very little memory getting mapped. So alignments less than
64 MB probably aren't very useful anyway.
Signed-off-by: Trent Piepho <tpiepho@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
The code to map lowmem uses three CAM aka TLB[1] entries to cover it. The
size of each is stored in three globals named __cam0, __cam1, and __cam2.
All the code that uses them is duplicated three times for each of the three
variables.
We have these things called arrays and loops....
Once converted to use an array, it will be easier to make the number of
CAMs configurable.
Signed-off-by: Trent Piepho <tpiepho@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
We currently have a few variants of fsl-booke processors (e500v1, e500v2,
e500mc, and e200). They all have minor differences that we had previously
been handling via ifdefs.
To move towards having this support the following changes have been made:
* PID1, PID2 only exist on e500v1 & e500v2 and should not be accessed on
e500mc or e200. We use MMUCFG[NPIDS] to determine which case we are
since we only touch PID1/2 in extremely early init code.
* Not all IVORs exist on all the processors so introduce cpu_setup
functions for each variant to setup the proper IVORs that are either
unique or exist but have some variations between the processors
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
This patch adds pcie nodes to the appropriate dts files, plus adds
some probing code for the boards.
Also, remove of_device_is_avaliable() check from the mpc837x_mds.c
board file, as mpc83xx_add_bridge() has the same check now.
Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
This patch adds support for PCI-Express controllers as found on the
newer MPC83xx chips.
The work is loosely based on the Tony Li's patch[1], but unlike the
original patch, this patch implements sliding window for the Type 1
transactions using outbound window translations, so we don't have to
ioremap the whole PCI-E configuration space.
[1] http://ozlabs.org/pipermail/linuxppc-dev/2008-January/049028.html
Signed-off-by: Tony Li <tony.li@freescale.com>
Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
_PAGE_COHERENT is now always set in _PAGE_RAM resp. PAGE_KERNEL.
Thus it has to be masked out, if the BAT mapping should be non
cacheable or CPU_FTR_NEED_COHERENT is not set.
This will work on normal SMP setups because we force-set
CPU_FTR_NEED_COHERENT as part of CPU_FTR_COMMON on SMP.
Signed-off-by: Gerhard Pircher <gerhard_pircher@gmx.net>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
In the VIO bus code the wrappers for dma alloc_coherent and free_coherent
calls are rounding to IOMMU_PAGE_SIZE. Taking a look at the underlying
calls, the actual mapping is promoted to PAGE_SIZE. Changing the
rounding in these two functions fixes under-reporting the entitlement
used by the system. Without this change, the system could run out of
entitlement before it believes it has and incur mapping failures at the
firmware level.
Also in the VIO bus code, the wrapper for dma map_sg is not exiting in
an error path where it should. Rather than fall through to code for the
success case, this patch adds the return that is needed in the error path.
Signed-off-by: Robert Jennings <rcj@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Remove some leftover cruft from the arch/ppc days
Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
Acked-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Probe the new mdio node added by b31a1d8b. Fix kernel panic problem when
gianfar driver wants to get the of_platform_device of that mdio.
Signed-off-by: Li Yang <leoli@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Now that all EEPROM drivers live in the same place, let's harmonize
their symbol names.
Also fix eeprom's dependencies, it definitely needs sysfs, and is no
longer experimental after many years in the kernel tree.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Acked-by: Wolfram Sang <w.sang@pengutronix.de>
Cc: David Brownell <dbrownell@users.sourceforge.net>
Convert the Warp platform to use the newly merged NDFC driver
- warp.dts changed to work with ndfc
- warp-nand.c no longer needed
- removed obsolete rev A support from cuboot-warp.c
Signed-off-by: Sean MacLennan <smaclennan@pikatech.com>
Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
Added additional information for type and compatibility strings and
interrupt information to the SDRAM0 memory-controller device tree
nodes for AMCC PowerPC 405EX[r]-based boards to facilitate binding
with the new "ibm,sdram-4xx-ddr2" EDAC memory controller adapter driver.
Signed-off-by: Grant Erickson <gerickson@nuovations.com>
Acked-by: Stefan Roese <sr@denx.de>
Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
arm, arm/mach-integrator and powerpc were missing
.data.percpu.page_aligned in their percpu output section definitions.
Add it.
Signed-off-by: Tejun Heo <tj@kernel.org>
The PAPR says that the property for specifying the number of SLBs should
be called "slb-size". We currently only look for "ibm,slb-size" because
this is what firmware actually presents.
This patch makes us look for the "slb-size" property as well and in
preference to the "ibm,slb-size". This should future proof us if
firmware changes to match PAPR.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
powerpc: is_hugepage_only_range() must account for both 4kB and 64kB slices
The subpage_prot syscall fails on second and subsequent calls for a given
region, because is_hugepage_only_range() is mis-identifying the 4 kB
slices when the process has a 64 kB page size.
Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Fixes compiler warnings:
arch/powerpc/platforms/ps3/mm.c:1205: warning: passing argument 2 of 'ps3_repository_read_mm_info' from incompatible pointer type
arch/powerpc/platforms/ps3/mm.c:1205: warning: passing argument 3 of 'ps3_repository_read_mm_info' from incompatible pointer type
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Geoff Levand <geoffrey.levand@am.sony.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This fixes these compiler warning:
arch/powerpc/platforms/ps3/interrupt.c:109: warning: passing argument 2 of 'clear_bit' from incompatible pointer type
arch/powerpc/platforms/ps3/interrupt.c:130: warning: passing argument 2 of 'set_bit' from incompatible pointer type
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Geoff Levand <geoffrey.levand@am.sony.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
We just fix up the reference parameters as the others are dealt with by
arithmetic promotion rules and don't cause warnings.
This removes warnings like this:
arch/powerpc/platforms/ps3/interrupt.c:327: warning: passing argument 1 of 'lv1_construct_event_receive_port' from incompatible pointer type
Also, these:
drivers/ps3/ps3-vuart.c:462: warning: passing argument 4 of 'ps3_vuart_raw_read' from incompatible pointer type
drivers/ps3/ps3-vuart.c:592: warning: passing argument 4 of 'ps3_vuart_raw_read' from incompatible pointer type
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Geoff Levand <geoffrey.levand@am.sony.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Push the dma_addr_t type usage all the way down to where the actual
values are manipulated.
Now that u64 is "unsigned long long", this removes warnings like:
arch/powerpc/platforms/ps3/system-bus.c:532: warning: passing argument 4 of 'ps3_dma_map' from incompatible pointer type
arch/powerpc/platforms/ps3/system-bus.c:649: warning: passing argument 4 of 'ps3_dma_map' from incompatible pointer type
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Geoff Levand <geoffrey.levand@am.sony.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Also silences this warning:
arch/powerpc/platforms/ps3/setup.c:275: warning: initialization from incompatible pointer type
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Geoff Levand <geoffrey.levand@am.sony.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (29 commits)
powerpc/83xx: Move mcu_mpc8349emitx driver out of drivers/i2c/chips/
powerpc/83xx: Make serial ports work on MPC8315E-RDB w/ FSL U-Boots
powerpc/e500mc: Doorbells need to be taken w/exceptions disabled
powerpc: Enable PS3 options and QPACE in ppc64_defconfig
powerpc/powermac: Fix occasional SMP boot failure
powerpc/cacheinfo: Rename cache_dir per-cpu variable
hvc_console: Use kzalloc() instead of kmalloc() + memset()
hvc_console: Do not set low_latency when using interrupts
hvc_console: Call free_irq() only if request_irq() was successful
hvc_console: Change an mb() to smp_mb() and add some comments
powerpc: Cleanup from l64 to ll64 change: drivers/net
powerpc: Cleanup from l64 to ll64 change: drivers/char
powerpc: Cleanup from l64 to ll64 change: arch code
powerpc: Change u64/s64 to a long long integer type
powerpc/kexec: Check crash_base for relocatable kernel
powerpc: Make dummy section a valid note header
Xilinx: SPI: updated driver for device tree
drivers/of: Add the of_find_i2c_device_by_node function.
powerpc/xsysace: add compatible string for non-ipcore instance
powerpc/mpc52xx: remove dead code from GPIO driver
...
* 'syscalls' of git://git390.osdl.marist.edu/pub/scm/linux-2.6: (44 commits)
[CVE-2009-0029] s390 specific system call wrappers
[CVE-2009-0029] System call wrappers part 33
[CVE-2009-0029] System call wrappers part 32
[CVE-2009-0029] System call wrappers part 31
[CVE-2009-0029] System call wrappers part 30
[CVE-2009-0029] System call wrappers part 29
[CVE-2009-0029] System call wrappers part 28
[CVE-2009-0029] System call wrappers part 27
[CVE-2009-0029] System call wrappers part 26
[CVE-2009-0029] System call wrappers part 25
[CVE-2009-0029] System call wrappers part 24
[CVE-2009-0029] System call wrappers part 23
[CVE-2009-0029] System call wrappers part 22
[CVE-2009-0029] System call wrappers part 21
[CVE-2009-0029] System call wrappers part 20
[CVE-2009-0029] System call wrappers part 19
[CVE-2009-0029] System call wrappers part 18
[CVE-2009-0029] System call wrappers part 17
[CVE-2009-0029] System call wrappers part 16
[CVE-2009-0029] System call wrappers part 15
...
Add swab.h to kbuild.asm and remove the individual entries from
each arch, mark as unifdef as some arches have some kernel-only
bits inside.
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This enables the use of syscall wrappers to do proper sign extension
for 64-bit programs.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
This patch is used to help Jean Delvare to get rid of drivers/i2c/chips/
directory. The new location suggested by Kumar Gala: as the driver is
83xx specific it's placed into arch/powerpc/platforms/83xx/.
Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Acked-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
FSL U-Boots use /soc8315@e0000000 node to search and fixup serial
nodes' clock-frequency properties. Though in upstream kernels we use
new naming convention -- for IMMR address space dts files specify
/immr@e0000000 nodes.
This makes FSL U-Boots fail to fixup the clock frequencies, and that
leads to serial ports misbehaviour. We can workaround the issue by
filling the clock frequency values manually.
p.s. For the same reason FSL U-Boots fail to fixup MAC addresses for
ethernet nodes, so users should either change the .dts file locally
or set MAC address via `ifconfig hw ether' command.
Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
We use Doorbell interrupts for IPIs and thus we need to make sure we aren't
interrupted in the process of processing the IPI.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Acked-by: Dave Liu <daveliu@freescale.com>
To increase the amount of code that's built for a defconfig build.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The PowerMac kernel occasionally fails to bring up the secondary CPUs on
SMP, the trigger factor seem to be fairly random and related to location
of code and data.
This appears to be due to the initial loading of the TOC value by the
secondary processor which now happens before we clear HID4:RM_CI (Real
Mode Cache Invalidate). This bit should really be cleared before we do
any load or store other than fetching code.
This fix works based on the assumption that all SMP 64-bit PowerMacs use
variants of the 970, which fortunately is true, by explicitely clearing
that bit, adding an slbia for good measure as RM_CI mode is known to
create bogus ERAT entries.
I also removed some spurrious debug output that was left enabled by
mistake while at it.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The per_cpu__ prefix on DECLARE_PER_CPU'd variables is going away;
rename cache_dir to cache_dir_pcpu.
Signed-off-by: Nathan Lynch <ntl@pobox.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Convert arch/powerpc/ over to long long based u64:
-#ifdef __powerpc64__
-# include <asm-generic/int-l64.h>
-#else
-# include <asm-generic/int-ll64.h>
-#endif
+#include <asm-generic/int-ll64.h>
This will avoid reoccuring spurious warnings in core kernel code that
comes when people test on their own hardware. (i.e. x86 in ~98% of the
cases) This is what x86 uses and it generally helps keep 64-bit code
32-bit clean too.
[Adjusted to not impact user mode (from paulus) - sfr]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Enforce that the crash kernel region never overlaps the current kernel,
as it will be written directly on kexec load.
Also, default to the previous KDUMP_KERNELBASE if the start is 0.
Other architectures (x86, ia64) state that specifying the start address
0 (or omitting it) will result in the kernel allocating it. Before the
relocatable patch in 2.6.28, powerpc would adjust any other start value
to the hardcoded KDUMP_KERNELBASE of 32M.
Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
We are declaring the dummy section (used to work around a binutils
bug) as PT_NOTE, but we don't have enough bytes for it to be a valid
note header, and kexec userspace complains:
Warning: Elf Note name is not null terminated
Warning: append= option is not passed. Using the first kernel root partition
Warning: Elf Note name is not null terminated
Instead of using the arbitray value 0xf177 (aka "fill"), declare a
no-name no-description note of type 0.
Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Impact: cleanup, update to new cpumask API
Irq_desc.affinity and irq_desc.pending_mask are now cpumask_var_t's
so access to them should be using the new cpumask API.
Signed-off-by: Mike Travis <travis@sgi.com>
Support for the FPGA based watchdog timer on GE Fanuc's SBC610.
This patch enables one of the watchdog timers found on the SBC610. There are
two identical watchdog timers at different offsets in the above mentioned
boards, however the current driver is only capable of supporting one of them.
The watchdog timers are also capable of generating interrupts at a
user-configurable threshold, though support for this operation is currently
not supported by the driver.
Signed-off-by: Martyn Welch <martyn.welch@gefanuc.com>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
This updates the cpufreq drivers in arch/powerpc so they build again
after the core cpufreq changes that broke them in commit
in835481d9bcd65720b473db6b38746a74a3964218.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Impact: build fix
Ingo Molnar wrote:
> tip/arch/blackfin/kernel/irqchip.c: In function 'show_interrupts':
> tip/arch/blackfin/kernel/irqchip.c:85: error: 'struct kernel_stat' has no member named 'irqs'
> make[2]: *** [arch/blackfin/kernel/irqchip.o] Error 1
> make[2]: *** Waiting for unfinished jobs....
>
So could move kstat_irqs array to irq_desc struct.
(s390, m68k, sparc) are not touched yet, because they don't support genirq
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The MPC5200 PIC driver doesn't correctly update the .status field of
the irq_desc structure when the set_type hook is called. This patch
adds the required code.
Also cleans up the external IRQ typename field to be something easier
to read (very minor).
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
strcmp on NULL results in a segmentation fault, also, remove the second,
redundant test on dev
Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rric/oprofile: (31 commits)
powerpc/oprofile: fix whitespaces in op_model_cell.c
powerpc/oprofile: IBM CELL: add SPU event profiling support
powerpc/oprofile: fix cell/pr_util.h
powerpc/oprofile: IBM CELL: cleanup and restructuring
oprofile: make new cpu buffer functions part of the api
oprofile: remove #ifdef CONFIG_OPROFILE_IBS in non-ibs code
ring_buffer: fix ring_buffer_event_length()
oprofile: use new data sample format for ibs
oprofile: add op_cpu_buffer_get_data()
oprofile: add op_cpu_buffer_add_data()
oprofile: rework implementation of cpu buffer events
oprofile: modify op_cpu_buffer_read_entry()
oprofile: add op_cpu_buffer_write_reserve()
oprofile: rename variables in add_ibs_begin()
oprofile: rename add_sample() in cpu_buffer.c
oprofile: rename variable ibs_allowed to has_ibs in op_model_amd.c
oprofile: making add_sample_entry() inline
oprofile: remove backtrace code for ibs
oprofile: remove unused ibs macro
oprofile: remove unused components in struct oprofile_cpu_buffer
...
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (53 commits)
serial: Add driver for the Cell Network Processor serial port NWP device
powerpc: enable dynamic ftrace
powerpc/cell: Fix the prototype of create_vma_map()
powerpc/mm: Make clear_fixmap() actually work
powerpc/kdump: Use ppc_save_regs() in crash_setup_regs()
powerpc: Export cacheable_memzero as its now used in a driver
powerpc: Fix missing semicolons in mmu_decl.h
powerpc/pasemi: local_irq_save uses an unsigned long
powerpc/cell: Fix some u64 vs. long types
powerpc/cell: Use correct types in beat files
powerpc: Use correct type in prom_init.c
powerpc: Remove unnecessary casts
mtd/ps3vram: Use _PAGE_NO_CACHE in memory ioremap
mtd/ps3vram: Use msleep in waits
mtd/ps3vram: Use proper kernel types
mtd/ps3vram: Cleanup ps3vram driver messages
mtd/ps3vram: Remove ps3vram debug routines
mtd/ps3vram: Add modalias support to the ps3vram driver
mtd/ps3vram: Add ps3vram driver for accessing video RAM as MTD
powerpc: Fix iseries drivers build failure without CONFIG_VIOPATH
...
When I review ocfs2 code, find there are 2 typos to "successfull". After
doing grep "successfull " in kernel tree, 22 typos found totally -- great
minds always think alike :)
This patch fixes all the similar typos. Thanks for Randy's ack and comments.
Signed-off-by: Coly Li <coyli@suse.de>
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Acked-by: Roland Dreier <rolandd@cisco.com>
Cc: Jeremy Kerr <jk@ozlabs.org>
Cc: Jeff Garzik <jeff@garzik.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Vlad Yasevich <vladislav.yasevich@hp.com>
Cc: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch adds the SPU event based profiling funcitonality for the
IBM Cell processor. Previously, the CELL OProfile kernel code supported
PPU event, PPU cycle profiling and SPU cycle profiling. The addition of
SPU event profiling allows the users to identify where in their SPU code
various SPU evnets are occuring. This should help users further identify
issues with their code. Note, SPU profiling has some limitations due to HW
constraints. Only one event at a time can be used for profiling and SPU event
profiling must be time sliced across all of the SPUs in a node.
The patch adds a new arch specific file to the OProfile file system. The
file has bit 0 set to indicate that the kernel supports SPU event profiling.
The user tool must check this file/bit to make sure the kernel supports
SPU event profiling before trying to do SPU event profiling. The user tool
check is part of the user tool patch for SPU event profiling.
Signed-off-by: Carl Love <carll@us.ibm.com>
Signed-off-by: Robert Richter <robert.richter@amd.com>
This patch restructures and cleans up the code a bit to make it
easier to add new functionality later. The patch makes no
functional changes to the existing code.
Signed-off-by: Carl Love <carll@us.ibm.com>
Signed-off-by: Robert Richter <robert.richter@amd.com>
This patch enables dynamic ftrace. The PowerPC port was dependent on
other code not yet in mainline. Now that the code is, we can now
let PowerPC compile with dynamic ftrace.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The clear_fixmap() routine issues map_page() with flags set to 0.
Currently this causes a BUG_ON() inside the map_page(), as it assumes
that a PTE should be clear before mapping.
This patch makes the map_page() to trigger the BUG_ON() only if the
flags were set.
Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Acked-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The patch replaces internal registers dump implementation with
ppc_save_regs(). From now on PPC64 and PPC32 are using the same
code for crash_setup_regs().
NOTE: The old regs dump implementation was capturing SP (r1) directly
as is, so you could see crash_kexec() function on top of the back-trace.
But ppc_save_regs() goes up one stack frame, so you'll not see it
anymore, at the top-level you'll see who actually triggered the crash
dump instead.
Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The Freescale PowerPC specific gianfar driver (gig-e) uses
cacheable_memzero for performance reasons we need to export
the symbol to allow the driver to be built as a module.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This is a brown paper bag from one of my earlier patches that
breaks build on 40x and 8xx.
And yes, I've now added 40x and 8xx to my list of test configs :-)
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
[Split from a larger patch - sfr]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
in/out_be64() work on u64s.
The first parameter to ppc_md.ioremap is a phys_addr_t.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Only pass the address of a u64 if that is what the function requires.
[Split out of a larger patch - sfr]
[update comment - sfr]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
tce_entryp is a "u64 *" not an "unsigned long *".
[Split from a large patch -sfr]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
of_get_flat_dt_prop() returns a "void *", so we don't need to cast when
assigning its result to a pointer variable.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Update ps3vram driver to use the new ps3 three id modalias support.
Signed-off-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com>
Signed-off-by: Geoff Levand <geoffrey.levand@am.sony.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Add ps3vram driver, which exposes unused video RAM on the PS3 as a MTD
device suitable for storage or swap. Fast data transfer is achieved
using a local cache in system RAM and DMA transfers via the GPU.
Signed-off-by: Vivien Chappelier <vivien.chappelier@free.fr>
Signed-off-by: Jim Paris <jim@jtan.com>
Acked-by: Geoff Levand <geoffrey.levand@am.sony.com>
Acked-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
iSeries dependent drivers fail to build, when CONFIG_VIOPATH is disabled.
Fix the problem by making those drivers select it.
Signed-off-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Enable RELOCATABLE option if user selects CRASH_DUMP option. Without this
patch user has to first select RELOCATABLE option and then has to enable
CRASH_DUMP option.
Signed-off-by: M. Mohan Kumar <mohan@in.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Dave Liu <daveliu@freescale.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
So change the flags member of struct spu from u64 to unsigned long.
This change will also prevent some warnings when we change u64 to unsigned
long long.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
These ioctls take a struct serial_rs485
(see linux/serial.h) as argument. They are already available
on x86. This patch adds them for the powerpc architecture.
Signed-off-by: Matthias Fuchs <mfuchs@ma-fu.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
X has been failing to start on my quad G5 powermac since commit
1fd0f52583 ("powerpc: Fix domain numbers
in /proc on 64-bit") went in. The reason is that the change allows X
to see the PCI-PCI bridge above the video card (previously it was
obscured by the fact that there were two "00" directories in
/proc/bus/pci), and the pciconfig_iobase system call on the bridge is
failing because of a hack that we have to return information about the
AGP bus when X asks about bus 0. This machine doesn't have an AGP bus
(it has PCI Express) and so the pciconfig_iobase call is returning -1,
which ultimately causes X to fail to start.
This fixes it by checking that we have an AGP bridge before
redirecting the pciconfig_iobase call to return information about the
AGP bus. With this, X starts successfully both on a quad G5 with
PCI Express and on an older dual G5 with AGP.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
SPIN_LOCK_UNLOCKED is deprecated. The following makes the change suggested
in Documentation/spinlocks.txt
The semantic patch that makes this change is as follows:
(http://www.emn.fr/x-info/coccinelle/)
// <smpl>
@@
declarer name DEFINE_SPINLOCK;
identifier xxx_lock;
@@
- spinlock_t xxx_lock = SPIN_LOCK_UNLOCKED;
+ DEFINE_SPINLOCK(xxx_lock);
// </smpl>
Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
SPIN_LOCK_UNLOCKED is deprecated. The following makes the change suggested
in Documentation/spinlocks.txt
The semantic patch that makes this change is as follows:
(http://www.emn.fr/x-info/coccinelle/)
// <smpl>
@@
declarer name DEFINE_SPINLOCK;
identifier xxx_lock;
@@
- spinlock_t xxx_lock = SPIN_LOCK_UNLOCKED;
+ DEFINE_SPINLOCK(xxx_lock);
// </smpl>
Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The current code for providing processor cache information in sysfs
has the following deficiencies:
- several complex functions that are hard to understand
- implicit recursion (cache_desc_release -> kobject_put -> cache_desc_release)
- explicit recursion (create_cache_index_info)
- use of two per-cpu arrays when one would suffice
- duplication of work on systems where CPUs share cache
Also, when I looked at implementing support for a shared_cpu_map
attribute, it was pretty much impossible to handle hotplug without
checking every single online CPU's cache_desc list and fixing things
up... not that this is a hot path, but it would have introduced
O(n^2)-ish behavior during boot. Addressing this involved rethinking
the core data structures used, which didn't lend itself to an
incremental approach.
This implementation maintains a "forest" (potentially more than one
tree) of cache objects which reflects the system's cache topology.
Cache objects are instantiated as needed as CPUs come online. A
per-cpu array is used mainly for sysfs-related bookkeeping; the
objects in the array just point to the appropriate points in the
forest.
This maintains compatibility with the existing code and includes some
enhancements:
- Implement the shared_cpu_map attribute, which is essential for
enabling userspace to discover the system's overall cache topology.
- Use cache-block-size properties if cache-line-size is not available.
I chose to place this implementation in a new file since it would have
roughly doubled the size of sysfs.c, which is already kind of messy.
Signed-off-by: Nathan Lynch <ntl@pobox.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The non-zero return from the prepare callback is returned by sys_kexec_load()
to userspace, indicating that kexec is not supported on the machine.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This patch makes the default install script (arch/powerpc/boot/install.sh)
copy the bootable image files into the install directory. Before this
patch only the vmlinux image file was copied.
This patch makes the default 'make install' command useful for embedded
development when $(INSTALL_PATH) is set in the environment.
As a side effect, this patch changes the calling convention of the
install.sh script. Instead of a single 5th parameter, the script is now
passed a list of all the target images stored in the $(image-y) Makefile
variable. This should be backwards compatible with existing install scripts
since it just adds additional arguments and does not change existing ones.
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Acked-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Both users of careful_allocation() immediately memset() the
result. So, just do it in one place.
Also give careful_allocation() a 'z' prefix to bring it in
line with kzmalloc() and friends.
Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Since we memset() the result in both of the uses here,
just make careful_alloc() return a virtual address.
Also, add a separate variable to store the physial
address that comes back from the lmb_alloc() functions.
This makes it less likely that someone will screw it up
forgetting to convert before returning since the vaddr
is always in a void* and the paddr is always in an
unsigned long.
I admit this is arbitrary since one of its users needs
a paddr and one a vaddr, but it does remove a good
number of casts.
Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
If we fail a bootmem allocation, the bootmem code itself
panics. No need to redo it here.
Also change the wording of the other panic. We don't
strictly have to allocate memory on the specified node.
It is just a hint and that node may not even *have* any
memory on it. In that case we can and do fall back to
other nodes.
Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The behavior in careful_allocation() really confused me
at first. Add a comment to hopefully make it easier
on the next doofus that looks at it.
Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This patch fixes some unbalanced OF node references in the
powermac code
Signed-off-by: Nicolas Palix <npalix@diku.dk>
Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
There's a problem on some embedded platforms when we re-assign
everything on PCI, such as 44x. The generic code tries to avoid
assigning devices to addresses overlapping the low legacy
addresses such as VGA hard decoded areas using constants that
are unfortunately no good for us, as they don't take into account
the address translation we do to access PCI busses.
Thus we end up allocating things like IO BARs to 0, which is
technically legal, but will shadow hard decoded ports for use
by things like VGA cards.
This works around it by attempting to reserve legacy regions
before we try to assign addresses.
NOTE: This may have nasty side effects in cases I haven't tested
yet:
- We try to use FW mappings (ie. powermac) and the FW has allocated
a conflicting address over those legacy regions. This will typically
happen. I would expect the new code to just fail with an informative
message without harm but I haven't had a chance to test that scenario
yet.
- A device with fixed BARs overlapping those legacy addresses such
as an IDE controller in legacy mode is in the system. I don't know
for sure yet what will happen there, I have to test :-)
Ideally, we should change PCIBIOS_MIN_IO/MIN_MEM accross the board
to take a bus pointer so they can provide appropriate per-bus translated
values to the generic code but that's a more invasive patch. I will
do that in the future, but in the meantime, this fixes the problem
locally
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
* 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6: (98 commits)
PCI PM: Put PM callbacks in the order of execution
PCI PM: Run default PM callbacks for all devices using new framework
PCI PM: Register power state of devices during initialization
PCI PM: Call pci_fixup_device from legacy routines
PCI PM: Rearrange code in pci-driver.c
PCI PM: Avoid touching devices behind bridges in unknown state
PCI PM: Move pci_has_legacy_pm_support
PCI PM: Power-manage devices without drivers during suspend-resume
PCI PM: Add suspend counterpart of pci_reenable_device
PCI PM: Fix poweroff and restore callbacks
PCI: Use msleep instead of cpu_relax during ASPM link retraining
PCI: PCIe portdrv: Add kerneldoc comments to remining core funtions
PCI: PCIe portdrv: Rearrange code so that related things are together
PCI: PCIe portdrv: Fix suspend and resume of PCI Express port services
PCI: PCIe portdrv: Add kerneldoc comments to some core functions
x86/PCI: Do not use interrupt links for devices using MSI-X
net: sfc: Use pci_clear_master() to disable bus mastering
PCI: Add pci_clear_master() as opposite of pci_set_master()
PCI hotplug: remove redundant test in cpq hotplug
PCI: pciehp: cleanup register and field definitions
...
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb-2.6: (123 commits)
wimax/i2400m: add CREDITS and MAINTAINERS entries
wimax: export linux/wimax.h and linux/wimax/i2400m.h with headers_install
i2400m: Makefile and Kconfig
i2400m/SDIO: TX and RX path backends
i2400m/SDIO: firmware upload backend
i2400m/SDIO: probe/disconnect, dev init/shutdown and reset backends
i2400m/SDIO: header for the SDIO subdriver
i2400m/USB: TX and RX path backends
i2400m/USB: firmware upload backend
i2400m/USB: probe/disconnect, dev init/shutdown and reset backends
i2400m/USB: header for the USB bus driver
i2400m: debugfs controls
i2400m: various functions for device management
i2400m: RX and TX data/control paths
i2400m: firmware loading and bootrom initialization
i2400m: linkage to the networking stack
i2400m: Generic probe/disconnect, reset and message passing
i2400m: host/device procotol and core driver definitions
i2400m: documentation and instructions for usage
wimax: Makefile, Kconfig and docbook linkage for the stack
...
This is a global variable defined in fsl_booke_mmu.c with a value that gets
initialized in assembly code in head_fsl_booke.S.
It's never used.
If some code ever does want to know the number of entries in TLB1, then
"numcams = mfspr(SPRN_TLB1CFG) & 0xfff", is a whole lot simpler than a
global initialized during kernel boot from assembly.
Signed-off-by: Trent Piepho <tpiepho@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Some assembly code in head_fsl_booke.S hard-coded the size of struct tlbcam
to 20 when it indexed the TLBCAM table. Anyone changing the size of struct
tlbcam would not know to expect that.
The kernel already has a system to get the size of C structures into
assembly language files, asm-offsets, so let's use it.
The definition of the struct gets moved to a header, so that asm-offsets.c
can include it.
Signed-off-by: Trent Piepho <tpiepho@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Provides a small speedup when accessing pefetchable ranges. To indicate
that a memory range is prefetchable, mark it in the dts file with 42000000
instead of 02000000.
A powepc pci_controller is allowed three memory ranges, any of which may be
prefetchable. However, the PCI-PCI bridge configuration space only has one
field for "non-prefetchable memory behind bridge", which has a 32 bit
address, and one field for "prefetchable memory behind bridge", which may
have a 64 bit address. These are PCI bus addresses, not CPU physical
addresses.
So really you're only allowed one memory range of each type. And if you
want the range at a PCI address above 32 bits you must make it
prefetchable.
Signed-off-by: Trent Piepho <tpiepho@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
The code that sets up the outbound ATMU windows, which is used to map CPU
physical addresses into PCI bus addresses where BARs will be mapped, didn't
work so well.
For one, it leaked the ioremap() of the ATMU registers. Another small bug
was the high 20 bits of the PCI bus address were left as zero. It's legal
for prefetchable memory regions to be above 32 bits, so the high 20 bits
might not be zero.
Mainly, it couldn't handle ranges that were not a power of two in size or
were not naturally aligned. The ATMU windows have these requirements (size
& alignment), but the code didn't bother to check if the ranges it was
programming met them. If they didn't, the windows would silently be
programmed incorrectly.
This new code can handle ranges which are not power of two sized nor
naturally aligned. It simply splits the ranges into multiple valid ATMU
windows. As there are only four windows, pooly aligned or sized ranges
(which didn't even work before) may run out of windows. In this case an
error is printed and an effort is made to disable the unmapped resources.
An improvement that could be made would be to make use of the default
outbound window. Iff hose->pci_mem_offset is zero, then it's possible that
some or all of the ranges might not need an outbound window and could just
use the default window.
The default ATMU window can support a pci_mem_offset less than zero too,
but pci_mem_offset is unsigned. One could say the abilities allowed a
powerpc pci_controller is neither subset nor a superset of the abilities of
a Freescale PCIe controller. Thankfully, the most useful bits are in the
intersection of the two abilities.
Signed-off-by: Trent Piepho <tpiepho@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (24 commits)
trivial: chack -> check typo fix in main Makefile
trivial: Add a space (and a comma) to a printk in 8250 driver
trivial: Fix misspelling of "firmware" in docs for ncr53c8xx/sym53c8xx
trivial: Fix misspelling of "firmware" in powerpc Makefile
trivial: Fix misspelling of "firmware" in usb.c
trivial: Fix misspelling of "firmware" in qla1280.c
trivial: Fix misspelling of "firmware" in a100u2w.c
trivial: Fix misspelling of "firmware" in megaraid.c
trivial: Fix misspelling of "firmware" in ql4_mbx.c
trivial: Fix misspelling of "firmware" in acpi_memhotplug.c
trivial: Fix misspelling of "firmware" in ipw2100.c
trivial: Fix misspelling of "firmware" in atmel.c
trivial: Fix misspelled firmware in Kconfig
trivial: fix an -> a typos in documentation and comments
trivial: fix then -> than typos in comments and documentation
trivial: update Jesper Juhl CREDITS entry with new email
trivial: fix singal -> signal typo
trivial: Fix incorrect use of "loose" in event.c
trivial: printk: fix indentation of new_text_line declaration
trivial: rtc-stk17ta8: fix sparse warning
...
Use the generic pci_swizzle_interrupt_pin() instead of arch-specific code.
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
A published errata for ppc440epx states, that when running Linux with
both EHCI and OHCI modules loaded, the EHCI module experiences a fatal
error when a high-speed device is connected to the USB2.0, and
functions normally if OHCI module is not loaded.
There used to be recommendation to use only hi-speed or full-speed
devices with specific conditions, when respective module was unloaded.
Later, it was observed that ohci suspend is enough to keep things
going, and it was turned into workaround, as explained below.
Quote from original descriprion:
The 440EPx USB 2.0 Host controller is an EHCI compliant controller. In
USB 2.0 Host controllers, each EHCI controller has one or more companion
controllers, which may be OHCI or UHCI. An USB 2.0 Host controller will
contain one or more ports. For each port, only one of the controllers
is connected at any one time. In the 440EPx, there is only one OHCI
companion controller, and only one USB 2.0 Host port.
All ports on an USB 2.0 controller default to the companion
controller. If you load only an ohci driver, it will have control of
the ports and any deviceplugged in will operate, although high speed
devices will be forced to operate at full speed. When an ehci driver
is loaded, it explicitly takes control of the ports. If there is a
device connected, and / or every time there is a new device connected,
the ehci driver determines if the device is high speed or not. If it
is high speed, the driver retains control of the port. If it is not,
the driver explicitly gives the companion controller control of the
port.
The is a software workaround that uses
Initial version of the software workaround was posted to
linux-usb-devel:
http://www.mail-archive.com/linux-usb-devel@lists.sourceforge.net/msg54019.html
and later available from amcc.com:
http://www.amcc.com/Embedded/Downloads/download.html?cat=1&family=15&ins=2
The patch below is generally based on the latter, but reworked to
powerpc/of_device USB drivers, and uses a few devicetree inquiries to
get rid of (some) hardcoded defines.
Signed-off-by: Vitaly Bordug <vitb@kernel.crashing.org>
Signed-off-by: Stefan Roese <sr@denx.de>
Cc: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Add the UCC_GETH_UPSMR_xxx definitions to qe.h. The ucc_geth driver will
eventually use these instead of the UPSMR_ macros it currently defines.
Signed-off-by: Timur Tabi <timur@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
The general pci resume code can only restore part of the
configuration registers. We need to reconfigure those
registers in the FIXUP_RESUME.
Signed-off-by: Jason Jin <Jason.jin@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Add kprobe_insn_mutex for protecting kprobe_insn_pages hlist, and remove
kprobe_mutex from architecture dependent code.
This allows us to call arch_remove_kprobe() (and free_insn_slot) while
holding kprobe_mutex.
Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The atomic_t type cannot currently be used in some header files because it
would create an include loop with asm/atomic.h. Move the type definition
to linux/types.h to break the loop.
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Show node to memory section relationship with symlinks in sysfs
Add /sys/devices/system/node/nodeX/memoryY symlinks for all
the memory sections located on nodeX. For example:
/sys/devices/system/node/node1/memory135 -> ../../memory/memory135
indicates that memory section 135 resides on node1.
Also revises documentation to cover this change as well as updating
Documentation/ABI/testing/sysfs-devices-memory to include descriptions
of memory hotremove files 'phys_device', 'phys_index', and 'state'
that were previously not described there.
In addition to it always being a good policy to provide users with
the maximum possible amount of physical location information for
resources that can be hot-added and/or hot-removed, the following
are some (but likely not all) of the user benefits provided by
this change.
Immediate:
- Provides information needed to determine the specific node
on which a defective DIMM is located. This will reduce system
downtime when the node or defective DIMM is swapped out.
- Prevents unintended onlining of a memory section that was
previously offlined due to a defective DIMM. This could happen
during node hot-add when the user or node hot-add assist script
onlines _all_ offlined sections due to user or script inability
to identify the specific memory sections located on the hot-added
node. The consequences of reintroducing the defective memory
could be ugly.
- Provides information needed to vary the amount and distribution
of memory on specific nodes for testing or debugging purposes.
Future:
- Will provide information needed to identify the memory
sections that need to be offlined prior to physical removal
of a specific node.
Symlink creation during boot was tested on 2-node x86_64, 2-node
ppc64, and 2-node ia64 systems. Symlink creation during physical
memory hot-add tested on a 2-node x86_64 system.
Signed-off-by: Gary Hade <garyhade@us.ibm.com>
Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The KernelPageSize entry in /proc/pid/smaps is the pagesize used by the
kernel to back a VMA. This matches the size used by the MMU in the
majority of cases. However, one counter-example occurs on PPC64 kernels
whereby a kernel using 64K as a base pagesize may still use 4K pages for
the MMU on older processor. To distinguish, this patch reports
MMUPageSize as the pagesize used by the MMU in /proc/pid/smaps.
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Cc: "KOSAKI Motohiro" <kosaki.motohiro@jp.fujitsu.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix misspelling of "firmware" in powerpc Makefile
It's spelled "firmware".
Signed-off-by: Nick Andrew <nick@nick-andrew.net>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
... and don't bother in callers. Don't bother with zeroing i_blocks,
while we are at it - it's already been zeroed.
i_mode is not worth the effort; it has no common default value.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* 'cpus4096-for-linus-3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (77 commits)
x86: setup_per_cpu_areas() cleanup
cpumask: fix compile error when CONFIG_NR_CPUS is not defined
cpumask: use alloc_cpumask_var_node where appropriate
cpumask: convert shared_cpu_map in acpi_processor* structs to cpumask_var_t
x86: use cpumask_var_t in acpi/boot.c
x86: cleanup some remaining usages of NR_CPUS where s/b nr_cpu_ids
sched: put back some stack hog changes that were undone in kernel/sched.c
x86: enable cpus display of kernel_max and offlined cpus
ia64: cpumask fix for is_affinity_mask_valid()
cpumask: convert RCU implementations, fix
xtensa: define __fls
mn10300: define __fls
m32r: define __fls
h8300: define __fls
frv: define __fls
cris: define __fls
cpumask: CONFIG_DISABLE_OBSOLETE_CPUMASK_FUNCTIONS
cpumask: zero extra bits in alloc_cpumask_var_node
cpumask: replace for_each_cpu_mask_nr with for_each_cpu in kernel/time/
cpumask: convert mm/
...
* 'cpus4096-for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (66 commits)
x86: export vector_used_by_percpu_irq
x86: use logical apicid in x2apic_cluster's x2apic_cpu_mask_to_apicid_and()
sched: nominate preferred wakeup cpu, fix
x86: fix lguest used_vectors breakage, -v2
x86: fix warning in arch/x86/kernel/io_apic.c
sched: fix warning in kernel/sched.c
sched: move test_sd_parent() to an SMP section of sched.h
sched: add SD_BALANCE_NEWIDLE at MC and CPU level for sched_mc>0
sched: activate active load balancing in new idle cpus
sched: bias task wakeups to preferred semi-idle packages
sched: nominate preferred wakeup cpu
sched: favour lower logical cpu number for sched_mc balance
sched: framework for sched_mc/smt_power_savings=N
sched: convert BALANCE_FOR_xx_POWER to inline functions
x86: use possible_cpus=NUM to extend the possible cpus allowed
x86: fix cpu_mask_to_apicid_and to include cpu_online_mask
x86: update io_apic.c to the new cpumask code
x86: Introduce topology_core_cpumask()/topology_thread_cpumask()
x86: xen: use smp_call_function_many()
x86: use work_on_cpu in x86/kernel/cpu/mcheck/mce_amd_64.c
...
Fixed up trivial conflict in kernel/time/tick-sched.c manually
* 'kvm-updates/2.6.29' of git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm: (140 commits)
KVM: MMU: handle large host sptes on invlpg/resync
KVM: Add locking to virtual i8259 interrupt controller
KVM: MMU: Don't treat a global pte as such if cr4.pge is cleared
MAINTAINERS: Maintainership changes for kvm/ia64
KVM: ia64: Fix kvm_arch_vcpu_ioctl_[gs]et_regs()
KVM: x86: Rework user space NMI injection as KVM_CAP_USER_NMI
KVM: VMX: Fix pending NMI-vs.-IRQ race for user space irqchip
KVM: fix handling of ACK from shared guest IRQ
KVM: MMU: check for present pdptr shadow page in walk_shadow
KVM: Consolidate userspace memory capability reporting into common code
KVM: Advertise the bug in memory region destruction as fixed
KVM: use cpumask_var_t for cpus_hardware_enabled
KVM: use modern cpumask primitives, no cpumask_t on stack
KVM: Extract core of kvm_flush_remote_tlbs/kvm_reload_remote_mmus
KVM: set owner of cpu and vm file operations
anon_inodes: use fops->owner for module refcount
x86: KVM guest: kvm_get_tsc_khz: return khz, not lpj
KVM: MMU: prepopulate the shadow on invlpg
KVM: MMU: skip global pgtables on sync due to cr3 switch
KVM: MMU: collapse remote TLB flushes on root sync
...
Impact: New API
The old topology_core_siblings() and topology_thread_siblings() return
a cpumask_t; these new ones return a (const) struct cpumask *.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Mike Travis <travis@sgi.com>
struct dentry is one of the most critical structures in the kernel. So it's
sad to see it going neglected.
With CONFIG_PROFILING turned on (which is probably the common case at least
for distros and kernel developers), sizeof(struct dcache) == 208 here
(64-bit). This gives 19 objects per slab.
I packed d_mounted into a hole, and took another 4 bytes off the inline
name length to take the padding out from the end of the structure. This
shinks it to 200 bytes. I could have gone the other way and increased the
length to 40, but I'm aiming for a magic number, read on...
I then got rid of the d_cookie pointer. This shrinks it to 192 bytes. Rant:
why was this ever a good idea? The cookie system should increase its hash
size or use a tree or something if lookups are a problem. Also the "fast
dcookie lookups" in oprofile should be moved into the dcookie code -- how
can oprofile possibly care about the dcookie_mutex? It gets dropped after
get_dcookie() returns so it can't be providing any sort of protection.
At 192 bytes, 21 objects fit into a 4K page, saving about 3MB on my system
with ~140 000 entries allocated. 192 is also a multiple of 64, so we get
nice cacheline alignment on 64 and 32 byte line systems -- any given dentry
will now require 3 cachelines to touch all fields wheras previously it
would require 4.
I know the inline name size was chosen quite carefully, however with the
reduction in cacheline footprint, it should actually be just about as fast
to do a name lookup for a 36 character name as it was before the patch (and
faster for other sizes). The memory footprint savings for names which are
<= 32 or > 36 bytes long should more than make up for the memory cost for
33-36 byte names.
Performance is a feature...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
The only significant changes were to kvmppc_exit_timing_write() and
kvmppc_exit_timing_show(), both of which were dramatically simplified.
Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Existing KVM statistics are either just counters (kvm_stat) reported for
KVM generally or trace based aproaches like kvm_trace.
For KVM on powerpc we had the need to track the timings of the different exit
types. While this could be achieved parsing data created with a kvm_trace
extension this adds too much overhead (at least on embedded PowerPC) slowing
down the workloads we wanted to measure.
Therefore this patch adds a in-kernel exit timing statistic to the powerpc kvm
code. These statistic is available per vm&vcpu under the kvm debugfs directory.
As this statistic is low, but still some overhead it can be enabled via a
.config entry and should be off by default.
Since this patch touched all powerpc kvm_stat code anyway this code is now
merged and simplified together with the exit timing statistic code (still
working with exit timing disabled in .config).
Signed-off-by: Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com>
Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Store shadow TLB entries in memory, but only use it on host context switch
(instead of every guest entry). This improves performance for most workloads on
440 by reducing the guest TLB miss rate.
Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Formerly, we used to maintain a per-vcpu shadow TLB and on every entry to the
guest would load this array into the hardware TLB. This consumed 1280 bytes of
memory (64 entries of 16 bytes plus a struct page pointer each), and also
required some assembly to loop over the array on every entry.
Instead of saving a copy in memory, we can just store shadow mappings directly
into the hardware TLB, accepting that the host kernel will clobber these as
part of the normal 440 TLB round robin. When we do that we need less than half
the memory, and we have decreased the exit handling time for all guest exits,
at the cost of increased number of TLB misses because the host overwrites some
guest entries.
These savings will be increased on processors with larger TLBs or which
implement intelligent flush instructions like tlbivax (which will avoid the
need to walk arrays in software).
In addition to that and to the code simplification, we have a greater chance of
leaving other host userspace mappings in the TLB, instead of forcing all
subsequent tasks to re-fault all their mappings.
Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
KVM currently ignores the host's round robin TLB eviction selection, instead
maintaining its own TLB state and its own round robin index. However, by
participating in the normal 44x TLB selection, we can drop the alternate TLB
processing in KVM. This results in a significant performance improvement,
since that processing currently must be done on *every* guest exit.
Accordingly, KVM needs to be able to access and increment tlb_44x_index.
(KVM on 440 cannot be a module, so there is no need to export this symbol.)
Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Acked-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
KVM on 440 has always been able to handle large guest mappings with 4K host
pages -- we must, since the guest kernel uses 256MB mappings.
This patch makes KVM work when the host has large pages too (tested with 64K).
Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
We used to defer invalidating userspace TLB entries until jumping out of the
kernel. This was causing MMU weirdness most easily triggered by using a pipe in
the guest, e.g. "dmesg | tail". I believe the problem was that after the guest
kernel changed the PID (part of context switch), the old process's mappings
were still present, and so copy_to_user() on the "return to new process" path
ended up using stale mappings.
Testing with large pages (64K) exposed the problem, probably because with 4K
pages, pressure on the TLB faulted all process A's mappings out before the
guest kernel could insert any for process B.
Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Bare metal Linux on 440 can "overmap" RAM in the kernel linear map, so that it
can use large (256MB) mappings even if memory isn't a multiple of 256MB. To
prevent the hardware prefetcher from loading from an invalid physical address
through that mapping, it's marked Guarded.
However, KVM must ensure that all guest mappings are backed by real physical
RAM (since a deliberate access through a guarded mapping could still cause a
machine check). Accordingly, we don't need to make our mappings guarded, so
let's allow prefetching as the designers intended.
Curiously this patch didn't affect performance at all on the quick test I
tried, but it's clearly the right thing to do anyways and may improve other
workloads.
Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Make sure that CONFIG_KVM cannot be selected without processor support
(currently, 440 is the only processor implementation available).
Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
set ESR[PTR] when emulating a guest trap. This allows Linux guests to
properly handle WARN_ON() (i.e. detect that it's a non-fatal trap).
Also remove debugging printk in trap emulation.
Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>