This adds markers two important points in the spufs code and a new
module (sputrace.ko) that allows reading these out through a proc file.
Long-term I'd rather see something like lttng extended to use the spufs
instrumentation, but for now I think this is a good enough quick
solution. We'll probably want to add various addition event in addition
to that ones I have already.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Commit aed3a8c9bb introduced a
definition of notify_spus_active in .../cell/spu_syscalls.c, and
another definition under #ifndef MODULE in .../cell/spufs/sched.c.
The latter is not necessary and causes the build to fail when
CONFIG_SPU_FS=y, so this removes it. It also removes the export
of do_notify_spus_active, which is unnecessary.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Jeremy Kerr <jk@ozlabs.org>
This removes an OProfile dependency on the spufs module. This
dependency was causing a problem for multiplatform systems that are
built with support for Oprofile on Cell but try to load the oprofile
module on a non-Cell system.
Signed-off-by: Bob Nelson <rrnelson@us.ibm.com>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Acked-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Based on an original patch from Arnd Bergmann
<arnd.bergmann@de.ibm.com>
If there's no entry in the mailbox, then a read on the _info file will
return data from an uninitialised variable.
This change returns EOF if there's no mailbox info available instead.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This fixes the behavior of spufs when a spu tries a DMA operation
based on a wrong / unavailable address.
Instead of just generating a SIGBUS signal, spufs now
generates a SIGSEGV signal and restarts the problematic DMA operation
after the execution of the application's signal handler. This allows
applications to employ user-level paging systems.
Although the restart_dma function is called before the application's
signal handler, the operation is not actually performed at this time,
since the spu context is already stopped. The operation only takes
place when spu_run is restarted (which happens automatically).
Signed-off-by: Andre Detsch <adetsch@br.ibm.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The original spusched_timer was designed to take effect only when
a context is waiting in the runqueue.
This change adds an additional lower-freq timer has been added to
purely handle the spu_load updates. The new timer will be triggered
per LOAD_FREQ ticks.
Signed-off-by: Aegis Lin <aegislin@gmail.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Make most places that use spu_acquire/spu_acquire_saved interruptible,
this allows getting out of the spufs code when e.g. pressing ctrl+c.
There are a few places where we get called e.g. from spufs teardown
routines were we can't simply err out so these are left with a comment.
For now I've also not touched the poll routines because it's open what
libspe would expect in terms of interrupted system calls.
Acked-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The simple attr macros currently used by spufs can't deal with the
handlers returning errors, which is required to make the state_mutex
interruptible. This adds a local copy that allows for an error
return from the get/set handlers.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Change spufs_spu_run so that the context is queued directly to the
scheduler and the controlling thread advances directly to spufs_wait()
for spe errors and exceptions.
nosched contexts are treated the same as before.
Fixes from Christoph Hellwig <hch@lst.de>
Signed-off-by: Luke Browning <lukebr@linux.vnet.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This changes the spu context switch code to not write to reserved bits
of spu interrupt status register.
The architecture book says the reserved fields should be set to zero.
Signed-off-by: Masato Noguchi <Masato.Noguchi@jp.sony.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Need to re-check priority after dropping lock. Otherwise, a
more favored context may be preempted.
Signed-off-by: Luke Browning <lukebr@linux.vnet.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This cleans up spu_run_init so that it does all of the spu
initialization for spufs_run_spu. It initializes the spu context as
much as possible before it activates the spu and writes the runcntl
register.
Signed-off-by: Luke Browning <lukebr@linux.vnet.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Based on original patches from
Arnd Bergmann <arnd.bergman@de.ibm.com>; and
Luke Browning <lukebr@linux.vnet.ibm.com>
Currently, spu contexts need to be loaded to the SPU in order to take
class 0 and class 1 exceptions.
This change makes the actual interrupt-handlers much simpler (ie,
set the exception information in the context save area), and defers the
handling code to the spufs_handle_class[01] functions, called from
spufs_run_spu.
This should improve the concurrency of the spu scheduling leading to
greater SPU utilization when SPUs are overcommited.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Add a few #defines for the class 0, 1 and 2 interrupt status bits, and
use them instead of magic numbers when we're setting or checking for
these interrupts.
Also, add a #define for the class 2 mailbox threshold interrupt mask.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
When doing a poll on the mbox stat file of a swapped-out context, we
clear the class 0 interrupt status, rather than the class 2 interrupt
status.
This change corrects the poll operation to clear the correct interrupt.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This change encapsulates the spu_privcntl_RW register so that it can
be written through backing ops. This is necessary so that spu contexts
can be initialized and queued to the scheduler in spufs_run_spu.
Signed-off-by: Luke Browning <lukebr@linux.vnet.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This change disables the logic that faults-in spu contexts under the
covers from the page fault handler. When a fault requires a runnable
context, the handler will block until the context is scheduled by
other means.
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Currently, part of the spufs code (switch.o, lscsa_alloc.o and fault.o)
is compiled directly into the kernel.
This change moves these components of spufs into the kernel.
The lscsa and switch objects are fairly straightforward to move in.
For the fault.o module, we split the fault-handling code into two
parts: a/p/p/c/spu_fault.c and a/p/p/c/spufs/fault.c. The former is for
the in-kernel spu_handle_mm_fault function, and we move the rest of the
fault-handling code into spufs.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Fix a few typos in the spufs scheduler comments
Signed-off-by: Julio M. Merino Vidal <jmerino@ac.upc.edu>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Add platform specific SPU run control routines to the spufs. The current
spufs implementation uses the SPU master run control bit (MFC_SR1[S]) to
control SPE execution, but the PS3 hypervisor does not support the use of
this feature.
This change adds the run control wrapper routies spu_enable_spu() and
spu_disable_spu(). The bare metal routines use the master run control
bit, and the PS3 specific routines use the priv2 run control register.
An outstanding enhancement for the PS3 would be to add a guard to check
for incorrect access to the spu problem state when the spu context is
disabled. This check could be implemented with a flag added to the spu
context that would inhibit mapping problem state pages, and a routine
to unmap spu problem state pages. When the spu is enabled with
ps3_enable_spu() the flag would be set allowing pages to be mapped,
and when the spu is disabled with ps3_disable_spu() the flag would be
cleared and mapped problem state pages would be unmapped.
Signed-off-by: Masato Noguchi <Masato.Noguchi@jp.sony.com>
Signed-off-by: Geoff Levand <geoffrey.levand@am.sony.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Currently, we have a possibilty that the SLBs setup during context
switch don't cover the entirety of the necessary lscsa and code
regions, if these regions cross a segment boundary.
This change checks the start and end of each region, and inserts a SLB
entry for each, if unique. We also remove the assumption that the
spu_save_code and spu_restore_code reside in the same segment, by using
the specific code array for save and restore.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Add a function spu_64k_pages_available(), so that we can abstract the
explicity use of mmu_psize_defs() in lssca_alloc.c
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Currently, the SPU context switch code (spufs/switch.c) sets up the
SPU's SLBs directly, which requires some low-level mm stuff.
This change moves the kernel SLB setup to spu_base.c, by exposing
a function spu_setup_kernel_slbs() to do this setup. This allows us
to remove the low-level mm code from switch.c, making it possible
to later move switch.c to the spufs module.
Also, add a struct spu_slb for the cases where we need to deal with
SLB entries.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
We can currently cause an oops by repeatedly creating and destroying
contexts, while doing getdents() calls on the "/spu" directory.
This is due to the context's top-level dentry remaining hashed while
the context is being destroyed.
Fix this by unhashing the context's dentry with the
dentry->d_inode->i_mutex held. This way, we'll hit the check for
d_unhashed in dentry_readdir, and won't be included in the
list of subdirs for /spu.
test: spufs-testsuite:tests/01-spu_create/07-destroy-vs-readdir-race
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
* Convert files to UTF-8.
* Also correct some people's names
(one example is Eißfeldt, which was found in a source file.
Given that the author used an ß at all in a source file
indicates that the real name has in fact a 'ß' and not an 'ss',
which is commonly used as a substitute for 'ß' when limited to
7bit.)
* Correct town names (Goettingen -> Göttingen)
* Update Eberhard Mönkeberg's address (http://lkml.org/lkml/2007/1/8/313)
Signed-off-by: Jan Engelhardt <jengelh@gmx.de>
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Slab constructors currently have a flags parameter that is never used. And
the order of the arguments is opposite to other slab functions. The object
pointer is placed before the kmem_cache pointer.
Convert
ctor(void *object, struct kmem_cache *s, unsigned long flags)
to
ctor(struct kmem_cache *s, void *object)
throughout the kernel
[akpm@linux-foundation.org: coupla fixes]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This makes the kernel use 1TB segments for all kernel mappings and for
user addresses of 1TB and above, on machines which support them
(currently POWER5+, POWER6 and PA6T).
We detect that the machine supports 1TB segments by looking at the
ibm,processor-segment-sizes property in the device tree.
We don't currently use 1TB segments for user addresses < 1T, since
that would effectively prevent 32-bit processes from using huge pages
unless we also had a way to revert to using 256MB segments. That
would be possible but would involve extra complications (such as
keeping track of which segment size was used when HPTEs were inserted)
and is not addressed here.
Parts of this patch were originally written by Ben Herrenschmidt.
Signed-off-by: Paul Mackerras <paulus@samba.org>
The commit 8b6f50ef1d seems to have
been affected by a mismerge of a duplicate patch
(d054b36ffd) - both the
spufs_dir_contents and spufs_dir_nosched_contents have been given
write-only signal notification files.
This change reverts the spufs_dir_contents array to use the
readable signal notification file implementation.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Acked-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
find_victim can dereference a NULL pointer when iterating over the list
of victim spus because list_mutex only guarantees spu->ct to be stable,
but of course not to be non-NULL.
Also fix find_victim to not call spu_unbind_context without list_mutex
because that violates the above guarantee.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This patch adds DEFINE_SPUFS_ATTRIBUTE(), a wrapper around
DEFINE_SIMPLE_ATTRIBUTE which does the specified locking for the get
routine for us.
Unfortunately we need two get routines (a locked and unlocked version) to
support the coredump code. This hides one of those (the locked version)
inside the macro foo.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Currently the spu coredump code doesn't respect the ulimit, it should.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Rework spufs_coredump_extra_notes_write() to check for and return errors.
If we're coredumping to a pipe we can't trust file->f_pos, we need to
maintain the foffset value passed to us. The cleanest way to do this is
to have the low level write routine increment foffset when we've
successfully written.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Because spufs might be built as a module, we can't have other parts of the
kernel calling directly into it, we need stub routines that check first if the
module is loaded.
Currently we have two structures which hold callbacks for these stubs, the
syscalls are in spufs_calls and the coredump calls are in spufs_coredump_calls.
In both cases the logic for registering/unregistering is essentially the same,
so we can simplify things by combining the two.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Acked-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The SPUFS attribute get routines take a void * because the generic attribute
code doesn't know what sort of data it's passing around.
However our internal __spufs_get_foo() routines can take a spu_context *
directly, which saves plonking it in and out of a void * again.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Acked-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The spufs_coredump_read array is NULL terminated, and we also store the size.
We only need one or the other, and the other arrays in file.c are NULL
terminated, so do that.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The routine to dump the local store, __spufs_mem_read(), does not take the
spu_lslr_RW value into account - so we shouldn't check it when we're
calculating the size either.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Acked-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Unfortunately GDB expects some of the SPU coredump values to be identical
in format to what is found in spufs. This means we need to dump some of
the values as ASCII strings, not the actual values.
Because we don't know what the values will be, we always print the values
with the format "0x%.16lx", that way we know the result will be 19 bytes.
do_coredump_read() doesn't take a __user buffer, so remove the annotation,
and because we know that it's safe to just snprintf() directly to it.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The spufs_coredump_reader array contains the size of the data that will be
returned by the read routine. Currently these are specified as literals,
and though some are obvious, sizeof(u32) == 4, others are not, 69 * 8 == ???
Instead, use sizeof() whatever type is returned by each routine, or in
the case of spufs_mem_read() the #define LS_SIZE.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
It makes sense to stop the SPU processes as soon as possible. Also if we
dont acquire_saved() I think there's a possibility that the value in
csa.priv2.spu_lslr_RW won't be accurate.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Remove the ctx_info struct entirely, and also the ctx_info_list. This
fixes a race where two processes can clobber each other's ctx_info structs.
Instead of using the list, we just repeat the search through the file
descriptor table.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Extract the logic for searching through the file descriptors for spu contexts
into a separate routine, coredump_next_context(), so we can use it elsewhere
in future. In the process we flatten the for loop, and move the NOSCHED test
into coredump_next_context().
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Based on an original patch from Masato Noguchi
<Masato.Noguchi@jp.sony.com>.
We're currently not restoring the SPE decrementer as specified by the
CBE handbook. This change fixes our implementation to match, and makes
the function read more like the docs.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
At present, a built-in spufs will not use the spufs_calls callbacks, but
directly call sys_spu_create. This saves us an indirect branch, but
means we have duplicated functions - one for CONFIG_SPU_FS=y and one for
=m.
This change unifies the spufs syscall path, and provides access to the
spufs_calls structure through a get/put pair. At present, the only user
of the spufs_calls structure is spu_syscalls.c, but this will facilitate
adding the coredump calls later.
Everyone likes numbers, right? Here's a before/after comparison with
CONFIG_SPU_FS=y, doing spu_create(); close(); 64k times.
Before:
[jk@cell ~]$ time ./spu_create
performing 65536 spu_create calls
real 0m24.075s
user 0m0.146s
sys 0m23.925s
After:
[jk@cell ~]$ time ./spu_create
performing 65536 spu_create calls
real 0m24.777s
user 0m0.141s
sys 0m24.631s
So, we're adding around 11us per syscall, at the benefit of having
only one syscall path.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Affinity reference point location (gang->aff_ref_spu) is reset
when the whole gang is descheduled. However, the last member of
a gang can be descheduled while we are trying to schedule another
member of the gang. This was leading to a race condition, and
the code was using gang->aff_ref_spu in an unsafe manner.
By holding the gang->aff_mutex a little bit longer, and increment
gang->aff_sched_count (which controls when gang->aff_ref_spu
should be reset) a little bit earlier, the problem is fixed.
Signed-off-by: Andre Detsch <adetsch@br.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
According to the comment in spufs_init_isolated_loader(), the isolated
loader should be aligned on a 16 byte boundary.
ARCH_{KMALLOC,SLAB}_MINALIGN is not defined so only 8 byte alignment is
guaranteed.
This enforces alignment via __get_free_pages.
Signed-off-by: Sebastian Siewior <sebastian@breakpoint.cc>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Based on an initial patch from Sebastian Siewior
<sebastian@breakpoint.cc>
spu_harvest isn't used, remove it.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>