If we call spu_remove_from_active_list that spu is always guaranteed
to be on the active list and in runnable state, so we can simply
do a list_del to remove it and unconditionally take the was_active
codepath.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
There is no need to directly wake up contexts in spu_activate when
called from spu_run, so add a flag to surpress this wakeup.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
This is the biggest patch in this series, and it reworks the guts of
the spu scheduler runqueue mechanism:
- instead of embedding a waitqueue in the runqueue there is now a
simple doubly-linked list, the actual wakeups happen by reusing
the stop_wq in the spu context (maybe we should rename it one day)
- spu_free and spu_prio_wakeup are merged into a single spu_reschedule
function
- various functionality is split out into small helpers, and kerneldoc
comments are added in various places to document what's going on.
- spu_activate is rewritten into a tight loop by removing test for
various impossible conditions and using the infrastructure in this
patch.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
It doesn't make any sense to have a priority field in the physical spu
structure. Move it into the spu context instead.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
The r/w semaphore to lock the spus was overkill and can be replaced
with a mutex to make it faster, simpler and easier to debug. It also
helps to allow making most spufs interruptible in future patches.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Various cleanups to sched.c that don't change the global control flow:
- add kerneldoc comments to various functions
- add spu_ prefixes to various functions
- add/remove context from the runqueue in bind/unbind_context as
it's part of the logical operation
- add a call to put_active_spu to spu_unbind_contex as it's logically
part of the unbind operation
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Only bind_context/unbind_context change the spu context state. Thus
we can move all assignents of SPU_STATE_RUNNABLE into bind_context,
which parallels the unbind side aswell.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
unbind_context already sets the context state to SPU_STATE_SAVED, thus
the spu_deactivate callers don't need to do it again.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
This adds an 'object-id' file that the spe library can
use to store a pointer to its ELF object. This was
originally meant for use by oprofile, but is now
also used by the GNU debugger, if available.
In order for oprofile to find the location in an spu-elf
binary where an event counter triggered, we need a way
to identify the binary in the first place.
Unfortunately, that binary itself can be embedded in a
powerpc ELF binary. Since we can assume it is mapped into
the effective address space of the running process,
have that one write the pointer value into a new spufs
file.
When a context switch occurs, pass the user value to
the profiler so that can look at the mapped file (with
some care).
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This tries to fix spufs so we have an interface closer to what is
specified in the man page for events returned in the third argument of
spu_run.
Fortunately, libspe has never been using the returned contents of that
register, as they were the same as the return code of spu_run (duh!).
Unlike the specification that we never implemented correctly, we now
require a SPU_CREATE_EVENTS_ENABLED flag passed to spu_create, in
order to get the new behavior. When this flag is not passed, spu_run
will simply ignore the third argument now.
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This patch adds NUMA support to the the spufs scheduler.
The new arch/powerpc/platforms/cell/spufs/sched.c is greatly
simplified, in an attempt to reduce complexity while adding
support for NUMA scheduler domains. SPUs are allocated starting
from the calling thread's node, moving to others as supported by
current->cpus_allowed. Preemption is gone as it was buggy, but
should be re-enabled in another patch when stable.
The new arch/powerpc/platforms/cell/spu_base.c maintains idle
lists on a per-node basis, and allows caller to specify which
node(s) an SPU should be allocated from, while passing -1 tells
spu_alloc() that any node is allowed.
Since the patch removes the currently implemented preemptive
scheduling, it is technically a regression, but practically
all users have since migrated to this version, as it is
part of the IBM SDK and the yellowdog distribution, so there
is not much point holding it back while the new preemptive
scheduling patch gets delayed further.
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This changes the hypervisor abstraction of setting cpu affinity to a
higher level to avoid platform dependent interrupt controller
routines. I replaced spu_priv1_ops:spu_int_route_set() with a
new routine spu_priv1_ops:spu_cpu_affinity_set().
As a by-product, this change eliminated what looked like an
existing bug in the set affinity code where spu_int_route_set()
mistakenly called int_stat_get().
Signed-off-by: Geoff Levand <geoffrey.levand@am.sony.com>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This patch adds a new file called 'mfc' to each spufs directory.
The file accepts DMA commands that are a subset of what would
be legal DMA commands for problem state register access. Upon
reading the file, a bitmask is returned with the completed
tag groups set.
The file is meant to be used from an abstraction in libspe
that is added by a different patch.
From the kernel perspective, this means a process can now
offload a memory copy from or into an SPE local store
without having to run code on the SPE itself.
The transfer will only be performed while the SPE is owned
by one thread that is waiting in the spu_run system call
and the data will be transferred into that thread's
address space, independent of which thread started the
transfer.
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
For far, all SPU triggered interrupts always end up on
the first SMT thread, which is a bad solution.
This patch implements setting the affinity to the
CPU that was running last when entering execution on
an SPU. This should result in a significant reduction
in IPI calls and better cache locality for SPE thread
specific data.
Signed-off-by: Arnd Bergmann <arndb@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
checking bits manually might not be synchonized with
the use of set_bit/clear_bit. Make sure we always use
the correct bitops by removing the unnecessary
identifiers.
Signed-off-by: Arnd Bergmann <arndb@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
One of my last patches contained a broken line
from splitting out some other changes, this
restores a working version.
Signed-off-by: Arnd Bergmann <arndb@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This patch reduces lock complexity of SPU scheduler, particularly
for involuntary preemptive switches. As a result the new code
does a better job of mapping the highest priority tasks to SPUs.
Lock complexity is reduced by using the system default workqueue
to perform involuntary saves. In this way we avoid nasty lock
ordering problems that the previous code had. A "minimum timeslice"
for SPU contexts is also introduced. The intent here is to avoid
thrashing.
While the new scheduler does a better job at prioritization it
still does nothing for fairness.
From: Mark Nutter <mnutter@us.ibm.com>
Signed-off-by: Arnd Bergmann <arndb@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This patch makes it easier to preempt an SPU context by
having the scheduler hold ctx->state_sema for much shorter
periods of time.
As part of this restructuring, the control logic for the "run"
operation is moved from arch/ppc64/kernel/spu_base.c to
fs/spufs/file.c. Of course the base retains "bottom half"
handlers for class{0,1} irqs. The new run loop will re-acquire
an SPU if preempted.
From: Mark Nutter <mnutter@us.ibm.com>
Signed-off-by: Arnd Bergmann <arndb@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
spufs is rather noisy when debugging is enabled, this
turns off the messages for production use.
Signed-off-by: Arnd Bergmann <arndb@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This adds a scheduler for SPUs to make it possible to use
more logical SPUs than physical ones are present in the
system.
Currently, there is no support for preempting a running
SPU thread, they have to leave the SPU by either triggering
an event on the SPU that causes it to return to the
owning thread or by sending a signal to it.
This patch also adds operations that enable accessing an SPU
in either runnable or saved state. We use an RW semaphore
to protect the state of the SPU from changing underneath
us, while we are holding it readable. In order to change
the state, it is acquired writeable and a context save
or restore is executed before downgrading the semaphore
to read-only.
From: Mark Nutter <mnutter@us.ibm.com>,
Uli Weigand <Ulrich.Weigand@de.ibm.com>
Signed-off-by: Arnd Bergmann <arndb@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>