/*
 * Copyright © 2008-2018 Intel Corporation
 *
 * Permission is hereby granted, free of charge, to any person obtaining a
 * copy of this software and associated documentation files (the "Software"),
 * to deal in the Software without restriction, including without limitation
 * the rights to use, copy, modify, merge, publish, distribute, sublicense,
 * and/or sell copies of the Software, and to permit persons to whom the
 * Software is furnished to do so, subject to the following conditions:
 *
 * The above copyright notice and this permission notice (including the next
 * paragraph) shall be included in all copies or substantial portions of the
 * Software.
 *
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
 * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
 * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
 * IN THE SOFTWARE.
 *
 */

#ifndef I915_REQUEST_H
#define I915_REQUEST_H

#include <linux/dma-fence.h>

#include "i915_gem.h"
#include "i915_sw_fence.h"

#include <uapi/drm/i915_drm.h>

struct drm_file;
struct drm_i915_gem_object;
struct i915_request;

struct intel_wait {
	struct rb_node node;
	struct task_struct *tsk;
	struct i915_request *request;
	u32 seqno;
};

struct intel_signal_node {
	struct intel_wait wait;
	struct list_head link;
};

struct i915_dependency {
	struct i915_priotree *signaler;
	struct list_head signal_link;
	struct list_head wait_link;
	struct list_head dfs_link;
	unsigned long flags;
#define I915_DEPENDENCY_ALLOC BIT(0)
};

/*
 * "People assume that time is a strict progression of cause to effect, but
 * actually, from a nonlinear, non-subjective viewpoint, it's more like a big
 * ball of wibbly-wobbly, timey-wimey ... stuff." -The Doctor, 2015
 *
 * Requests exist in a complex web of interdependencies. Each request
 * may have to wait for other requests to complete before it is ready to run
 * (e.g. we have to wait until the pixels have been rendered into a texture
 * before we can copy from it). We track the readiness of a request in terms
 * of fences, but we also need to keep the dependency tree for the lifetime
 * of the request (beyond the life of an individual fence). We use the tree
 * at various points to reorder the requests whilst keeping the requests
 * in order with respect to their various dependencies.
 */
struct i915_priotree {
	struct list_head signalers_list; /* those before us, we depend upon */
	struct list_head waiters_list; /* those after us, they depend upon us */
	struct list_head link;
	int priority;
};

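/*
 * Illustrative sketch only, for struct i915_priotree above: a hypothetical
 * userspace-C outline of how a dependency tree can be used when a request's
 * priority is raised. Every request it waits upon (its signalers) is raised
 * too, so no prerequisite is left running at a lower priority than work
 * that depends on it. The names my_node and my_bump_priority() are
 * invented for illustration, and a fixed-size array stands in for
 * signalers_list. The early return assumes the invariant that a signaler's
 * priority is never below that of its waiters. The driver may walk the
 * tree iteratively (note dfs_link in struct i915_dependency) rather than
 * recursing; recursion just keeps the sketch short.
 *
 *	#define MY_MAX_SIGNALERS 4
 *
 *	struct my_node {
 *		int priority;
 *		int nr_signalers;
 *		struct my_node *signalers[MY_MAX_SIGNALERS]; // we depend on these
 *	};
 *
 *	static void my_bump_priority(struct my_node *node, int prio)
 *	{
 *		if (node->priority >= prio)
 *			return;	// already high enough, and so are its signalers
 *
 *		node->priority = prio;
 *		for (int i = 0; i < node->nr_signalers; i++)
 *			my_bump_priority(node->signalers[i], prio);
 *	}
 *
 * For a chain where C waits on B and B waits on A, all at priority 0,
 * my_bump_priority(&C, 10) leaves A, B and C at priority 10.
 */
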
enum {
	I915_PRIORITY_MIN = I915_CONTEXT_MIN_USER_PRIORITY - 1,
	I915_PRIORITY_NORMAL = I915_CONTEXT_DEFAULT_PRIORITY,
	I915_PRIORITY_MAX = I915_CONTEXT_MAX_USER_PRIORITY + 1,

	I915_PRIORITY_INVALID = INT_MIN
};

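/*
 * Illustrative sketch only, for the priority enum above: I915_PRIORITY_MIN
 * and I915_PRIORITY_MAX sit one step outside the user-selectable range, so
 * the driver can place work strictly below or above anything userspace can
 * request, while I915_PRIORITY_INVALID (INT_MIN) is a sentinel that no real
 * priority takes. The hypothetical check below assumes uapi limits of -1023
 * and 1023 with a default of 0, and assumes a policy in which only
 * privileged callers may raise a context above the default. The function
 * name and the error codes chosen are invented for illustration.
 *
 *	#include <errno.h>
 *	#include <stdbool.h>
 *
 *	#define MY_MIN_USER_PRIORITY	(-1023)	// assumed uapi value
 *	#define MY_DEFAULT_PRIORITY	0	// assumed uapi value
 *	#define MY_MAX_USER_PRIORITY	1023	// assumed uapi value
 *
 *	static int my_check_user_priority(int prio, bool privileged)
 *	{
 *		if (prio < MY_MIN_USER_PRIORITY || prio > MY_MAX_USER_PRIORITY)
 *			return -EINVAL;	// outside the user range
 *		if (prio > MY_DEFAULT_PRIORITY && !privileged)
 *			return -EPERM;	// only privileged callers may boost
 *		return 0;
 *	}
 *
 * With these values I915_PRIORITY_MIN would be -1024 and
 * I915_PRIORITY_MAX would be 1024.
 */
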
struct i915_capture_list {
	struct i915_capture_list *next;
	struct i915_vma *vma;
};

/**
 * Request queue structure.
 *
 * The request queue allows us to note sequence numbers that have been emitted
 * and may be associated with active buffers to be retired.
 *
 * By keeping this list, we can avoid having to do questionable sequence
 * number comparisons on buffer last_read|write_seqno. It also allows an
 * emission time to be associated with the request for tracking how far ahead
 * of the GPU the submission is.
 *
 * When modifying this structure, be very aware that we perform a lockless
 * RCU lookup of it that may race against reallocation of the struct
 * from the slab freelist. We intentionally do not zero the structure on
 * allocation so that the lookup can use the dangling pointers (and is
 * cognisant that those pointers may be wrong). Instead, everything that
 * needs to be initialised must be done so explicitly.
 *
 * The requests are reference counted.
 */
struct i915_request {
	struct dma_fence fence;
	spinlock_t lock;

	/** The i915 device this request belongs to */
	struct drm_i915_private *i915;

	/**
	 * Context and ring buffer related to this request.
	 * Contexts are refcounted, so when this request is associated with a
	 * context, we must increment the context's refcount, to guarantee that
	 * it persists while any request is linked to it. Requests themselves
	 * are also refcounted, so the request will only be freed when the last
	 * reference to it is dismissed, and the code in
	 * i915_request_free() will then decrement the refcount on the
	 * context.
	 */
	struct i915_gem_context *ctx;
	struct intel_engine_cs *engine;
	struct intel_ring *ring;
	struct intel_timeline *timeline;
	struct intel_signal_node signaling;

	/*
	 * Fences for the various phases in the request's lifetime.
	 *
	 * The submit fence is used to await upon all of the request's
	 * dependencies. When it is signaled, the request is ready to run.
	 * It is used by the driver to then queue the request for execution.
	 */
	struct i915_sw_fence submit;
	wait_queue_entry_t submitq;
	wait_queue_head_t execute;

	/*
	 * A list of everyone we wait upon, and everyone who waits upon us.
	 * Even though we will not be submitted to the hardware before the
	 * submit fence is signaled (it waits for all external events as well
	 * as our own requests), the scheduler still needs to know the
	 * dependency tree for the lifetime of the request (from execbuf
	 * to retirement), i.e. bidirectional dependency information for the
	 * request not tied to individual fences.
	 */
	struct i915_priotree priotree;
	struct i915_dependency dep;

	/**
	 * GEM sequence number associated with this request on the
	 * global execution timeline. It is zero when the request is not
	 * on the HW queue (i.e. not on the engine timeline list).
	 * Its value is guarded by the timeline spinlock.
	 */
	u32 global_seqno;

	/** Position in the ring of the start of the request */
	u32 head;

	/**
	 * Position in the ring of the start of the postfix.
	 * This is required to calculate the maximum available ring space
	 * without overwriting the postfix.
	 */
	u32 postfix;

	/** Position in the ring of the end of the whole request */
	u32 tail;

	/** Position in the ring of the end of any workarounds after the tail */
	u32 wa_tail;

	/** Preallocated space in the ring for emitting the request */
	u32 reserved_space;

	/** Batch buffer related to this request if any (used for
	 * error state dump only).
	 */
	struct i915_vma *batch;
	/**
	 * Additional buffers requested by userspace to be captured upon
	 * a GPU hang. The vma/obj on this list are protected by their
	 * active reference - all objects on this list must also be
	 * on the active_list (of their final request).
	 */
	struct i915_capture_list *capture_list;
	struct list_head active_list;

	/** Time at which this request was emitted, in jiffies. */
	unsigned long emitted_jiffies;

	bool waitboost;

	/** engine->request_list entry for this request */
	struct list_head link;

	/** ring->request_list entry for this request */
	struct list_head ring_link;

	struct drm_i915_file_private *file_priv;
	/** file_priv list entry for this request */
	struct list_head client_link;
};

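/*
 * Illustrative sketch only, for the head/postfix/tail fields of struct
 * i915_request: a hypothetical way to compute free space in a power-of-two
 * ring buffer. @head marks the oldest position that must not yet be
 * overwritten (the postfix comment in the struct suggests a request's
 * postfix plays this role) and @tail is where the CPU writes next. A small
 * @gap is kept unused so that a full ring is never confused with an empty
 * one (head == tail). The function name and the gap parameter are
 * assumptions, and the result is only meaningful if the writer never lets
 * @tail advance into the gap.
 *
 *	#include <assert.h>
 *	#include <stdint.h>
 *
 *	static uint32_t my_ring_space(uint32_t head, uint32_t tail,
 *				      uint32_t size, uint32_t gap)
 *	{
 *		assert(size && (size & (size - 1)) == 0); // power of two
 *		// Unsigned wraparound plus masking handles tail < head and
 *		// tail > head alike.
 *		return (head - tail - gap) & (size - 1);
 *	}
 *
 * For example, an empty 4096-byte ring (head == tail) with a 64-byte gap
 * reports 4032 bytes free; with head = 128 and tail = 1024 it reports
 * (128 - 1024 - 64) & 4095 = 3136, i.e. 4096 - 896 used - 64 gap.
 */
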
#define I915_FENCE_GFP (GFP_KERNEL | __GFP_RETRY_MAYFAIL | __GFP_NOWARN)

extern const struct dma_fence_ops i915_fence_ops;

static inline bool dma_fence_is_i915(const struct dma_fence *fence)
{
	return fence->ops == &i915_fence_ops;
}

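/*
 * Illustrative sketch only, for dma_fence_is_i915() above: each fence
 * implementation supplies its own statically allocated ops table, so
 * comparing the ops pointer against a known table is a cheap type check.
 * The userspace types and names below are hypothetical stand-ins for
 * struct dma_fence and struct dma_fence_ops.
 *
 *	#include <stdbool.h>
 *
 *	struct my_fence_ops { const char *(*get_driver_name)(void); };
 *	struct my_fence { const struct my_fence_ops *ops; };
 *
 *	static const char *my_i915_name(void) { return "i915"; }
 *	static const char *my_other_name(void) { return "other"; }
 *
 *	static const struct my_fence_ops my_i915_ops = { my_i915_name };
 *	static const struct my_fence_ops my_other_ops = { my_other_name };
 *
 *	static bool my_fence_is_i915(const struct my_fence *f)
 *	{
 *		// Identity of the table, not its contents, decides the type.
 *		return f->ops == &my_i915_ops;
 *	}
 *
 * A fence initialised as { &my_i915_ops } passes the check; one
 * initialised as { &my_other_ops } does not.
 */
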
struct i915_request * __must_check
i915_request_alloc(struct intel_engine_cs *engine,
		   struct i915_gem_context *ctx);
void i915_request_retire_upto(struct i915_request *rq);

static inline struct i915_request *
to_request(struct dma_fence *fence)
{
	/* We assume that NULL fence/request are interoperable */
	BUILD_BUG_ON(offsetof(struct i915_request, fence) != 0);
	GEM_BUG_ON(fence && !dma_fence_is_i915(fence));
	return container_of(fence, struct i915_request, fence);
}

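/*
 * Illustrative sketch only, for to_request() above: container_of() steps
 * back from a member to its enclosing structure by subtracting the
 * member's offset. Because the BUILD_BUG_ON() pins the fence at offset 0,
 * that subtraction is zero and the conversion amounts to a pointer cast,
 * which is why a NULL fence maps to a NULL request. The hypothetical
 * userspace version below states the same assumption with static_assert()
 * and uses a plain cast, which ISO C permits between a structure and its
 * first member.
 *
 *	#include <assert.h>
 *	#include <stddef.h>
 *
 *	struct my_fence { unsigned int seqno; };
 *	struct my_request { struct my_fence fence; int priority; };
 *
 *	static_assert(offsetof(struct my_request, fence) == 0,
 *		      "fence must be the first member");
 *
 *	static struct my_request *my_to_request(struct my_fence *fence)
 *	{
 *		return (struct my_request *)fence; // NULL stays NULL
 *	}
 */
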
static inline struct i915_request *
i915_request_get(struct i915_request *rq)
{
	return to_request(dma_fence_get(&rq->fence));
}

static inline struct i915_request *
i915_request_get_rcu(struct i915_request *rq)
{
	return to_request(dma_fence_get_rcu(&rq->fence));
}

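/*
 * Illustrative sketch only, for i915_request_get_rcu() above: under RCU a
 * request found by a lockless lookup may already have dropped its last
 * reference, so an unconditional increment could resurrect it.
 * dma_fence_get_rcu() instead takes a reference only while the count is
 * still non-zero (via kref_get_unless_zero()). Below is a hypothetical
 * userspace equivalent using C11 atomics. As the comment above struct
 * i915_request warns, the memory may also have been recycled for a new
 * request, so a caller should re-check, once this succeeds, that it holds
 * the request it was looking for.
 *
 *	#include <stdatomic.h>
 *	#include <stdbool.h>
 *
 *	struct my_obj { atomic_uint refcount; };
 *
 *	static bool my_get_unless_zero(struct my_obj *obj)
 *	{
 *		unsigned int old = atomic_load(&obj->refcount);
 *
 *		while (old != 0) {
 *			// On failure, old is refreshed with the current count.
 *			if (atomic_compare_exchange_weak(&obj->refcount,
 *							 &old, old + 1))
 *				return true;
 *		}
 *		return false;	// object is dying; do not touch it
 *	}
 */
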
static inline void
i915_request_put(struct i915_request *rq)
{
	dma_fence_put(&rq->fence);
}

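/*
 * Illustrative sketch only, for i915_request_put() and the ctx comment in
 * struct i915_request: dropping the last reference to a request frees it,
 * and freeing it in turn drops the reference the request held on its
 * context. The hypothetical userspace types below use C11 atomics and
 * free() in place of kref and the slab allocator.
 *
 *	#include <stdatomic.h>
 *	#include <stdlib.h>
 *
 *	struct my_ctx { atomic_uint ref; };
 *	struct my_req { atomic_uint ref; struct my_ctx *ctx; };
 *
 *	static void my_ctx_put(struct my_ctx *ctx)
 *	{
 *		// atomic_fetch_sub() returns the value before the decrement.
 *		if (atomic_fetch_sub(&ctx->ref, 1) == 1)
 *			free(ctx);
 *	}
 *
 *	static void my_req_put(struct my_req *rq)
 *	{
 *		if (atomic_fetch_sub(&rq->ref, 1) == 1) {
 *			my_ctx_put(rq->ctx); // reference taken when rq was created
 *			free(rq);
 *		}
 *	}
 */
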
/**
 * i915_request_global_seqno - report the current global seqno
 * @request: the request
 *
 * A request is assigned a global seqno only when it is on the hardware
 * execution queue. The global seqno can be used to maintain a list of
 * requests on the same engine in retirement order, for example for
 * constructing a priority queue for waiting. Prior to its execution, or
 * if it is subsequently removed in the event of preemption, its global
 * seqno is zero. As both insertion and removal from the execution queue
 * may operate in IRQ context, it is not guarded by the usual struct_mutex
 * BKL. Instead, those relying on the global seqno must be prepared for its
 * value to change between reads. Only once the request is complete is the
 * global seqno stable: because of the memory barriers issued when
 * submitting the commands that make the hardware write the breadcrumb, if
 * the HWS shows that it has passed the global seqno and the global seqno
 * is unchanged after the read, the request is indeed complete.
 */
static inline u32
i915_request_global_seqno(const struct i915_request *request)
{
	return READ_ONCE(request->global_seqno);
}

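/*
 * Illustrative sketch only, for i915_request_global_seqno() above: the
 * completion check described in its comment, written as hypothetical
 * userspace C with C11 atomics. @hws_seqno stands in for the breadcrumb
 * the hardware writes into the status page; all names are invented for
 * illustration. The wrap-safe comparison treats the 32-bit seqno as
 * circular and relies on the usual two's-complement conversion to int32_t.
 *
 *	#include <stdatomic.h>
 *	#include <stdbool.h>
 *	#include <stdint.h>
 *
 *	static bool my_seqno_passed(uint32_t seq1, uint32_t seq2)
 *	{
 *		return (int32_t)(seq1 - seq2) >= 0;
 *	}
 *
 *	static bool my_request_completed(_Atomic uint32_t *global_seqno,
 *					 _Atomic uint32_t *hws_seqno)
 *	{
 *		uint32_t seqno = atomic_load(global_seqno);
 *
 *		if (!seqno)	// not on the hardware queue
 *			return false;
 *		if (!my_seqno_passed(atomic_load(hws_seqno), seqno))
 *			return false;
 *		// If the request was unsubmitted (e.g. preempted) between
 *		// the reads, the comparison above proves nothing.
 *		return atomic_load(global_seqno) == seqno;
 *	}
 */
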
|
|
|
|
|
2018-02-21 17:56:36 +08:00
|
|
|
int i915_request_await_object(struct i915_request *to,
|
2016-09-09 21:11:56 +08:00
|
|
|
struct drm_i915_gem_object *obj,
|
|
|
|
bool write);
|
2018-02-21 17:56:36 +08:00
|
|
|
int i915_request_await_dma_fence(struct i915_request *rq,
|
|
|
|
struct dma_fence *fence);
|
2016-07-20 16:21:08 +08:00
|
|
|
|
2018-02-21 17:56:36 +08:00
|
|
|
void __i915_request_add(struct i915_request *rq, bool flush_caches);
|
|
|
|
#define i915_request_add(rq) \
|
|
|
|
__i915_request_add(rq, false)
|
2016-11-15 04:40:59 +08:00
|
|
|
|
2018-02-21 17:56:36 +08:00
|
|
|
void __i915_request_submit(struct i915_request *request);
|
|
|
|
void i915_request_submit(struct i915_request *request);
|
2017-02-23 15:44:17 +08:00
|
|
|
|
2018-02-21 17:56:36 +08:00
|
|
|
void __i915_request_unsubmit(struct i915_request *request);
|
|
|
|
void i915_request_unsubmit(struct i915_request *request);
|
2016-07-20 16:21:08 +08:00
|
|
|
|
2018-02-21 17:56:36 +08:00
|
|
|
long i915_request_wait(struct i915_request *rq,
|
2016-10-28 20:58:27 +08:00
|
|
|
unsigned int flags,
|
|
|
|
long timeout)
|
drm/i915: Refactor activity tracking for requests
With the introduction of requests, we amplified the number of atomic
refcounted objects we use and update every execbuffer; from none to
several references, and a set of references that need to be changed. We
also introduced interesting side-effects in the order of retiring
requests and objects.
Instead of independently tracking the last request for an object, track
the active objects for each request. The object will reside in the
buffer list of its most recent active request and so we reduce the kref
interchange to a list_move. Now retirements are entirely driven by the
request, dramatically simplifying activity tracking on the object
themselves, and removing the ambiguity between retiring objects and
retiring requests.
Furthermore with the consolidation of managing the activity tracking
centrally, we can look forward to using RCU to enable lockless lookup of
the current active requests for an object. In the future, we will be
able to query the status or wait upon rendering to an object without
even touching the struct_mutex BKL.
All told, less code, simpler and faster, and more extensible.
v2: Add a typedef for the function pointer for convenience later.
v3: Make the noop retirement callback explicit. Allow passing NULL to
the init_request_active() which is expanded to a common noop function.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-16-git-send-email-chris@chris-wilson.co.uk
2016-08-04 14:52:35 +08:00
	__attribute__((nonnull(1)));
2016-09-09 21:11:50 +08:00
#define I915_WAIT_INTERRUPTIBLE BIT(0)
#define I915_WAIT_LOCKED BIT(1) /* struct_mutex held, handle GPU reset */
2016-10-28 20:58:27 +08:00
#define I915_WAIT_ALL BIT(2) /* used by i915_gem_object_wait() */
2016-08-04 14:52:35 +08:00

2016-07-20 16:21:08 +08:00
static inline u32 intel_engine_get_seqno(struct intel_engine_cs *engine);

/**
 * Returns true if seq1 is at or after seq2, allowing for 32-bit wraparound.
 */
static inline bool i915_seqno_passed(u32 seq1, u32 seq2)
{
	return (s32)(seq1 - seq2) >= 0;
}
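The subtract-then-cast in i915_seqno_passed() is what keeps the comparison correct across 32-bit wraparound, provided the two seqnos are less than 2^31 apart. Below is a minimal userspace sketch of the same idiom. `seqno_passed` is a hypothetical name, and <stdint.h> types stand in for the kernel's u32/s32. Converting an out-of-range unsigned value to int32_t is implementation-defined in ISO C, but GCC and Clang define it to wrap modulo 2^32.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical userspace stand-in for i915_seqno_passed(): true if seq1 is
 * at or after seq2 on the circular 32-bit seqno space. Only meaningful while
 * the two values are less than 2^31 apart.
 */
static bool seqno_passed(uint32_t seq1, uint32_t seq2)
{
	/*
	 * Unsigned subtraction wraps modulo 2^32; viewing the difference as
	 * signed tells us whether seq1 lies ahead of (or on) seq2.
	 */
	return (int32_t)(seq1 - seq2) >= 0;
}
```

For example, seqno_passed(2, 0xfffffffe) is true because the difference wraps to 4, while seqno_passed(0xfffffffe, 2) is false because the difference is -4 when read as signed.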

2016-10-28 20:58:49 +08:00
static inline bool
2018-02-21 17:56:36 +08:00
__i915_request_completed(const struct i915_request *rq, u32 seqno)
2016-10-28 20:58:49 +08:00
{
2017-02-23 15:44:14 +08:00
	GEM_BUG_ON(!seqno);
2018-02-21 17:56:36 +08:00
	return i915_seqno_passed(intel_engine_get_seqno(rq->engine), seqno) &&
		seqno == i915_request_global_seqno(rq);
2016-10-28 20:58:49 +08:00
}

2018-02-21 17:56:36 +08:00
static inline bool i915_request_completed(const struct i915_request *rq)
2016-10-28 20:58:49 +08:00
{
2017-02-23 15:44:14 +08:00
	u32 seqno;

2018-02-21 17:56:36 +08:00
	seqno = i915_request_global_seqno(rq);
2017-02-23 15:44:14 +08:00
	if (!seqno)
2016-10-28 20:58:49 +08:00
		return false;

2018-02-21 17:56:36 +08:00
	return __i915_request_completed(rq, seqno);
2016-07-20 16:21:08 +08:00
}

2018-02-21 17:56:36 +08:00
static inline bool i915_request_started(const struct i915_request *rq)
2018-01-18 21:16:09 +08:00
{
	u32 seqno;

2018-02-21 17:56:36 +08:00
	seqno = i915_request_global_seqno(rq);
2018-01-18 21:16:09 +08:00
	if (!seqno)
		return false;

2018-02-21 17:56:36 +08:00
	return i915_seqno_passed(intel_engine_get_seqno(rq->engine),
2018-01-18 21:16:09 +08:00
				 seqno - 1);
}
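The completed/started checks above can be modelled outside the kernel. The sketch below is illustrative only: `mock_engine`, `mock_request` and their fields are hypothetical stand-ins for the engine's breadcrumb (the last seqno the GPU wrote) and for i915_request_global_seqno(), with 0 meaning the request has not been given a global seqno yet. It omits the second i915_request_global_seqno() comparison that __i915_request_completed() makes, which presumably guards against the request's seqno changing underneath the caller (for example after i915_request_unsubmit()).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the engine and request. */
struct mock_engine { uint32_t hws_seqno; };	/* last seqno written by the GPU */
struct mock_request {
	struct mock_engine *engine;
	uint32_t global_seqno;			/* 0: not yet submitted */
};

static bool seqno_passed(uint32_t seq1, uint32_t seq2)
{
	return (int32_t)(seq1 - seq2) >= 0;
}

static bool mock_request_completed(const struct mock_request *rq)
{
	if (!rq->global_seqno)
		return false;	/* never submitted, so it cannot have completed */
	return seqno_passed(rq->engine->hws_seqno, rq->global_seqno);
}

static bool mock_request_started(const struct mock_request *rq)
{
	if (!rq->global_seqno)
		return false;
	/*
	 * Requests on one engine retire in seqno order, so once the previous
	 * seqno has been written the GPU is treated as working on this one.
	 */
	return seqno_passed(rq->engine->hws_seqno, rq->global_seqno - 1);
}
```

With the breadcrumb at 10, a request with seqno 11 reports started but not completed, and one with seqno 12 reports neither.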

2018-01-02 23:12:25 +08:00
static inline bool i915_priotree_signaled(const struct i915_priotree *pt)
{
2018-02-21 17:56:36 +08:00
	const struct i915_request *rq =
		container_of(pt, const struct i915_request, priotree);
2018-01-02 23:12:25 +08:00

2018-02-21 17:56:36 +08:00
	return i915_request_completed(rq);
2018-01-02 23:12:25 +08:00
}

2018-02-21 17:56:36 +08:00
void i915_retire_requests(struct drm_i915_private *i915);

/*
 * We treat requests as fences. This is not to be confused with our
2016-08-04 14:52:29 +08:00
 * "fence registers" but pipeline synchronisation objects à la GL_ARB_sync.
 * We use the fences to synchronize access from the CPU with activity on the
 * GPU; for example, we should not rewrite an object's PTEs whilst the GPU
 * is reading them. We also track fences at a higher level to provide
 * implicit synchronisation around GEM objects, e.g. set-domain will wait
 * for outstanding GPU rendering before marking the object ready for CPU
 * access, or a pageflip will wait until the GPU is complete before showing
 * the frame on the scanout.
 *
 * In order to use a fence, the object must track the fence it needs to
 * serialise with. For example, GEM objects want to track both read and
 * write access so that we can perform concurrent read operations between
 * the CPU and GPU engines, as well as waiting for all rendering to
 * complete, or waiting for the last GPU user of a "fence register". The
 * object then embeds a #i915_gem_active to track the most recent (in
 * retirement order) request relevant for the desired mode of access.
 * The #i915_gem_active is updated with i915_gem_active_set() to track the
 * most recent fence request; typically this is done as part of
 * i915_vma_move_to_active().
 *
 * When the #i915_gem_active completes (is retired), it will
 * signal its completion to the owner through a callback as well as mark
 * itself as idle (i915_gem_active.request == NULL). The owner
 * can then perform any action, such as delayed freeing of an active
 * resource including itself.
 */
2016-08-04 14:52:35 +08:00
struct i915_gem_active;

typedef void (*i915_gem_retire_fn)(struct i915_gem_active *,
2018-02-21 17:56:36 +08:00
				   struct i915_request *);
2016-08-04 14:52:35 +08:00

2016-08-04 14:52:29 +08:00
struct i915_gem_active {
2018-02-21 17:56:36 +08:00
	struct i915_request __rcu *request;
2016-08-04 14:52:35 +08:00
	struct list_head link;
	i915_gem_retire_fn retire;
2016-08-04 14:52:29 +08:00
};

2016-08-04 14:52:35 +08:00
void i915_gem_retire_noop(struct i915_gem_active *,
2018-02-21 17:56:36 +08:00
			  struct i915_request *request);
2016-08-04 14:52:35 +08:00

/**
 * init_request_active - prepares the activity tracker for use
 * @active - the active tracker
 * @retire - a callback invoked when the tracker is retired (becomes idle),
 * can be NULL
 *
 * init_request_active() prepares the embedded @active struct for use as
 * an activity tracker, that is, for tracking the last known active request
 * associated with it. When the last request becomes idle (i.e. when it is
 * retired after completion), the optional callback @retire is invoked.
 */
static inline void
init_request_active(struct i915_gem_active *active,
		    i915_gem_retire_fn retire)
{
	INIT_LIST_HEAD(&active->link);
	active->retire = retire ?: i915_gem_retire_noop;
}
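init_request_active() uses the GNU C `?:` shorthand (`a ?: b` yields `a` if it is non-zero, otherwise `b`) so that a NULL callback is replaced by i915_gem_retire_noop(), letting the retirement path call ->retire unconditionally. A minimal userspace sketch of that pattern follows. All `mock_*` names and `count_retire` are hypothetical, the list linkage is omitted, and, unlike init_request_active(), the sketch also clears the request pointer itself.

```c
#include <assert.h>
#include <stddef.h>

struct mock_request { int id; };
struct mock_active;

typedef void (*mock_retire_fn)(struct mock_active *, struct mock_request *);

struct mock_active {
	struct mock_request *request;	/* NULL while idle */
	mock_retire_fn retire;		/* never NULL once initialised */
};

static void mock_retire_noop(struct mock_active *active,
			     struct mock_request *rq)
{
	(void)active;
	(void)rq;
}

static void mock_init_active(struct mock_active *active, mock_retire_fn retire)
{
	active->request = NULL;
	/* Portable spelling of the GNU "retire ?: mock_retire_noop". */
	active->retire = retire ? retire : mock_retire_noop;
}

/* Retirement: mark the tracker idle first, then notify the owner. */
static void mock_active_retire(struct mock_active *active)
{
	struct mock_request *rq = active->request;

	active->request = NULL;
	active->retire(active, rq);
}

/* Example owner callback that simply counts retirements. */
static int retire_count;

static void count_retire(struct mock_active *active, struct mock_request *rq)
{
	(void)active;
	(void)rq;
	retire_count++;
}
```

Because the no-op is substituted at init time, mock_active_retire() never has to test for a missing callback.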

2016-08-04 14:52:30 +08:00
/**
 * i915_gem_active_set - updates the tracker to watch the current request
 * @active - the active tracker
 * @request - the request to watch
 *
 * i915_gem_active_set() watches the given @request for completion. Whilst
 * that @request is busy, the @active reports busy. When that @request is
 * retired, the @active tracker is updated to report idle.
 */
2016-08-04 14:52:29 +08:00
static inline void
i915_gem_active_set(struct i915_gem_active *active,
2018-02-21 17:56:36 +08:00
		    struct i915_request *request)
2016-08-04 14:52:29 +08:00
{
2016-08-04 14:52:35 +08:00
	list_move(&active->link, &request->active_list);
drm/i915: Enable lockless lookup of request tracking via RCU
If we enable RCU for the requests (providing a grace period where we can
inspect a "dead" request before it is freed), we can allow callers to
carefully perform lockless lookup of an active request.
However, by enabling deferred freeing of requests, we can potentially
hog a lot of memory when dealing with tens of thousands of requests per
second - with a quick insertion of a synchronize_rcu() inside our
shrinker callback, that issue disappears.
v2: Currently, it is our responsibility to handle reclaim i.e. to avoid
hogging memory with the delayed slab frees. At the moment, we wait for a
grace period in the shrinker, and block for all RCU callbacks on oom.
Suggested alternatives focus on flushing our RCU callback when we have a
certain number of outstanding request frees, and blocking on that flush
after a second high watermark. (So rather than wait for the system to
run out of memory, we stop issuing requests - both are nondeterministic.)
Paul E. McKenney wrote:
Another approach is synchronize_rcu() after some largish number of
requests. The advantage of this approach is that it throttles the
production of callbacks at the source. The corresponding disadvantage
is that it slows things up.
Another approach is to use call_rcu(), but if the previous call_rcu()
is still in flight, block waiting for it. Yet another approach is
the get_state_synchronize_rcu() / cond_synchronize_rcu() pair. The
idea is to do something like this:
cond_synchronize_rcu(cookie);
cookie = get_state_synchronize_rcu();
You would of course do an initial get_state_synchronize_rcu() to
get things going. This would not block unless there was less than
one grace period's worth of time between invocations. But this
assumes a busy system, where there is almost always a grace period
in flight. But you can make that happen as follows:
cond_synchronize_rcu(cookie);
cookie = get_state_synchronize_rcu();
call_rcu(&my_rcu_head, noop_function);
Note that you need additional code to make sure that the old callback
has completed before doing a new one. Setting and clearing a flag
with appropriate memory ordering control suffices (e.g,. smp_load_acquire()
and smp_store_release()).
v3: More comments on compiler and processor order of operations within
the RCU lookup and discover we can use rcu_access_pointer() here instead.
v4: Wrap i915_gem_active_get_rcu() to take the rcu_read_lock itself.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: "Goel, Akash" <akash.goel@intel.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-25-git-send-email-chris@chris-wilson.co.uk
2016-08-04 23:32:41 +08:00
	rcu_assign_pointer(active->request, request);
2016-08-04 14:52:29 +08:00
}
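i915_gem_active_set() publishes the request with rcu_assign_pointer() rather than a plain store, so that a lockless reader that sees the new pointer also sees the request's initialised contents. As a rough userspace analogy only, a C11 release store paired with an acquire load gives the same publication guarantee. Real RCU additionally provides grace periods and deferred freeing, and rcu_dereference() relies on cheaper address-dependency ordering rather than a full acquire. The `pub_*` names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

struct pub_request { uint32_t seqno; };

struct pub_active {
	_Atomic(struct pub_request *) request;
};

/*
 * Writer: fill in the request, then publish the pointer. The release store
 * prevents the seqno write from being reordered after the publication.
 */
static void pub_active_set(struct pub_active *active, struct pub_request *rq,
			   uint32_t seqno)
{
	rq->seqno = seqno;
	atomic_store_explicit(&active->request, rq, memory_order_release);
}

/*
 * Reader: the acquire load pairs with the release store above, so a
 * non-NULL result is guaranteed to observe the initialised seqno.
 */
static struct pub_request *pub_active_peek(struct pub_active *active)
{
	return atomic_load_explicit(&active->request, memory_order_acquire);
}
```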

2016-12-08 01:56:47 +08:00
/**
 * i915_gem_active_set_retire_fn - updates the retirement callback
 * @active - the active tracker
 * @fn - the routine called when the request is retired
 * @mutex - struct_mutex used to guard retirements
 *
 * i915_gem_active_set_retire_fn() updates the function pointer that
 * is called when the final request associated with the @active tracker
 * is retired.
 */
static inline void
i915_gem_active_set_retire_fn(struct i915_gem_active *active,
			      i915_gem_retire_fn fn,
			      struct mutex *mutex)
{
	lockdep_assert_held(mutex);
	active->retire = fn ?: i915_gem_retire_noop;
}

2018-02-21 17:56:36 +08:00
static inline struct i915_request *
2016-08-04 14:52:31 +08:00
__i915_gem_active_peek(const struct i915_gem_active *active)
{
2018-02-21 17:56:36 +08:00
	/*
	 * Inside the error capture (running with the driver in an unknown
2016-08-04 23:32:41 +08:00
	 * state), we want to bend the rules slightly (a lot).
	 *
	 * Work is in progress to make it safer; in the meantime this keeps
	 * the known issue from spamming the logs.
	 */
	return rcu_dereference_protected(active->request, 1);
2016-08-04 14:52:31 +08:00
}

2016-08-09 15:37:01 +08:00
/**
 * i915_gem_active_raw - return the active request
 * @active - the active tracker
 *
 * i915_gem_active_raw() returns the current request being tracked, or NULL.
 * It does not obtain a reference on the request for the caller, so the caller
 * must hold struct_mutex.
 */
2018-02-21 17:56:36 +08:00
static inline struct i915_request *
2016-08-09 15:37:01 +08:00
i915_gem_active_raw(const struct i915_gem_active *active, struct mutex *mutex)
{
	return rcu_dereference_protected(active->request,
					 lockdep_is_held(mutex));
}

2016-08-04 14:52:30 +08:00
/**
2016-08-04 14:52:35 +08:00
 * i915_gem_active_peek - report the active request being monitored
2016-08-04 14:52:30 +08:00
 * @active - the active tracker
 *
2016-08-04 14:52:35 +08:00
 * i915_gem_active_peek() returns the current request being tracked if
 * still active, or NULL. It does not obtain a reference on the request
 * for the caller, so the caller must hold struct_mutex.
2016-08-04 14:52:30 +08:00
 */
2018-02-21 17:56:36 +08:00
static inline struct i915_request *
2016-08-04 14:52:31 +08:00
i915_gem_active_peek(const struct i915_gem_active *active, struct mutex *mutex)
2016-08-04 14:52:30 +08:00
{
2018-02-21 17:56:36 +08:00
	struct i915_request *request;
2016-08-04 14:52:35 +08:00

2016-08-09 15:37:01 +08:00
	request = i915_gem_active_raw(active, mutex);
2018-02-21 17:56:36 +08:00
	if (!request || i915_request_completed(request))
drm/i915: Enable lockless lookup of request tracking via RCU
If we enable RCU for the requests (providing a grace period where we can
inspect a "dead" request before it is freed), we can allow callers to
carefully perform lockless lookup of an active request.
However, by enabling deferred freeing of requests, we can potentially
hog a lot of memory when dealing with tens of thousands of requests per
second - with a quick insertion of a synchronize_rcu() inside our
shrinker callback, that issue disappears.
v2: Currently, it is our responsibility to handle reclaim i.e. to avoid
hogging memory with the delayed slab frees. At the moment, we wait for a
grace period in the shrinker, and block for all RCU callbacks on oom.
Suggested alternatives focus on flushing our RCU callback when we have a
certain number of outstanding request frees, and blocking on that flush
after a second high watermark. (So rather than wait for the system to
run out of memory, we stop issuing requests - both are nondeterministic.)
Paul E. McKenney wrote:
Another approach is synchronize_rcu() after some largish number of
requests. The advantage of this approach is that it throttles the
production of callbacks at the source. The corresponding disadvantage
is that it slows things up.
Another approach is to use call_rcu(), but if the previous call_rcu()
is still in flight, block waiting for it. Yet another approach is
the get_state_synchronize_rcu() / cond_synchronize_rcu() pair. The
idea is to do something like this:
cond_synchronize_rcu(cookie);
cookie = get_state_synchronize_rcu();
You would of course do an initial get_state_synchronize_rcu() to
get things going. This would not block unless there was less than
one grace period's worth of time between invocations. But this
assumes a busy system, where there is almost always a grace period
in flight. But you can make that happen as follows:
cond_synchronize_rcu(cookie);
cookie = get_state_synchronize_rcu();
call_rcu(&my_rcu_head, noop_function);
Note that you need additional code to make sure that the old callback
has completed before doing a new one. Setting and clearing a flag
with appropriate memory ordering control suffices (e.g., smp_load_acquire()
and smp_store_release()).
v3: More comments on compiler and processor order of operations within
the RCU lookup and discover we can use rcu_access_pointer() here instead.
v4: Wrap i915_gem_active_get_rcu() to take the rcu_read_lock itself.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: "Goel, Akash" <akash.goel@intel.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-25-git-send-email-chris@chris-wilson.co.uk
2016-08-04 23:32:41 +08:00
		return NULL;

	return request;
}

2016-08-04 14:52:30 +08:00
/**
 * i915_gem_active_get - return a reference to the active request
 * @active: the active tracker
 * @mutex: the mutex guarding the tracker (struct_mutex)
 *
 * i915_gem_active_get() returns a reference to the active request, or NULL
 * if the active tracker is idle (including when its request has completed
 * but not yet been retired). The caller must hold struct_mutex.
 */
2018-02-21 17:56:36 +08:00
static inline struct i915_request *
2016-08-04 14:52:31 +08:00
i915_gem_active_get(const struct i915_gem_active *active, struct mutex *mutex)
2016-08-04 14:52:30 +08:00
{
2018-02-21 17:56:36 +08:00
	return i915_request_get(i915_gem_active_peek(active, mutex));
2016-08-04 14:52:30 +08:00
}
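`i915_gem_active_get()` simply composes a reference-taking get with the peek, which only works if the get helper tolerates NULL. The sketch below models that composition under stated assumptions: a plain `int` refcount (the caller's lock stands in for struct_mutex), a global `hw_seqno` standing in for the hardware's last completed seqno, and hypothetical names throughout rather than the i915 types:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: not the i915 structures. */
struct request {
	int refcount;
	unsigned int seqno;	/* value the GPU writes when this request completes */
};

struct active_tracker {
	struct request *request;	/* NULL when idle; guarded by the caller's lock */
};

static unsigned int hw_seqno;	/* stand-in for the last seqno the GPU wrote */

static bool request_completed(const struct request *rq)
{
	return hw_seqno >= rq->seqno;	/* ignores seqno wraparound for brevity */
}

/* NULL-tolerant get, so that get(peek()) needs no separate check. */
static struct request *request_get(struct request *rq)
{
	if (rq) {
		assert(rq->refcount > 0);	/* a tracked request is still alive */
		rq->refcount++;
	}
	return rq;
}

/* Idle and completed-but-unretired trackers both read as NULL. */
static struct request *active_peek(const struct active_tracker *active)
{
	struct request *rq = active->request;

	if (!rq || request_completed(rq))
		return NULL;
	return rq;
}

static struct request *active_get(const struct active_tracker *active)
{
	return request_get(active_peek(active));
}
```

Both an idle tracker and one whose request has completed but not yet been retired return NULL, and in both cases no reference is taken.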
2016-08-04 23:32:41 +08:00
/**
 * __i915_gem_active_get_rcu - return a reference to the active request
 * @active: the active tracker
 *
 * __i915_gem_active_get_rcu() returns a reference to the active request, or
 * NULL if the active tracker is idle. The caller must hold the RCU read lock,
 * but because a reference is taken, the returned request remains safe to use
 * after the RCU read lock is dropped.
 */
2018-02-21 17:56:36 +08:00
static inline struct i915_request *
2016-08-04 23:32:41 +08:00
__i915_gem_active_get_rcu(const struct i915_gem_active *active)
{
2018-02-21 17:56:36 +08:00
	/*
	 * Performing a lockless retrieval of the active request is super
2017-01-18 18:53:44 +08:00
	 * tricky. SLAB_TYPESAFE_BY_RCU merely guarantees that the backing
2016-08-04 23:32:41 +08:00
	 * slab of request objects will not be freed whilst we hold the
	 * RCU read lock. It does not guarantee that the request itself
	 * will not be freed and then *reused*. Viz,
	 *
	 * Thread A                Thread B
	 *
2018-02-21 17:56:36 +08:00
	 * rq = active.request
	 *                         retire(rq) -> free(rq);
	 *                         (rq is now first on the slab freelist)
2016-08-04 23:32:41 +08:00
	 *                         active.request = NULL
	 *
2018-02-21 17:56:36 +08:00
	 * rq = new submission on a new object
	 * ref(rq)
2016-08-04 23:32:41 +08:00
	 *
	 * To prevent the request from being reused whilst the caller
	 * uses it, we take a reference like normal. Whilst acquiring
	 * the reference we check that it is not in a destroyed state
	 * (refcnt == 0). That prevents the request being reallocated
	 * whilst the caller holds on to it. To check that the request
	 * was not reallocated as we acquired the reference we have to
	 * check that our request remains the active request across
	 * the lookup, in the same manner as a seqlock. The visibility
	 * of the pointer versus the reference counting is controlled
	 * by using RCU barriers (rcu_dereference and rcu_assign_pointer).
	 *
	 * In the middle of all that, we inspect whether the request is
	 * complete. Retiring is lazy, so the request may be completed long
	 * before the active tracker is updated. Querying whether the
	 * request is complete is far cheaper (as it involves no locked
	 * instructions setting cachelines to exclusive) than acquiring
	 * the reference, so we do it first. The RCU read lock ensures the
	 * pointer dereference is valid, but does not ensure that the
	 * seqno or HWS is the right one! However, if the request was
	 * reallocated, that means the active tracker's request was complete.
	 * If the new request is also complete, then both are and we can
	 * just report the active tracker is idle. If the new request is
	 * incomplete, then we acquire a reference on it and check that
	 * it remained the active request.
2016-08-09 16:23:34 +08:00
	 *
	 * It is then imperative that we do not zero the request on
	 * reallocation, so that we can chase the dangling pointers!
2018-02-21 17:56:36 +08:00
	 * See i915_request_alloc().
2016-08-04 23:32:41 +08:00
	 */
	do {
2018-02-21 17:56:36 +08:00
		struct i915_request *request;
2016-08-04 23:32:41 +08:00

		request = rcu_dereference(active->request);
2018-02-21 17:56:36 +08:00
		if (!request || i915_request_completed(request))
2016-08-04 23:32:41 +08:00
			return NULL;

2018-02-21 17:56:36 +08:00
		/*
		 * An especially silly compiler could decide to recompute the
		 * result of i915_request_completed, more specifically
2016-08-22 16:55:22 +08:00
		 * re-emit the load for request->fence.seqno. A race would catch
		 * a later seqno value, which could flip the result from true to
		 * false. This means some of the instructions below might not
		 * be executed, while later instructions still are. Due to the
		 * barriers within the refcounting, the inconsistency can't reach
2018-02-21 17:56:36 +08:00
		 * past the call to i915_request_get_rcu(), but skipping that
		 * call while still executing i915_request_put() creates enough
2016-08-22 16:55:22 +08:00
		 * havoc. Prevent this with a compiler barrier.
		 */
		barrier();

2018-02-21 17:56:36 +08:00
		request = i915_request_get_rcu(request);
2016-08-04 23:32:41 +08:00

2018-02-21 17:56:36 +08:00
/*
* What stops the following rcu_access_pointer() from occurring
* before the above i915_request_get_rcu()? If we were
2016-08-04 23:32:41 +08:00
* to read the value before pausing to get the reference to
* the request, we may not notice a change in the active
* tracker.
*
* The rcu_access_pointer() is a mere compiler barrier, which
* means both the CPU and compiler are free to perform the
* memory read without constraint. The compiler only has to
* ensure that any operations after the rcu_access_pointer()
* occur afterwards in program order. This means the read may
* be performed earlier by an out-of-order CPU, or adventurous
* compiler.
*
* The atomic operation at the heart of
2018-02-21 17:56:36 +08:00
* i915_request_get_rcu(), see dma_fence_get_rcu(), is
2016-08-04 23:32:41 +08:00
* atomic_inc_not_zero() which is only a full memory barrier
2018-02-21 17:56:36 +08:00
* when successful. That is, if i915_request_get_rcu()
2016-08-04 23:32:41 +08:00
* returns the request (and so with the reference count
* incremented) then the following read by rcu_access_pointer()
* must occur after the atomic operation and so confirm
* that this request is the one currently being tracked.
2016-08-09 16:23:33 +08:00
*
* The corresponding write barrier is part of
* rcu_assign_pointer().
2016-08-04 23:32:41 +08:00
*/
if (!request || request == rcu_access_pointer(active->request))
return rcu_pointer_handoff(request);

2018-02-21 17:56:36 +08:00
i915_request_put(request);
2016-08-04 23:32:41 +08:00
} while (1);
}
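The lookup loop above can be illustrated outside the kernel. What follows is a minimal userspace sketch using C11 atomics, not the driver's code: `struct req`, `struct tracker`, `get_ref_not_zero()`, `put_ref()` and `tracker_get()` are hypothetical names. It assumes that a stale pointer read from the tracker still refers to valid memory, which is what the RCU grace period provides in the kernel (the sketch simply never frees anything). The compare-exchange loop stands in for atomic_inc_not_zero(), and the second load of the tracker is the recheck that the comment argues must not be performed before the increment.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a refcounted request. */
struct req {
	atomic_int refcount;
};

/* Hypothetical stand-in for the active tracker: one published pointer. */
struct tracker {
	_Atomic(struct req *) request;
};

/*
 * Analogue of atomic_inc_not_zero(): take a reference only if the object
 * is still live. Using seq_cst (the default) keeps the later re-read of
 * the tracker ordered after a successful increment.
 */
static bool get_ref_not_zero(struct req *r)
{
	int old = atomic_load(&r->refcount);

	while (old != 0) {
		/* On failure, old is refreshed with the current count. */
		if (atomic_compare_exchange_weak(&r->refcount, &old, old + 1))
			return true;
	}
	return false;
}

static void put_ref(struct req *r)
{
	int old = atomic_fetch_sub(&r->refcount, 1);

	assert(old > 0); /* catches dropping a reference we never held */
}

/*
 * Read the tracker, try to pin the request, then re-read the tracker to
 * confirm the pinned request is still the one being tracked.
 */
static struct req *tracker_get(struct tracker *t)
{
	for (;;) {
		struct req *r = atomic_load(&t->request);

		if (!r)
			return NULL;	/* idle tracker */
		if (!get_ref_not_zero(r))
			return NULL;	/* count already zero: being freed */
		if (r == atomic_load(&t->request))
			return r;	/* pinned and still current */
		put_ref(r);		/* tracker moved on: drop and retry */
	}
}
```

As in the kernel loop, a failed increment is reported as "no request" rather than retried, since a request whose count has already reached zero is on its way to being freed.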
/**
* i915_gem_active_get_unlocked - return a reference to the active request
* @active: the active tracker
*
* i915_gem_active_get_unlocked() returns a reference to the active request,
* or NULL if the active tracker is idle. The reference is obtained under RCU,
* so no locking is required by the caller.
*
2018-02-21 17:56:36 +08:00
* The reference should be freed with i915_request_put().
2016-08-04 23:32:41 +08:00
*/
2018-02-21 17:56:36 +08:00
static inline struct i915_request *
2016-08-04 23:32:41 +08:00
i915_gem_active_get_unlocked(const struct i915_gem_active *active)
{
2018-02-21 17:56:36 +08:00
struct i915_request *request;
2016-08-04 23:32:41 +08:00

rcu_read_lock();
request = __i915_gem_active_get_rcu(active);
rcu_read_unlock();

return request;
}

2016-08-04 14:52:30 +08:00
/**
* i915_gem_active_isset - report whether the active tracker is assigned
* @active: the active tracker
*
* i915_gem_active_isset() returns true if the active tracker is currently
* assigned to a request. Due to lazy retirement, that request may already be
* idle, so this may report stale information.
*/
static inline bool
i915_gem_active_isset(const struct i915_gem_active *active)
{
2016-08-04 23:32:41 +08:00
return rcu_access_pointer(active->request);
2016-08-04 14:52:30 +08:00
}
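To illustrate why i915_gem_active_isset() may report stale information, here is a small userspace sketch. Again the names (`struct req`, `struct tracker`, `tracker_isset()`, `tracker_retire()`) are hypothetical, and C11 atomics stand in for the kernel primitives. The tracker keeps pointing at a completed request until retirement clears it, so the check keeps returning true in between.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical request with a flag set when the work finishes. */
struct req {
	atomic_bool completed;
};

/* Hypothetical tracker: cleared only when retirement runs. */
struct tracker {
	_Atomic(struct req *) request;
};

/*
 * Analogue of i915_gem_active_isset(): a single relaxed load of the
 * pointer, never dereferenced. No reference is taken and completion is
 * not consulted, so the answer may be stale.
 */
static bool tracker_isset(struct tracker *t)
{
	return atomic_load_explicit(&t->request, memory_order_relaxed) != NULL;
}

/* Lazy retirement: clear the tracker only once its request completed. */
static void tracker_retire(struct tracker *t)
{
	struct req *r = atomic_load(&t->request);

	/* Compare-exchange so a newly assigned request is never cleared. */
	if (r && atomic_load(&r->completed))
		(void)atomic_compare_exchange_strong(&t->request, &r, NULL);
}
```

The relaxed load is only a loose analogue of rcu_access_pointer(): both read the pointer value without dereferencing it, which is all a "is anything assigned?" query needs.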
/**
drm/i915: Move GEM activity tracking into a common struct reservation_object
In preparation to support many distinct timelines, we need to expand the
activity tracking on the GEM object to handle more than just a request
per engine. We already use the struct reservation_object on the dma-buf
to handle many fence contexts, so integrating that into the GEM object
itself is the preferred solution. (For example, we can now share the same
reservation_object between every consumer/producer using this buffer and
skip the manual import/export via dma-buf.)
v2: Reimplement busy-ioctl (by walking the reservation object), postpone
the ABI change for another day. Similarly use the reservation object to
find the last_write request (if active and from i915) for choosing
display CS flips.
Caveats:
* busy-ioctl: busy-ioctl only reports on the native fences, it will not
warn of stalls (in set-domain-ioctl, pread/pwrite etc) if the object is
being rendered to by external fences. It also will not report the same
busy state as wait-ioctl (or polling on the dma-buf) in the same
circumstances. On the plus side, it does retain reporting of which
*i915* engines are engaged with this object.
* non-blocking atomic modesets take a step backwards as the wait for
render completion blocks the ioctl. This is fixed in a subsequent
patch to use a fence instead for awaiting on the rendering, see
"drm/i915: Restore nonblocking awaits for modesetting"
* dynamic array manipulation for shared-fences in reservation is slower
than the previous lockless static assignment (e.g. gem_exec_lut_handle
runtime on ivb goes from 42s to 66s), mainly due to atomic operations
(maintaining the fence refcounts).
* loss of object-level retirement callbacks, emulated by VMA retirement
tracking.
* minor loss of object-level last activity information from debugfs,
could be replaced with per-vma information if desired
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-21-chris@chris-wilson.co.uk
2016-10-28 20:58:44 +08:00
* i915_gem_active_wait - waits until the request is completed
2016-08-05 17:14:06 +08:00
* @active - the active request on which to wait
2016-09-09 21:11:49 +08:00
* @flags - how to wait
2016-08-05 17:14:06 +08:00
*
2016-10-28 20:58:28 +08:00
* i915_gem_active_wait() waits until the request is completed before
2016-08-05 17:14:06 +08:00
* returning, without requiring any locks to be held. Note that it does not
* retire any requests before returning.
*
* This function relies on RCU in order to acquire the reference to the active
* request without holding any locks. See __i915_gem_active_get_rcu() for the
* gory details on how that is managed. Once the reference is acquired, we
* can then wait upon the request, and afterwards release our reference,
* free of any locking.
*
2018-02-21 17:56:36 +08:00
* This function wraps i915_request_wait(), see it for the full details on
2016-08-05 17:14:06 +08:00
* the arguments.
*
* Returns 0 if successful, or a negative error code.
*/
static inline int
2016-10-28 20:58:28 +08:00
i915_gem_active_wait(const struct i915_gem_active *active, unsigned int flags)
2016-08-05 17:14:06 +08:00
{
2018-02-21 17:56:36 +08:00
struct i915_request *request;
2016-10-28 20:58:27 +08:00
long ret = 0;
2016-08-05 17:14:06 +08:00
request = i915_gem_active_get_unlocked(active);
if (request) {
2018-02-21 17:56:36 +08:00
ret = i915_request_wait(request, flags, MAX_SCHEDULE_TIMEOUT);
i915_request_put(request);
2016-08-05 17:14:06 +08:00
}
2016-10-28 20:58:27 +08:00
return ret < 0 ? ret : 0;
2016-08-05 17:14:06 +08:00
}
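The comment on i915_gem_active_wait() describes a lockless pattern: acquire a reference to the tracked request, wait on it, drop the reference, and fold the positive "time remaining" returned by i915_request_wait() into 0. What follows is a minimal userspace sketch of that shape, not the driver's code. It assumes C11 atomics in place of RCU and kref; fake_request, fake_active and their helpers are hypothetical names, and a stored wait_result stands in for actually calling i915_request_wait(). In the spirit of __i915_gem_active_get_rcu(), the lookup only takes a reference while the refcount is non-zero and then rechecks that the tracker still points at the same request.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for struct i915_request: a refcount plus the value
 * a wait would return (time remaining >= 0, or a negative errno). */
struct fake_request {
	atomic_int refcount;
	long wait_result;
};

/* Hypothetical stand-in for struct i915_gem_active. */
struct fake_active {
	struct fake_request *_Atomic request;
};

/* Take a reference only while the refcount is non-zero, in the spirit of
 * kref_get_unless_zero(): a zero count means the request is being freed. */
static int fake_request_get_unless_zero(struct fake_request *rq)
{
	int old = atomic_load(&rq->refcount);

	while (old > 0) {
		/* On failure, old is reloaded with the current count. */
		if (atomic_compare_exchange_weak(&rq->refcount, &old, old + 1))
			return 1;
	}
	return 0;
}

static void fake_request_put(struct fake_request *rq)
{
	atomic_fetch_sub(&rq->refcount, 1);
}

/* Lockless lookup: load the pointer, try to take a reference, then check
 * that the tracker still points at the same request before trusting it. */
static struct fake_request *fake_active_get_unlocked(struct fake_active *active)
{
	for (;;) {
		struct fake_request *rq = atomic_load(&active->request);

		if (!rq || !fake_request_get_unless_zero(rq))
			return NULL; /* idle, or the request is already dying */
		if (atomic_load(&active->request) == rq)
			return rq;
		fake_request_put(rq); /* tracker moved on: drop and retry */
	}
}

/* Same shape as i915_gem_active_wait(): get, wait, put, and report only
 * success (0) or a negative error code. */
static int fake_active_wait(struct fake_active *active)
{
	struct fake_request *rq = fake_active_get_unlocked(active);
	long ret = 0;

	if (rq) {
		ret = rq->wait_result; /* stands in for i915_request_wait() */
		fake_request_put(rq);
	}
	return ret < 0 ? (int)ret : 0;
}
```

The recheck after taking the reference is what lets the lookup run without a lock: a reference on a request that has since been replaced in the tracker is dropped and the lookup retried. The sketch does not model the RCU grace period that, in the driver, keeps the request's memory valid while that check runs.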
2016-08-04 14:52:30 +08:00
/**
* i915_gem_active_retire - waits until the request is retired
* @active - the active request on which to wait
* @mutex - the mutex protecting @active, which the caller must hold
*
* i915_gem_active_retire() waits until the request is completed,
* and then ensures that at least the retirement handler for this
* @active tracker is called before returning. If the @active
* tracker is idle, the function returns immediately.
*/
static inline int __must_check
drm/i915: Refactor activity tracking for requests
With the introduction of requests, we amplified the number of atomic
refcounted objects we use and update every execbuffer; from none to
several references, and a set of references that need to be changed. We
also introduced interesting side-effects in the order of retiring
requests and objects.
Instead of independently tracking the last request for an object, track
the active objects for each request. The object will reside in the
buffer list of its most recent active request and so we reduce the kref
interchange to a list_move. Now retirements are entirely driven by the
request, dramatically simplifying activity tracking on the object
themselves, and removing the ambiguity between retiring objects and
retiring requests.
Furthermore with the consolidation of managing the activity tracking
centrally, we can look forward to using RCU to enable lockless lookup of
the current active requests for an object. In the future, we will be
able to query the status or wait upon rendering to an object without
even touching the struct_mutex BKL.
All told, less code, simpler and faster, and more extensible.
v2: Add a typedef for the function pointer for convenience later.
v3: Make the noop retirement callback explicit. Allow passing NULL to
the init_request_active() which is expanded to a common noop function.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-16-git-send-email-chris@chris-wilson.co.uk
2016-08-04 14:52:35 +08:00
i915_gem_active_retire(struct i915_gem_active *active,
2016-08-04 14:52:31 +08:00
struct mutex *mutex)
2016-08-04 14:52:30 +08:00
{
2018-02-21 17:56:36 +08:00
struct i915_request *request;
2016-10-28 20:58:27 +08:00
long ret;
2016-08-04 14:52:35 +08:00
2016-08-09 15:37:01 +08:00
request = i915_gem_active_raw(active, mutex);
2016-08-04 14:52:35 +08:00
if (!request)
return 0;
2018-02-21 17:56:36 +08:00
ret = i915_request_wait(request,
2016-09-09 21:11:50 +08:00
I915_WAIT_INTERRUPTIBLE | I915_WAIT_LOCKED,
2016-10-28 20:58:27 +08:00
MAX_SCHEDULE_TIMEOUT);
if (ret < 0)
2016-08-04 14:52:35 +08:00
return ret;
list_del_init(&active->link);
2016-08-04 23:32:41 +08:00
RCU_INIT_POINTER(active->request, NULL);
2016-08-04 14:52:35 +08:00
active->retire(active, request);
return 0;
2016-08-04 14:52:30 +08:00
}
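i915_gem_active_retire() follows a fixed order: return at once if the tracker is idle, wait (propagating any error without touching the tracker), then unlink the tracker, clear its request pointer, and only then invoke its retire callback. Below is a single-threaded sketch of that ordering under stated assumptions: the small list helpers stand in for struct list_head and list_del_init(), a plain pointer replaces the RCU-protected active->request, and sketch_request, sketch_active and count_retire() are hypothetical names for illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly linked list, standing in for struct list_head. */
struct list_node {
	struct list_node *prev, *next;
};

static void list_node_init(struct list_node *n)
{
	n->prev = n->next = n;
}

static void list_node_add_tail(struct list_node *n, struct list_node *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/* Unlink @n and leave it pointing at itself, like list_del_init(). */
static void list_node_del_init(struct list_node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	list_node_init(n);
}

static int list_node_empty(const struct list_node *head)
{
	return head->next == head;
}

/* Hypothetical request: the list of trackers attached to it, plus the value
 * a wait would return (>= 0 on completion, negative errno on failure). */
struct sketch_request {
	struct list_node active_list;
	long wait_result;
};

struct sketch_active;
typedef void (*sketch_retire_fn)(struct sketch_active *active,
				 struct sketch_request *rq);

/* Hypothetical tracker with the fields the retire path touches. */
struct sketch_active {
	struct sketch_request *request; /* NULL when idle */
	struct list_node link;          /* entry on request->active_list */
	sketch_retire_fn retire;
	int retired;                    /* demo bookkeeping only */
};

static void count_retire(struct sketch_active *active, struct sketch_request *rq)
{
	(void)rq;
	active->retired++;
}

/* Same ordering as i915_gem_active_retire(). */
static int sketch_active_retire(struct sketch_active *active)
{
	struct sketch_request *rq = active->request;
	long ret;

	if (!rq)
		return 0; /* idle tracker: nothing to wait for */

	ret = rq->wait_result; /* stands in for i915_request_wait() */
	if (ret < 0)
		return (int)ret; /* leave the tracker untouched on error */

	list_node_del_init(&active->link); /* detach from the request */
	active->request = NULL;            /* tracker is now idle */
	active->retire(active, rq);        /* callback runs last */
	return 0;
}
```

Because the pointer is cleared before the callback runs, the callback already sees an idle tracker and receives the retired request as an explicit argument, matching the call active->retire(active, request) above.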
2016-08-04 14:52:29 +08:00
#define for_each_active(mask, idx) \
for (; mask ? idx = ffs(mask) - 1, 1 : 0; mask &= ~BIT(idx))
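for_each_active() walks the set bits of a mask from lowest to highest: ffs() returns the 1-based index of the lowest set bit, the body runs with idx set to that index minus one, and the increment clause clears that bit, so the loop ends once the mask is zero. Since the macro modifies its mask argument in place, it consumes the mask it is given. The sketch below exercises the macro verbatim in userspace; the BIT() definition, the _DEFAULT_SOURCE feature macro and the collect_active() helper are assumptions for the demo, and ffs() here is the POSIX version from <strings.h>.

```c
#define _DEFAULT_SOURCE /* assumed: exposes ffs() from <strings.h> on glibc */
#include <assert.h>
#include <strings.h>

/* Assumed userspace stand-in for the kernel's BIT() helper. */
#define BIT(nr) (1UL << (nr))

/* Copied verbatim from the header above. */
#define for_each_active(mask, idx) \
	for (; mask ? idx = ffs(mask) - 1, 1 : 0; mask &= ~BIT(idx))

/* Hypothetical helper: store the set-bit indices of @mask in @out, lowest
 * first, and return how many there were. @mask is a by-value copy, so the
 * bits cleared by the loop do not affect the caller's variable. */
static int collect_active(unsigned int mask, int out[32])
{
	int idx, n = 0;

	for_each_active(mask, idx)
		out[n++] = idx;
	return n;
}
```

For example, a mask of 0x16 (binary 10110) visits indices 1, 2 and 4 in that order.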
2018-02-21 17:56:36 +08:00
#endif /* I915_REQUEST_H */