Commit Graph

11 Commits

Author SHA1 Message Date
Chris Wilson 103b76eeff drm/i915: Use i915_global_register()
Rather than manually add every new global into each hook, use
i915_global_register() function and keep a list of registered globals to
invoke instead.

However, I haven't found a way for random drivers to add an .init table
to avoid having to manually add ourselves to i915_globals_init() each
time.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190305213830.18094-1-chris@chris-wilson.co.uk
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
2019-03-06 10:00:50 +00:00
Chris Wilson f9e9e9de58 drm/i915: Prioritise non-busywait semaphore workloads
We don't want to busywait on the GPU if we have other work to do. If we
give non-busywaiting workloads higher (initial) priority than workloads
that require a busywait, we will prioritise work that is ready to run
immediately. We then also have to be careful that we don't give earlier
semaphores an accidental boost because later work doesn't wait on other
rings, hence we keep a history of semaphore usage of the dependency chain.

v2: Stop rolling the bits into a chain and just use a flag in case this
request or any of our dependencies use a semaphore. The rolling around
was contagious as Tvrtko was heard to fall off his chair.

Testcase: igt/gem_exec_schedule/semaphore
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190301170901.8340-4-chris@chris-wilson.co.uk
2019-03-01 17:45:11 +00:00
Chris Wilson b5773a3616 drm/i915/execlists: Suppress mere WAIT preemption
WAIT is occasionally suppressed by virtue of preempted requests being
promoted to NEWCLIENT if they have not all ready received that boost.
Make this consistent for all WAIT boosts that they are not allowed to
preempt executing contexts and are merely granted the right to be at the
front of the queue for the next execution slot. This is in keeping with
the desire that the WAIT boost be a minor tweak that does not give
excessive promotion to its user and open ourselves to trivial abuse.

The problem with the inconsistent WAIT preemption becomes more apparent
as the preemption is propagated across the engines, where one engine may
preempt and the other not, and we be relying on the exact execution
order being consistent across engines (e.g. using HW semaphores to
coordinate parallel execution).

v2: Also protect GuC submission from false preemption loops.
v3: Build bug safeguards and better debug messages for st.
v4: Do the priority bumping in unsubmit (i.e. on preemption/reset
unwind), applying it earlier during submit causes out-of-order execution
combined with execute fences.
v5: Call sw_fence_fini for our dummy request (Matthew)

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190228220639.3173-1-chris@chris-wilson.co.uk
2019-02-28 23:10:43 +00:00
Chris Wilson 32eb6bcfdd drm/i915: Make request allocation caches global
As kmem_caches share the same properties (size, allocation/free behaviour)
for all potential devices, we can use global caches. While this
potential has worse fragmentation behaviour (one can argue that
different devices would have different activity lifetimes, but you can
also argue that activity is temporal across the system) it is the
default behaviour of the system at large to amalgamate matching caches.

The benefit for us is much reduced pointer dancing along the frequent
allocation paths.

v2: Defer shrinking until after a global grace period for futureproofing
multiple consumers of the slab caches, similar to the current strategy
for avoiding shrinking too early.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190228102035.5857-1-chris@chris-wilson.co.uk
2019-02-28 11:07:56 +00:00
Chris Wilson babfb1b55c drm/i915: Skip scanning for signalers if we are already inflight
When a request has its priority changed, we traverse the graph of all of
its signalers to raise their priorities to match (priority inheritance).
If the request has already started executing its payload, we know that
all of its signalers must have signaled and we do not need to process
our list of signalers.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190226102404.29153-1-chris@chris-wilson.co.uk
2019-02-27 10:45:31 +00:00
Chris Wilson ed7dc67774 drm/i915: Reacquire priolist cache after dropping the engine lock
If we drop the engine lock, we may run execlists_dequeue which may free
the priolist. Therefore if we ever drop the execution lock on the
engine, we have to discard our cache and refetch the priolist to ensure
we do not use a stale pointer.

[  506.418935] [IGT] gem_exec_whisper: starting subtest contexts-priority
[  593.240825] general protection fault: 0000 [#1] SMP
[  593.240863] CPU: 1 PID: 494 Comm: gem_exec_whispe Tainted: G     U            5.0.0-rc6+ #100
[  593.240879] Hardware name:  /NUC6CAYB, BIOS AYAPLCEL.86A.0029.2016.1124.1625 11/24/2016
[  593.240965] RIP: 0010:__i915_schedule+0x1fe/0x320 [i915]
[  593.240981] Code: 48 8b 0c 24 48 89 c3 49 8b 45 28 49 8b 75 20 4c 89 3c 24 48 89 46 08 48 89 30 48 8b 43 08 48 89 4b 08 49 89 5d 20 49 89 45 28 <48> 89 08 45 39 a7 b8 03 00 00 7d 44 45 89 a7 b8 03 00 00 49 8b 85
[  593.240999] RSP: 0018:ffffc90000057a60 EFLAGS: 00010046
[  593.241013] RAX: 6b6b6b6b6b6b6b6b RBX: ffff8882582d7870 RCX: ffff88826baba6f0
[  593.241026] RDX: 0000000000000000 RSI: ffff8882582d6e70 RDI: ffff888273482194
[  593.241049] RBP: ffffc90000057a68 R08: ffff8882582d7680 R09: ffff8882582d7840
[  593.241068] R10: 0000000000000000 R11: ffffea00095ebe08 R12: 0000000000000728
[  593.241105] R13: ffff88826baba6d0 R14: ffffc90000057a40 R15: ffff888273482158
[  593.241120] FS:  00007f4613fb3900(0000) GS:ffff888277a80000(0000) knlGS:0000000000000000
[  593.241133] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  593.241146] CR2: 00007f57d3c66a84 CR3: 000000026e2b6000 CR4: 00000000001406e0
[  593.241158] Call Trace:
[  593.241233]  i915_schedule+0x1f/0x30 [i915]
[  593.241326]  i915_request_add+0x1a9/0x290 [i915]
[  593.241393]  i915_gem_do_execbuffer+0x45f/0x1150 [i915]
[  593.241411]  ? init_object+0x49/0x80
[  593.241425]  ? ___slab_alloc.constprop.91+0x4b8/0x4e0
[  593.241491]  ? i915_gem_execbuffer2_ioctl+0x99/0x380 [i915]
[  593.241563]  ? i915_gem_execbuffer_ioctl+0x270/0x270 [i915]
[  593.241629]  i915_gem_execbuffer2_ioctl+0x1bb/0x380 [i915]
[  593.241705]  ? i915_gem_execbuffer_ioctl+0x270/0x270 [i915]
[  593.241724]  drm_ioctl_kernel+0x81/0xd0
[  593.241738]  drm_ioctl+0x1a7/0x310
[  593.241803]  ? i915_gem_execbuffer_ioctl+0x270/0x270 [i915]
[  593.241819]  ? __update_load_avg_se+0x1c9/0x240
[  593.241834]  ? pick_next_entity+0x7e/0x120
[  593.241851]  do_vfs_ioctl+0x88/0x5d0
[  593.241880]  ksys_ioctl+0x35/0x70
[  593.241894]  __x64_sys_ioctl+0x11/0x20
[  593.241907]  do_syscall_64+0x44/0xf0
[  593.241924]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  593.241940] RIP: 0033:0x7f4615ffe757
[  593.241952] Code: 00 00 90 48 8b 05 39 a7 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 09 a7 0c 00 f7 d8 64 89 01 48
[  593.241970] RSP: 002b:00007ffc1030ddf8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[  593.241984] RAX: ffffffffffffffda RBX: 00007ffc10324420 RCX: 00007f4615ffe757
[  593.241997] RDX: 00007ffc1030e220 RSI: 0000000040406469 RDI: 0000000000000003
[  593.242010] RBP: 00007ffc1030e220 R08: 00007f46160c9208 R09: 00007f46160c9240
[  593.242022] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000040406469
[  593.242038] R13: 0000000000000003 R14: 0000000000000000 R15: 0000000000000000
[  593.242058] Modules linked in: i915 intel_gtt drm_kms_helper prime_numbers

v2: Track the local engine cache and explicitly clear it when switching
engine locks.

Fixes: a02eb975be ("drm/i915/execlists: Cache the priolist when rescheduling")
Testcase: igt/gem_exec_whisper/contexts-priority # rare!
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190211204647.26723-1-chris@chris-wilson.co.uk
2019-02-11 22:12:21 +00:00
Chris Wilson 52c0fdb25c drm/i915: Replace global breadcrumbs with per-context interrupt tracking
A few years ago, see commit 688e6c7258 ("drm/i915: Slaughter the
thundering i915_wait_request herd"), the issue of handling multiple
clients waiting in parallel was brought to our attention. The
requirement was that every client should be woken immediately upon its
request being signaled, without incurring any cpu overhead.

To handle certain fragility of our hw meant that we could not do a
simple check inside the irq handler (some generations required almost
unbounded delays before we could be sure of seqno coherency) and so
request completion checking required delegation.

Before commit 688e6c7258, the solution was simple. Every client
waiting on a request would be woken on every interrupt and each would do
a heavyweight check to see if their request was complete. Commit
688e6c7258 introduced an rbtree so that only the earliest waiter on
the global timeline would woken, and would wake the next and so on.
(Along with various complications to handle requests being reordered
along the global timeline, and also a requirement for kthread to provide
a delegate for fence signaling that had no process context.)

The global rbtree depends on knowing the execution timeline (and global
seqno). Without knowing that order, we must instead check all contexts
queued to the HW to see which may have advanced. We trim that list by
only checking queued contexts that are being waited on, but still we
keep a list of all active contexts and their active signalers that we
inspect from inside the irq handler. By moving the waiters onto the fence
signal list, we can combine the client wakeup with the dma_fence
signaling (a dramatic reduction in complexity, but does require the HW
being coherent, the seqno must be visible from the cpu before the
interrupt is raised - we keep a timer backup just in case).

Having previously fixed all the issues with irq-seqno serialisation (by
inserting delays onto the GPU after each request instead of random delays
on the CPU after each interrupt), we can rely on the seqno state to
perfom direct wakeups from the interrupt handler. This allows us to
preserve our single context switch behaviour of the current routine,
with the only downside that we lose the RT priority sorting of wakeups.
In general, direct wakeup latency of multiple clients is about the same
(about 10% better in most cases) with a reduction in total CPU time spent
in the waiter (about 20-50% depending on gen). Average herd behaviour is
improved, but at the cost of not delegating wakeups on task_prio.

v2: Capture fence signaling state for error state and add comments to
warm even the most cold of hearts.
v3: Check if the request is still active before busywaiting
v4: Reduce the amount of pointer misdirection with list_for_each_safe
and using a local i915_request variable inside the loops
v5: Add a missing pluralisation to a purely informative selftest message.

References: 688e6c7258 ("drm/i915: Slaughter the thundering i915_wait_request herd")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190129205230.19056-2-chris@chris-wilson.co.uk
2019-01-29 21:45:22 +00:00
Chris Wilson c9a6462288 drm/i915/execlists: Suppress preempting self
In order to avoid preempting ourselves, we currently refuse to schedule
the tasklet if we reschedule an inflight context. However, this glosses
over a few issues such as what happens after a CS completion event and
we then preempt the newly executing context with itself, or if something
else causes a tasklet_schedule triggering the same evaluation to
preempt the active context with itself.

However, when we avoid preempting ELSP[0], we still retain the preemption
value as it may match a second preemption request within the same time period
that we need to resolve after the next CS event. However, since we only
store the maximum preemption priority seen, it may not match the
subsequent event and so we should double check whether or not we
actually do need to trigger a preempt-to-idle by comparing the top
priorities from each queue. Later, this gives us a hook for finer
control over deciding whether the preempt-to-idle is justified.

The sequence of events where we end up preempting for no avail is:

1. Queue requests/contexts A, B
2. Priority boost A; no preemption as it is executing, but keep hint
3. After CS switch, B is less than hint, force preempt-to-idle
4. Resubmit B after idling

v2: We can simplify a bunch of tests based on the knowledge that PI will
ensure that earlier requests along the same context will have the highest
priority.
v3: Demonstrate the stale preemption hint with a selftest

References: a2bf92e8cc ("drm/i915/execlists: Avoid kicking priority on the current context")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190129185452.20989-4-chris@chris-wilson.co.uk
2019-01-29 20:00:05 +00:00
Chris Wilson 4d97cbe019 drm/i915: Rename execlists->queue_priority to queue_priority_hint
After noticing that we trigger preemption events for currently executing
requests, as well as requests that complete before the preemption and
attempting to suppress those preemption events, it is wise to not
consider the queue_priority to be authoritative. As we only track the
maximum priority seen between dequeue passes, if the maximum priority
request is no longer available for dequeuing (it completed or is even
executing on another engine), we have no knowledge of the previous
queue_priority as it would require us to keep a full history of enqueued
requests -- but we already have that history in the priolists!

Rename the queue_priority to queue_priority_hint so that we do not
confuse it as being exactly the maximum priority in the queue, but merely
an indication that we have seen a new maximum priority value and as such
we should check whether it should preempt the currently running request.

v2: s/preempt_priority_hint/queue_priority_hint/ as preempt implies it
being only used for the singular task of preemption and not the wider
question of waking up due to a change in the queue.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190129185452.20989-3-chris@chris-wilson.co.uk
2019-01-29 20:00:03 +00:00
Chris Wilson e9eaf82d97 drm/i915: Priority boost for waiting clients
Latency is in the eye of the beholder. In the case where a client stops
and waits for the gpu, give that request chain a small priority boost
(not so that it overtakes higher priority clients, to preserve the
external ordering) so that ideally the wait completes earlier.

v2: Tvrtko recommends to keep the boost-from-user-stall as small as
possible and to allow new client flows to be preferred for interactivity
over stalls.

Testcase: igt/gem_sync/switch-default
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20181001144755.7978-3-chris@chris-wilson.co.uk
2018-10-01 20:34:24 +01:00
Chris Wilson e2f3496e93 drm/i915: Pull scheduling under standalone lock
Currently, the backend scheduling code abuses struct_mutex into order to
have a global lock to manipulate a temporary list (without widespread
allocation) and to protect against list modifications. This is an
extraneous coupling to struct_mutex and further can not extend beyond
the local device.

Pull all the code that needs to be under the one true lock into
i915_scheduler.c, and make it so.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20181001144755.7978-2-chris@chris-wilson.co.uk
2018-10-01 20:34:21 +01:00