linux

Commit Graph

Author	SHA1	Message	Date
Chris Wilson	fe49789fab	drm/i915: Deconstruct execute fence On reflection, we are only using the execute fence as a waitqueue on the global_seqno and not using it for dependency tracking between fences (unlike the submit and dma fences). By only treating it as a waitqueue, we can then treat it similar to the other waitqueues during submit, making the code simpler. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170223074422.4125-8-chris@chris-wilson.co.uk	2017-02-23 14:49:31 +00:00
Chris Wilson	541ca6ed79	drm/i915: Inline __i915_gem_request_wait_for_execute() It had only one callsite and existed to keep the code clearer. Now having shared the wait-on-error between phases and with plans to change the wait-for-execute in the next few patches, remove the out of line wait loop and move it into the main body of i915_wait_request. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170223074422.4125-7-chris@chris-wilson.co.uk	2017-02-23 14:49:30 +00:00
Chris Wilson	7de53bf7e6	drm/i915: Add ourselves to the gpu error waitqueue for the entire wait Add ourselves to the gpu error waitqueue earlier on, even before we determine we have to wait on the seqno. This is so that we can then share the waitqueue between stages in subsequent patches. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170223074422.4125-6-chris@chris-wilson.co.uk	2017-02-23 14:49:29 +00:00
Chris Wilson	4b36b2e506	drm/i915: Use a local to shorten req->i915->gpu_error.wait_queue Use a local variable to avoid having to type out the full name of the gpu_error wait_queue. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170223074422.4125-5-chris@chris-wilson.co.uk	2017-02-23 14:49:29 +00:00
Chris Wilson	12d3173b2e	drm/i915: Move reserve_seqno() next to unreserve_seqno() Move the companion functions next to each other. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170223074422.4125-4-chris@chris-wilson.co.uk	2017-02-23 14:49:28 +00:00
Chris Wilson	9b6586ae9f	drm/i915: Keep a global seqno per-engine Replace the global device seqno with one for each engine, and account for in-flight seqno on each separately. This is consistent with dma-fence as each timeline has separate fence-contexts for each engine and a seqno is only ordered within a fence-context (i.e. seqno do not need to be ordered wrt to other engines, just ordered within a single engine). This is required to enable request rewinding for preemption on individual engines (we have to rewind the global seqno to avoid overflow, and we do not have to rewind all engines just to preempt one.) v2: Rename active_seqno to inflight_seqnos to more clearly indicate that it is a counter and not equivalent to the existing seqno. Update functions that operated on active_seqno similarly. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170223074422.4125-3-chris@chris-wilson.co.uk	2017-02-23 14:49:26 +00:00
Tvrtko Ursulin	354d036fcf	drm/i915/tracepoints: Add request submit and execute tracepoints These new tracepoints are emitted once the request is ready to be submitted to the GPU and once the request is about to be submitted to the GPU, respectively. Former condition triggers as soon as all the fences and dependencies have been resolved, and the latter once the backend is about to submit it to the GPU. New tracepoint are enabled via the new DRM_I915_LOW_LEVEL_TRACEPOINTS Kconfig option which is disabled by default to alleviate the performance impact concerns. v2: Move execute tracepoint to __i915_gem_request_submit. (Chris Wilson) Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>	2017-02-21 13:18:14 +00:00
Tvrtko Ursulin	9369250298	drm/i915/tracepoints: Tidy i915_gem_request_wait_begin Provide the same information as the other request event classes. v2: Pass in flags so we can properly report the blocking status. (Chris Wilson) v3: Log hex with 0x prefix for clarity. v4: Derive blocking status from flags. (Chris Wilson) Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>	2017-02-21 13:18:04 +00:00
Chris Wilson	c33ed067d1	drm/i915: Break i915_spin_request() if we see an interrupt If an interrupt has been posted, and we were spinning on the active seqno waiting for it to advance but it did not, then we can expect that it will not see its advance in the immediate future and should call into the irq-seqno barrier. We can stop spinning at this point, and leave the difficulty of handling the coherency to the caller. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170217151304.16665-3-chris@chris-wilson.co.uk	2017-02-17 15:31:14 +00:00
Tvrtko Ursulin	73dec95e6b	drm/i915: Emit to ringbuffer directly This removes the usage of intel_ring_emit in favour of directly writing to the ring buffer. intel_ring_emit was preventing the compiler for optimising fetch and increment of the current ring buffer pointer and therefore generating very verbose code for every write. It had no useful purpose since all ringbuffer operations are started and ended with intel_ring_begin and intel_ring_advance respectively, with no bail out in the middle possible, so it is fine to increment the tail in intel_ring_begin and let the code manage the pointer itself. Useless instruction removal amounts to approximately two and half kilobytes of saved text on my build. Not sure if this has any measurable performance implications but executing a ton of useless instructions on fast paths cannot be good. v2: * Change return from intel_ring_begin to error pointer by popular demand. * Move tail increment to intel_ring_advance to enable some error checking. v3: * Move tail advance back into intel_ring_begin. * Rebase and tidy. v4: * Complete rebase after a few months since v3. v5: * Remove unecessary cast and fix !debug compile. (Chris Wilson) v6: * Make intel_ring_offset take request as well. * Fix recording of request postfix plus a sprinkle of asserts. (Chris Wilson) v7: * Use intel_ring_offset to get the postfix. (Chris Wilson) * Convert GVT code as well. v8: * Rename out++ to cs++. v9: * Fix GVT out to cs conversion in GVT. v10: * Rebase for new intel_ring_begin in selftests. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Zhi Wang <zhi.a.wang@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Acked-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170214113242.29241-1-tvrtko.ursulin@linux.intel.com	2017-02-14 14:30:46 +00:00
Chris Wilson	c835c55083	drm/i915: Add selftests for i915_gem_request Simple starting point for adding seltests for i915_gem_request, first mock a device (with engines and contexts) that allows us to construct and execute a request, along with waiting for the request to complete. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170213171558.20942-10-chris@chris-wilson.co.uk	2017-02-13 20:45:35 +00:00
Chris Wilson	969bb72cbf	drm/i915: Check for timeout completion when waiting for the rq to submitted We first wait for a request to be submitted to hw and assigned a seqno, before we can wait for the hw to signal completion (otherwise we don't know the hw id we need to wait upon). Whilst waiting for the request to be submitted, we may exceed the user's timeout and need to propagate the error back. v2: Make ETIME into an error from wait_for_execute for consistent exit handling. Reported-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Fixes: `4680816be3` ("drm/i915: Wait first for submission, before waiting for request completion") Testcase: igt/gem_wait/basic-await Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: <drm-intel-fixes@lists.freedesktop.org> # v4.10-rc1+ Cc: stable@vger.kernel.org Link: http://patchwork.freedesktop.org/patch/msgid/20170208181238.7232-1-chris@chris-wilson.co.uk Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>	2017-02-09 09:26:32 +00:00
Chris Wilson	6ffb7d0756	drm/i915: Construct a request even if the GPU is currently hung As we now have the ability to directly reset the GPU from the waiter (and so do not need to drop the lock in order to let the reset proceed) and also do not lose requests over a reset, we can now simply queue the request to occur after the reset rather than roundtripping to userspace (or worse failing with EIO). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/20170114162334.10271-1-chris@chris-wilson.co.uk Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>	2017-01-16 12:21:20 +00:00
Chris Wilson	c781c978e7	drm/i915: Add a sanity check that no request is submitted in the middle It is an error to start a new request on the same timeline (ringbuffer) as the current one before the current is submitted. If there are two requests emitting to the ringbuffer at the same time, the operation is undefined. We can catch this by checking for the timeline having a later seqno than ours when we come to submit our request. Currently we have this check at the end of __i915_add_request, but having an early check as well isolates a failure in the caller versus a failure in sealing the request (i.e. from inside __i915_add_request itself). For example, CI is currently tripping over this late assertion on ctg/ilk: [ 100.329399] [IGT] gem_cs_tlb: starting subtest basic-default [ 100.336333] ------------[ cut here ]------------ [ 100.336341] kernel BUG at drivers/gpu/drm/i915/i915_gem_request.c:908! [ 100.336347] invalid opcode: 0000 [#1] PREEMPT SMP [ 100.336351] Modules linked in: snd_hda_intel i915 snd_hda_codec_generic snd_hda_codec snd_hwdep snd_hda_core snd_pcm coretemp mei_me lpc_ich mei e1000e ptp pps_core [last unloaded: i915] [ 100.336373] CPU: 0 PID: 6308 Comm: gem_cs_tlb Tainted: G U 4.10.0-rc3-CI-CI_DRM_2045+ #1 [ 100.336380] Hardware name: LENOVO 7465CTO/7465CTO, BIOS 6DET44WW (2.08 ) 04/22/2009 [ 100.336386] task: ffff88012b738040 task.stack: ffffc90000560000 [ 100.336441] RIP: 0010:__i915_add_request+0x4aa/0x510 [i915] [ 100.336445] RSP: 0018:ffffc90000563ac0 EFLAGS: 00010212 [ 100.336451] RAX: 0000000000005d52 RBX: ffff880133bb84c0 RCX: 0000000000000001 [ 100.336456] RDX: 0000000080000001 RSI: ffff88012b738860 RDI: 00000000ffffffff [ 100.336461] RBP: ffffc90000563b00 R08: ffff880133bb8780 R09: 0000000000000000 [ 100.336466] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88012f53d950 [ 100.336472] R13: ffff88012a2b0af8 R14: ffff88012a5b0008 R15: ffff88012f53d960 [ 100.336477] FS: 00007f0d19da38c0(0000) GS:ffff88013bc00000(0000) knlGS:0000000000000000 [ 100.336483] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 100.336488] CR2: 00007f0d17706000 CR3: 000000012aa3e000 CR4: 00000000000406f0 [ 100.336496] Call Trace: [ 100.336527] i915_gem_switch_to_kernel_context+0x131/0x1b0 [i915] [ 100.336559] i915_gem_evict_vm+0x202/0x2b0 [i915] [ 100.336590] i915_gem_execbuffer_reserve.isra.9+0x3ae/0x440 [i915] [ 100.336623] i915_gem_do_execbuffer.isra.15+0x6d9/0x1b20 [i915] [ 100.336656] i915_gem_execbuffer2+0xc0/0x250 [i915] [ 100.336666] drm_ioctl+0x200/0x450 [ 100.336697] ? i915_gem_execbuffer+0x330/0x330 [i915] [ 100.336708] do_vfs_ioctl+0x90/0x6e0 [ 100.336716] ? up_read+0x1a/0x40 [ 100.336723] ? trace_hardirqs_on_caller+0x122/0x1b0 [ 100.336730] SyS_ioctl+0x3c/0x70 [ 100.336738] entry_SYSCALL_64_fastpath+0x1c/0xb1 [ 100.336745] RIP: 0033:0x7f0d187cb357 [ 100.336750] RSP: 002b:00007ffe0b2f7c28 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [ 100.336761] RAX: ffffffffffffffda RBX: 00007ffe0b2f7d60 RCX: 00007f0d187cb357 [ 100.336768] RDX: 00007ffe0b2f7d00 RSI: 0000000040406469 RDI: 0000000000000003 [ 100.336775] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000022 [ 100.336782] R10: 0000000000000007 R11: 0000000000000246 R12: 0000000000000002 [ 100.336789] R13: 0000000000419101 R14: 00007ffe0b2f7d60 R15: 00007ffe0b2f7d50 [ 100.336797] Code: 5f 74 1e e9 d4 fb ff ff e8 bc 1e 9c e0 e9 ae fb ff ff 4c 89 e7 e8 77 22 fd ff e9 88 fd ff ff 0f 0b e8 a3 1e 9c e0 e9 b1 fb ff ff <0f> 0b 0f 0b e8 fd af ab e0 85 c0 75 c2 48 c7 c2 80 2c 71 a0 be [ 100.336877] RIP: __i915_add_request+0x4aa/0x510 [i915] RSP: ffffc90000563ac0 [ 100.336886] ---[ end trace 22b36545479e5eb7 ]--- Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170111140858.1922-1-chris@chris-wilson.co.uk Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>	2017-01-11 14:31:48 +00:00
Daniel Vetter	a402eae64d	Linux 4.10-rc2 -----BEGIN PGP SIGNATURE----- iQEcBAABAgAGBQJYaYNlAAoJEHm+PkMAQRiGtCUH/18PMUJpHqRKjxL3Yscw+QZC RmGlD/hwBRLUSgiTCfURNGKP4QZv2kQW7BGsGC72oL01lmxozsU72ixUIO+wXzDY K2b0OOKGZZWzFtaVm7Qs+5JhHAEKZcT046mLD8sjJuqkrFAhmNLKdwHjihKBEkm9 J3s2tpdXdN0x/Uyga/GY9khEYIrvLPeBoKSz+JXcQKdC0iq3/+PMpWnN47QCNScr 7azojkJkj/rs2cqVdOi7Wbh6PSqIvPsl8E3qJefpaVJF/IQaU1pFdy5g8kYm4V7T fr6HgIbuN4EQWdN/5cgKrUdpQyV7D8iYx02klk4R8WgfS0QMYoUcsg+XsTd02TI= =OhGe -----END PGP SIGNATURE----- Merge tag 'v4.10-rc2' into drm-intel-next-queued Backmerge Linux 4.10-rc2 to resync with our -fixes cherry-picks. I've done the backmerge directly because Dave is on vacation. Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>	2017-01-04 11:35:18 +01:00
Chris Wilson	f73e73999d	drm/i915: Swap if(enable_execlists) in i915_gem_request_alloc for a vfunc A fairly trivial move of a matching pair of routines (for preparing a request for construction) onto an engine vfunc. The ulterior motive is to be able to create a mock request implementation. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161218153724.8439-7-chris@chris-wilson.co.uk	2016-12-18 16:18:56 +00:00
Chris Wilson	e8a9c58fcd	drm/i915: Unify active context tracking between legacy/execlists/guc The requests conversion introduced a nasty bug where we could generate a new request in the middle of constructing a request if we needed to idle the system in order to evict space for a context. The request to idle would be executed (and waited upon) before the current one, creating a minor havoc in the seqno accounting, as we will consider the current request to already be completed (prior to deferred seqno assignment) but ring->last_retired_head would have been updated and still could allow us to overwrite the current request before execution. We also employed two different mechanisms to track the active context until it was switched out. The legacy method allowed for waiting upon an active context (it could forcibly evict any vma, including context's), but the execlists method took a step backwards by pinning the vma for the entire active lifespan of the context (the only way to evict was to idle the entire GPU, not individual contexts). However, to circumvent the tricky issue of locking (i.e. we cannot take struct_mutex at the time of i915_gem_request_submit(), where we would want to move the previous context onto the active tracker and unpin it), we take the execlists approach and keep the contexts pinned until retirement. The benefit of the execlists approach, more important for execlists than legacy, was the reduction in work in pinning the context for each request - as the context was kept pinned until idle, it could short circuit the pinning for all active contexts. We introduce new engine vfuncs to pin and unpin the context respectively. The context is pinned at the start of the request, and only unpinned when the following request is retired (this ensures that the context is idle and coherent in main memory before we unpin it). We move the engine->last_context tracking into the retirement itself (rather than during request submission) in order to allow the submission to be reordered or unwound without undue difficultly. And finally an ulterior motive for unifying context handling was to prepare for mock requests. v2: Rename to last_retired_context, split out legacy_context tracking for MI_SET_CONTEXT. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161218153724.8439-3-chris@chris-wilson.co.uk	2016-12-18 16:18:50 +00:00
Linus Torvalds	9439b3710d	Main pull request for drm for 4.10 kernel -----BEGIN PGP SIGNATURE----- iQIcBAABAgAGBQJYT3qqAAoJEAx081l5xIa+dLMP/2dqBybSAeWlPmAwVenIHRtS KFNktISezFSY/LBcIP2mHkFJmjTKBMZFxWnyEJL9NmFUD1cS2WMyNnC1282h/+rD +P8Bsmzmt/daV4UTFxVDpzlmVlavAyakNi6FnSQfAfmf+3PB1yzU3gn8ld9pU/if h7KEp9fDn9eYZreTRfCUloI2yoVpD9d0DG3uaGDN/N0kGUnCC6TZT5ig5j2JO016 fYf/DqoYAk3ItWF9WK/uG7qJIGi37afCpQq+kbSSJk+p3HjJqu8JUe9jzqYdl7j9 26TGSY5o9WLhZkxDgbcCIJzcFJhMmXgMdhjil9lqaHmnNG5FPFU7g8DK1CZqbel9 m8+aRPn1EgxIahMgdl8NblW1pfO2Kco0tZmoP5vXx1uqhivd67h0hiQqp66WxOJd i2yMLncaCEv8M161CVEgtzuI5a7nCfaZv7J9ArzbkD/huBwu51IZgTs7Dz4njgvz VPB5FBTB/ZYteErUNoh6gjF0hLngWvvJSPvuzT+EFO7yypek0IJ28GTdbxYSP+jR 13697s5Itigf/D3KUdRRGsWRzyVVN9n+djkl//sy5ddL9eOlKSKEga4ujOUjTWaW hTvAxpK9GmJS/Iun5jIP6f75zDbi+e8FWUeB/OI2lPtnApaSKdXBTPXsco2RnTEV +G6XrH8IMEIsTxOk7hWU =7s/c -----END PGP SIGNATURE----- Merge tag 'drm-for-v4.10' of git://people.freedesktop.org/~airlied/linux Pull drm updates from Dave Airlie: "This is the main pull request for drm for 4.10 kernel. New drivers: - ZTE VOU display driver (zxdrm) - Amlogic Meson Graphic Controller GXBB/GXL/GXM SoCs (meson) - MXSFB support (mxsfb) Core: - Format handling has been reworked - Better atomic state debugging - drm_mm leak debugging - Atomic explicit fencing support - fbdev helper ops - Documentation updates - MST fbcon fixes Bridge: - Silicon Image SiI8620 driver Panel: - Add support for new simple panels i915: - GVT Device model - Better HDMI2.0 support on skylake - More watermark fixes - GPU idling rework for suspend/resume - DP Audio workarounds - Scheduler prep-work - Opregion CADL handling - GPU scheduler and priority boosting amdgfx/radeon: - Support for virtual devices - New VM manager for non-contig VRAM buffers - UVD powergating - SI register header cleanup - Cursor fixes - Powermanagement fixes nouveau: - Powermangement reworks for better voltage/clock changes - Atomic modesetting support - Displayport Multistream (MST) support. - GP102/104 hang and cursor fixes - GP106 support hisilicon: - hibmc support (BMC chip for aarch64 servers) armada: - add tracing support for overlay change - refactor plane support - de-midlayer the driver omapdrm: - Timing code cleanups rcar-du: - R8A7792/R8A7796 support - Misc fixes. sunxi: - A31 SoC display engine support imx-drm: - YUV format support - Cleanup plane atomic update mali-dp: - Misc fixes dw-hdmi: - Add support for HDMI i2c master controller tegra: - IOMMU support fixes - Error handling fixes tda998x: - Fix connector registration - Improved robustness - Fix infoframe/audio compliance virtio: - fix busid issues - allocate more vbufs qxl: - misc fixes and cleanups. vc4: - Fragment shader threading - ETC1 support - VEC (tv-out) support msm: - A5XX GPU support - Lots of atomic changes tilcdc: - Misc fixes and cleanups. etnaviv: - Fix dma-buf export path - DRAW_INSTANCED support - fix driver on i.MX6SX exynos: - HDMI refactoring fsl-dcu: - fbdev changes" * tag 'drm-for-v4.10' of git://people.freedesktop.org/~airlied/linux: (1343 commits) drm/nouveau/kms/nv50: fix atomic regression on original G80 drm/nouveau/bl: Do not register interface if Apple GMUX detected drm/nouveau/bl: Assign different names to interfaces drm/nouveau/bios/dp: fix handling of LevelEntryTableIndex on DP table 4.2 drm/nouveau/ltc: protect clearing of comptags with mutex drm/nouveau/gr/gf100-: handle GPC/TPC/MPC trap drm/nouveau/core: recognise GP106 chipset drm/nouveau/ttm: wait for bo fence to signal before unmapping vmas drm/nouveau/gr/gf100-: FECS intr handling is not relevant on proprietary ucode drm/nouveau/gr/gf100-: properly ack all FECS error interrupts drm/nouveau/fifo/gf100-: recover from host mmu faults drm: Add fake controlD* symlinks for backwards compat drm/vc4: Don't use drm_put_dev drm/vc4: Document VEC DT binding drm/vc4: Add support for the VEC (Video Encoder) IP drm: Add TV connector states to drm_connector_state drm: Turn DRM_MODE_SUBCONNECTOR_xx definitions into an enum drm/vc4: Fix ->clock_select setting for the VEC encoder drm/amdgpu/dce6: Set MASTER_UPDATE_MODE to 0 in resume_mc_access as well drm/amdgpu: use pin rather than pin_restricted in a few cases ...	2016-12-13 09:35:09 -08:00
Chris Wilson	0e932c080c	drm/i915: Hold a reference on the request for its fence chain Currently, we have an active reference for the request until it is retired. Though it cannot be retired before it has been executed by hardware, the request may be completed before we have finished processing the execute fence, i.e. we may continue to process that fence as we free the request. Fixes: `5590af3e11` ("drm/i915: Drive request submission through fence callbacks") Fixes: `23902e49c9` ("drm/i915: Split request submit/execute phase into two") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161125131718.20978-3-chris@chris-wilson.co.uk (cherry picked from commit `48bc2a4a42`) Signed-off-by: Jani Nikula <jani.nikula@intel.com>	2016-12-05 11:00:32 +02:00
Chris Wilson	fc1584059d	drm/i915: Integrate i915_sw_fence with debugobjects Add the tracking required to enable debugobjects for fences to improve error detection in BAT. The debugobject interface lets us track the lifetime and phases of the fences even while being embedded into larger structs, i.e. to check they are not used after they have been released. v2: Don't populate the stubs, debugobjects checks for a NULL pointer and treats it equivalently. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161125131718.20978-4-chris@chris-wilson.co.uk	2016-11-25 13:49:26 +00:00
Chris Wilson	48bc2a4a42	drm/i915: Hold a reference on the request for its fence chain Currently, we have an active reference for the request until it is retired. Though it cannot be retired before it has been executed by hardware, the request may be completed before we have finished processing the execute fence, i.e. we may continue to process that fence as we free the request. Fixes: `5590af3e11` ("drm/i915: Drive request submission through fence callbacks") Fixes: `23902e49c9` ("drm/i915: Split request submit/execute phase into two") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161125131718.20978-3-chris@chris-wilson.co.uk	2016-11-25 13:49:25 +00:00
Chris Wilson	1618bdb89b	drm/i915: Assert no external observers when unwind a failed request alloc Before we return the request back to the kmem_cache after a failed i915_gem_request_alloc(), we should assert that it has not been added to any global state tracking. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161125131718.20978-2-chris@chris-wilson.co.uk	2016-11-25 13:49:24 +00:00
Chris Wilson	4ffd6e0cfe	drm/i915: Add is-completed assert to request retire entrypoint While we will check that the request is completed prior to being retired, by placing an assert that the request is complete at the entrypoint of the function we can more clearly document the function's preconditions. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161125131718.20978-1-chris@chris-wilson.co.uk	2016-11-25 13:49:23 +00:00
Joonas Lahtinen	4c266edb4c	drm/i915: Rename i915_gem_timeline.next_seqno to .seqno Rename i915_gem_timeline member 'next_seqno' into 'seqno' as the variable is pre-increment. We've already had two bugs due to the confusing name, second is fixed as follow-up patch. Cc: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/20161124144750.2610-1-chris@chris-wilson.co.uk	2016-11-25 07:01:11 +00:00
Mika Kuoppala	bc1d53c647	drm/i915: Wipe hang stats as an embedded struct Bannable property, banned status, guilty and active counts are properties of i915_gem_context. Make them so. v2: rebase Cc: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1479309634-28574-1-git-send-email-mika.kuoppala@intel.com	2016-11-21 14:36:56 +02:00
Mika Kuoppala	e5e1fc47ea	drm/i915: Use request retirement as context progress As hangcheck score was removed, the active decay of score was removed also. This removed feature for hangcheck to detect if the gpu client was accidentally or maliciously causing intermittent hangs. Reinstate the scoring as a per context property, so that if one context starts to act unfavourably, ban it. v2: ban_period_secs as a gate to score check (Chris) v3: decay in proper spot. scores as tunables (Chris) Cc: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>	2016-11-21 14:36:40 +02:00
Chris Wilson	786d290cae	drm/i915: Check that each request phase is completed before retiring Trying to chase an impossible bug (ivb): [ 207.765411] [drm:i915_reset_and_wakeup [i915]] resetting chip [ 207.765734] [drm:i915_gem_reset [i915]] resetting render ring to restart from tail of request 0x4ee834 [ 207.765791] [drm:intel_print_rc6_info [i915]] Enabling RC6 states: RC6 on RC6p on RC6pp off [ 207.767213] [drm:intel_guc_setup [i915]] GuC fw status: path (null), fetch NONE, load NONE [ 207.767515] kernel BUG at drivers/gpu/drm/i915/i915_gem_request.c:203! [ 207.767551] invalid opcode: 0000 [#1] PREEMPT SMP [ 207.767576] Modules linked in: snd_hda_intel i915 cdc_ncm usbnet mii x86_pkg_temp_thermal coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel lpc_ich snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec snd_hwdep snd_hda_core mei_me mei snd_pcm sdhci_pci sdhci mmc_core e1000e ptp pps_core [last unloaded: i915] [ 207.767808] CPU: 3 PID: 8855 Comm: gem_ringfill Tainted: G U 4.9.0-rc5-CI-Patchwork_3052+ #1 [ 207.767854] Hardware name: LENOVO 2356GCG/2356GCG, BIOS G7ET31WW (1.13 ) 07/02/2012 [ 207.767894] task: ffff88012c82a740 task.stack: ffffc9000383c000 [ 207.767927] RIP: 0010:[<ffffffffa00a0a3a>] [<ffffffffa00a0a3a>] i915_gem_request_retire+0x2a/0x4b0 [i915] [ 207.767999] RSP: 0018:ffffc9000383fb20 EFLAGS: 00010293 [ 207.768027] RAX: 00000000004ee83c RBX: ffff880135dcb480 RCX: 00000000004ee83a [ 207.768062] RDX: ffff88012fea42a8 RSI: 0000000000000001 RDI: ffff88012c82af68 [ 207.768095] RBP: ffffc9000383fb48 R08: 0000000000000000 R09: 0000000000000000 [ 207.768129] R10: 0000000000000000 R11: 0000000000000000 R12: ffff880135dcb480 [ 207.768163] R13: ffff88012fea42a8 R14: 0000000000000000 R15: 00000000000001d8 [ 207.768200] FS: 00007f955f658740(0000) GS:ffff88013e2c0000(0000) knlGS:0000000000000000 [ 207.768239] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 207.768258] CR2: 0000555899725930 CR3: 00000001316f6000 CR4: 00000000001406e0 [ 207.768286] Stack: [ 207.768299] ffff880135dcb480 ffff880135dcbe00 ffff88012fea42a8 0000000000000000 [ 207.768350] 00000000000001d8 ffffc9000383fb70 ffffffffa00a1339 0000000000000000 [ 207.768402] ffff88012f296c88 00000000000003f0 ffffc9000383fbb0 ffffffffa00b582d [ 207.768453] Call Trace: [ 207.768493] [<ffffffffa00a1339>] i915_gem_request_retire_upto+0x49/0x90 [i915] [ 207.768553] [<ffffffffa00b582d>] intel_ring_begin+0x15d/0x2d0 [i915] [ 207.768608] [<ffffffffa00b59cb>] intel_ring_alloc_request_extras+0x2b/0x40 [i915] [ 207.768667] [<ffffffffa00a2fd9>] i915_gem_request_alloc+0x359/0x440 [i915] [ 207.768723] [<ffffffffa008bd03>] i915_gem_do_execbuffer.isra.15+0x783/0x1a10 [i915] [ 207.768766] [<ffffffff811a6a2e>] ? __might_fault+0x3e/0x90 [ 207.768816] [<ffffffffa008d380>] i915_gem_execbuffer2+0xc0/0x250 [i915] [ 207.768854] [<ffffffff815532a6>] drm_ioctl+0x1f6/0x480 [ 207.768900] [<ffffffffa008d2c0>] ? i915_gem_execbuffer+0x330/0x330 [i915] [ 207.768939] [<ffffffff81202f6e>] do_vfs_ioctl+0x8e/0x690 [ 207.768972] [<ffffffff818193ac>] ? retint_kernel+0x2d/0x2d [ 207.769004] [<ffffffff810d6ef2>] ? trace_hardirqs_on_caller+0x122/0x1b0 [ 207.769039] [<ffffffff812035ac>] SyS_ioctl+0x3c/0x70 [ 207.769068] [<ffffffff818189ae>] entry_SYSCALL_64_fastpath+0x1c/0xb1 [ 207.769103] Code: 90 55 48 89 e5 41 57 41 56 41 55 41 54 49 89 fc 53 8b 35 fa 7b e1 e1 85 f6 0f 85 55 03 00 00 41 8b 84 24 80 02 00 00 85 c0 75 02 <0f> 0b 49 8b 94 24 a8 00 00 00 48 8b 8a e0 01 00 00 8b 89 c0 00 [ 207.769400] RIP [<ffffffffa00a0a3a>] i915_gem_request_retire+0x2a/0x4b0 [i915] [ 207.769463] RSP <ffffc9000383fb20> Let's add a couple more BUG_ONs before this to ascertain that the request did make it to hardware. The impossible part of this stacktrace is that request must have been considered completed by the i915_request_wait() before we tried to retire it. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161118143412.26508-1-chris@chris-wilson.co.uk Reviewed-by: Matthew Auld <matthew.auld@intel.com>	2016-11-18 20:49:46 +00:00
Chris Wilson	4302055b29	drm/i915: Be more careful to drop the GT wakeref Since we can retire requests from multiple paths, we cannot assume that i915_gem_retire_requests() is the sole path on which we can transition to gt.active_requests == 0. A consequence of this is that we would skip the function if we had already retired all the requests and not scheduled the idle worker. This is fallout from changing the routine from considering active_engines (for which it was the only consumer) to active_requests. v2: Move kicking the idle working to i915_gem_request_retire() otherwise we could postpone the idle callback everytime we called retire_requests even though we did no work. v3: We only need to move the idle work kicking! v4: Drop the BUG_ON(!awake) as we may be called from the shrinker in the middle of constructing a request before we have marked the device awake. v5: Add a BUG_ON() for active_requests underflow upon retirement (Joonas) Fixes: `28176ef4cf` ("drm/i915: Reserve space in the global seqno during request allocation") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161115164620.17185-1-chris@chris-wilson.co.uk Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>	2016-11-18 11:47:37 +00:00
Christian Borntraeger	f2f09a4cee	locking/core: Remove cpu_relax_lowlatency() users With the s390 special case of a yielding cpu_relax() implementation gone, we can now remove all users of cpu_relax_lowlatency() and replace them with cpu_relax(). Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Noam Camus <noamc@ezchip.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will.deacon@arm.com> Cc: linuxppc-dev@lists.ozlabs.org Cc: virtualization@lists.linux-foundation.org Cc: xen-devel@lists.xenproject.org Link: http://lkml.kernel.org/r/1477386195-32736-5-git-send-email-borntraeger@de.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-11-16 10:15:10 +01:00
Chris Wilson	9f792ebafe	drm/i915: Store the execution priority on the context In order to support userspace defining different levels of importance to different contexts, and in particular the preferred order of execution, store a priority value on each context. By default, the kernel's context, which is used for idling and other background tasks, is given minimum priority (i.e. all user contexts will execute before the kernel). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161114204105.29171-9-chris@chris-wilson.co.uk	2016-11-14 21:01:22 +00:00
Chris Wilson	20311bd350	drm/i915/scheduler: Execute requests in order of priorities Track the priority of each request and use it to determine the order in which we submit requests to the hardware via execlists. The priority of the request is determined by the user (eventually via the context) but may be overridden at any time by the driver. When we set the priority of the request, we bump the priority of all of its dependencies to match - so that a high priority drawing operation is not stuck behind a background task. When the request is ready to execute (i.e. we have signaled the submit fence following completion of all its dependencies, including third party fences), we put the request into a priority sorted rbtree to be submitted to the hardware. If the request is higher priority than all pending requests, it will be submitted on the next context-switch interrupt as soon as the hardware has completed the current request. We do not currently preempt any current execution to immediately run a very high priority request, at least not yet. One more limitation, is that this is first implementation is for execlists only so currently limited to gen8/gen9. v2: Replace recursive priority inheritance bumping with an iterative depth-first search list. v3: list_next_entry() for walking lists v4: Explain how the dfs solves the recursion problem with PI. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161114204105.29171-8-chris@chris-wilson.co.uk	2016-11-14 21:01:21 +00:00
Chris Wilson	52e5420907	drm/i915/scheduler: Record all dependencies upon request construction The scheduler needs to know the dependencies of each request for the lifetime of the request, as it may choose to reschedule the requests at any time and must ensure the dependency tree is not broken. This is in additional to using the fence to only allow execution after all dependencies have been completed. One option was to extend the fence to support the bidirectional dependency tracking required by the scheduler. However the mismatch in lifetimes between the submit fence and the request essentially meant that we had to build a completely separate struct (and we could not simply reuse the existing waitqueue in the fence for one half of the dependency tracking). The extra dependency tracking simply did not mesh well with the fence, and keeping it separate both keeps the fence implementation simpler and allows us to extend the dependency tracking into a priority tree (whilst maintaining support for reordering the tree). To avoid the additional allocations and list manipulations, the use of the priotree is disabled when there are no schedulers to use it. v2: Create a dedicated slab for i915_dependency. Rename the lists. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161114204105.29171-7-chris@chris-wilson.co.uk	2016-11-14 21:00:28 +00:00
Chris Wilson	0de9136dbb	drm/i915/scheduler: Signal the arrival of a new request The start of the scheduler, add a hook into request submission for the scheduler to see the arrival of new requests and prepare its runqueues. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161114204105.29171-6-chris@chris-wilson.co.uk	2016-11-14 21:00:26 +00:00
Chris Wilson	d55ac5bf97	drm/i915: Defer transfer onto execution timeline to actual hw submission Defer the transfer from the client's timeline onto the execution timeline from the point of readiness to the point of actual submission. For example, in execlists, a request is finally submitted to hardware when the hardware is ready, and only put onto the hardware queue when the request is ready. By deferring the transfer, we ensure that the timeline is maintained in retirement order if we decide to queue the requests onto the hardware in a different order than fifo. v2: Rebased onto distinct global/user timeline lock classes. v3: Play with the position of the spin_lock(). v4: Nesting finally resolved with distinct sw_fence lock classes. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161114204105.29171-4-chris@chris-wilson.co.uk	2016-11-14 21:00:23 +00:00
Chris Wilson	23902e49c9	drm/i915: Split request submit/execute phase into two In order to support deferred scheduling, we need to differentiate between when the request is ready to run (i.e. the submit fence is signaled) and when the request is actually run (a new execute fence). This is typically split between the request itself wanting to wait upon others (for which we use the submit fence) and the CPU wanting to wait upon the request, for which we use the execute fence to be sure the hardware is ready to signal completion. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161114204105.29171-3-chris@chris-wilson.co.uk	2016-11-14 21:00:22 +00:00
Chris Wilson	bb89485e99	drm/i915: Create distinct lockclasses for execution vs user timelines In order to simplify the lockdep annotation, as they become more complex in the future with deferred execution and multiple paths through the same functions, create a separate lockclass for the user timeline and the hardware execution timeline. We should only ever be locking the user timeline and the execution timeline in parallel so we only need to create two lock classes, rather than a separate class for every timeline. v2: Rename the lock classes to be more consistent with other lockdep. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161114204105.29171-2-chris@chris-wilson.co.uk	2016-11-14 21:00:21 +00:00
Chris Wilson	6a5d1db98e	drm/i915: Spin until breadcrumb threads are complete When we need to reset the global seqno on wraparound, we have to wait until the current rbtrees are drained (or otherwise the next waiter will be out of sequence). The current mechanism to kick and spin until complete, may exit too early as it would break if the target thread was currently running. Instead, we must wake up the threads, but keep spinning until the trees have been deleted. In order to appease Tvrtko, busy spin rather than yield(). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161108143719.32215-1-chris@chris-wilson.co.uk Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>	2016-11-09 15:01:52 +00:00
Imre Deak	5bd11a34e4	drm/i915: Avoid early GPU idling due to already pending idle work Atm, in case an idle work handler is already pending but haven't yet started to run, retiring a new request will not extend the active period as required, rather simply leaves the pending idle work to be scheduled at the original expiration time. This may lead to idling the GPU too early. Fix this by using the delayed-work scheduler alternative which makes sure the handler's expiration time is extended in this case. Cc: Chris Wilson <chris@chris-wilson.co.uk> Requested-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Imre Deak <imre.deak@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/1478510405-11799-1-git-send-email-imre.deak@intel.com	2016-11-07 14:48:04 +02:00
Chris Wilson	80b204bce8	drm/i915: Enable multiple timelines With the infrastructure converted over to tracking multiple timelines in the GEM API whilst preserving the efficiency of using a single execution timeline internally, we can now assign a separate timeline to every context with full-ppgtt. v2: Add a comment to indicate the xfer between timelines upon submission. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-35-chris@chris-wilson.co.uk	2016-10-28 20:53:57 +01:00
Chris Wilson	f2d13290e3	drm/i915: Defer setting of global seqno on request to submission Defer the assignment of the global seqno on a request to its submission. In the next patch, we will only allocate the global seqno at that time, here we are just enabling the wait-for-submission before wait-for-seqno paths. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-34-chris@chris-wilson.co.uk	2016-10-28 20:53:56 +01:00
Chris Wilson	28176ef4cf	drm/i915: Reserve space in the global seqno during request allocation A restriction on our global seqno is that they cannot wrap, and that we cannot use the value 0. This allows us to detect when a request has not yet been submitted, its global seqno is still 0, and ensures that hardware semaphores are monotonic as required by older hardware. To meet these restrictions when we defer the assignment of the global seqno, we must check that we have an available slot in the global seqno space during request construction. If that test fails, we wait for all requests to be completed and reset the hardware back to 0. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-33-chris@chris-wilson.co.uk	2016-10-28 20:53:56 +01:00
Chris Wilson	85e17f5974	drm/i915: Move the global sync optimisation to the timeline Currently we try to reduce the number of synchronisations (now the number of requests we need to wait upon) by noting that if we have earlier waited upon a request, all subsequent requests in the timeline will be after the wait. This only applies to requests in this timeline, as other timelines will not be ordered by that waiter. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-30-chris@chris-wilson.co.uk	2016-10-28 20:53:54 +01:00
Chris Wilson	caddfe7192	drm/i915: Defer breadcrumb emission Move the actual emission of the breadcrumb for closing the request from i915_add_request() to the submit callback. (It can be moved later when required.) This allows us to defer the allocation of the global_seqno from request construction to actual submission, allowing us to emit the requests out of order (wrt to the order of their construction, they still will only be executed one all of their dependencies are resolved including that all earlier requests on their timeline have been submitted.) We have to specialise how we then emit the request in order to write into the preallocated space, rather than at the tail of the ringbuffer (which will have been advanced by the addition of new requests). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-29-chris@chris-wilson.co.uk	2016-10-28 20:53:54 +01:00
Chris Wilson	98f29e8d90	drm/i915: Record space required for breadcrumb emission In the next patch, we will use deferred breadcrumb emission. That requires reserving sufficient space in the ringbuffer to emit the breadcrumb, which first requires us to know how large the breadcrumb is. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-28-chris@chris-wilson.co.uk	2016-10-28 20:53:53 +01:00
Chris Wilson	9b81d556b1	drm/i915: Rename ->emit_request to ->emit_breadcrumb Now that the emission of the request tail and its submission to hardware are two separate steps, engine->emit_request() is confusing. engine->emit_request() is called to emit the breadcrumb commands for the request into the ring, name it such (engine->emit_breadcrumb). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-27-chris@chris-wilson.co.uk	2016-10-28 20:53:53 +01:00
Chris Wilson	65e4760e39	drm/i915: Introduce a global_seqno for each request Though we will have multiple timelines, we still have a single timeline of execution. This we can use to provide an execution and retirement order of requests. This keeps tracking execution of requests simple, and vital for preserving a single waiter (i.e. so that we can order the waiters so that only the earliest to wakeup need be woken). To accomplish this we distinguish the seqno used to order requests per-context (external) and that used internally for execution. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-26-chris@chris-wilson.co.uk	2016-10-28 20:53:53 +01:00
Chris Wilson	4680816be3	drm/i915: Wait first for submission, before waiting for request completion In future patches, we will no longer be able to wait on a static global seqno and instead have to break our wait up into phases. First we wait for the global seqno assignment (upon submission to hardware), and once submitted we wait for the hardware to complete. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-25-chris@chris-wilson.co.uk	2016-10-28 20:53:52 +01:00
Chris Wilson	73cb97010d	drm/i915: Combine seqno + tracking into a global timeline struct Our timelines are more than just a seqno. They also provide an ordered list of requests to be executed. Due to the restriction of handling individual address spaces, we are limited to a timeline per address space but we use a fence context per engine within. Our first step to introducing independent timelines per context (i.e. to allow each context to have a queue of requests to execute that have a defined set of dependencies on other requests) is to provide a timeline abstraction for the global execution queue. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-23-chris@chris-wilson.co.uk	2016-10-28 20:53:51 +01:00
Chris Wilson	d07f0e59b2	drm/i915: Move GEM activity tracking into a common struct reservation_object In preparation to support many distinct timelines, we need to expand the activity tracking on the GEM object to handle more than just a request per engine. We already use the struct reservation_object on the dma-buf to handle many fence contexts, so integrating that into the GEM object itself is the preferred solution. (For example, we can now share the same reservation_object between every consumer/producer using this buffer and skip the manual import/export via dma-buf.) v2: Reimplement busy-ioctl (by walking the reservation object), postpone the ABI change for another day. Similarly use the reservation object to find the last_write request (if active and from i915) for choosing display CS flips. Caveats: * busy-ioctl: busy-ioctl only reports on the native fences, it will not warn of stalls (in set-domain-ioctl, pread/pwrite etc) if the object is being rendered to by external fences. It also will not report the same busy state as wait-ioctl (or polling on the dma-buf) in the same circumstances. On the plus side, it does retain reporting of which i915 engines are engaged with this object. * non-blocking atomic modesets take a step backwards as the wait for render completion blocks the ioctl. This is fixed in a subsequent patch to use a fence instead for awaiting on the rendering, see "drm/i915: Restore nonblocking awaits for modesetting" * dynamic array manipulation for shared-fences in reservation is slower than the previous lockless static assignment (e.g. gem_exec_lut_handle runtime on ivb goes from 42s to 66s), mainly due to atomic operations (maintaining the fence refcounts). * loss of object-level retirement callbacks, emulated by VMA retirement tracking. * minor loss of object-level last activity information from debugfs, could be replaced with per-vma information if desired Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-21-chris@chris-wilson.co.uk	2016-10-28 20:53:50 +01:00
Chris Wilson	4c7d62c6b8	drm/i915: Markup GEM API with lockdep asserts Add lockdep_assert_held(struct_mutex) to the API preamble of the internal GEM interfaces. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-9-chris@chris-wilson.co.uk	2016-10-28 20:53:45 +01:00

1 2 3

103 Commits