Commit Graph

136 Commits

John Harrison dc4be6071a drm/i915: Add explicit request management to i915_gem_init_hw()
Now that a single per-ring loop is being done for all the different
initialisation steps in i915_gem_init_hw(), it is possible to add proper request
management as well. The last remaining issue is that the context enable call
eventually ends up within *_render_state_init() and this does its own private
_i915_add_request() call.

This patch adds explicit request creation and submission to the top level loop
and removes the add_request() from deep within the sub-functions.

v2: Updated for removal of batch_obj from add_request call in previous patch.

For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-06-23 14:02:08 +02:00
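
To make the shape of the change concrete, here is a minimal standalone C sketch of a per-ring init loop that owns request creation and submission; every name in it (request_alloc, context_enable, add_request, the engine/request structs) is an illustrative stand-in, not the driver's actual API.

#include <stdio.h>

struct engine  { const char *name; };
struct request { struct engine *engine; int seqno; };

static int next_seqno = 1;

static int request_alloc(struct engine *e, struct request *rq)
{
        rq->engine = e;
        rq->seqno = next_seqno++;
        return 0;
}

static int context_enable(struct request *rq)
{
        /* e.g. emit render state init commands against rq;
         * note: no hidden add_request() in here any more. */
        printf("%s: context enable queued on request %d\n",
               rq->engine->name, rq->seqno);
        return 0;
}

static void add_request(struct request *rq)
{
        printf("%s: request %d submitted\n", rq->engine->name, rq->seqno);
}

int main(void)
{
        struct engine engines[] = { { "render" }, { "blitter" } };
        const int nengines = 2;

        for (int i = 0; i < nengines; i++) {
                struct request rq;

                if (request_alloc(&engines[i], &rq) || context_enable(&rq))
                        return 1;
                add_request(&rq);   /* one explicit submission per ring */
        }
        return 0;
}
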
John Harrison a3fbe05a61 drm/i915: Don't tag kernel batches as user batches
The render state initialisation code does an explicit i915_add_request() call to
commit the init commands. It was passing in the initialisation batch buffer to
add_request() as the batch object parameter. However, the batch object entry in
the request structure (which is all that parameter is used for) is meant for
keeping track of user generated batch buffers for blame tagging during GPU
hangs.

This patch clears the batch object parameter so that kernel generated batch
buffers are not tagged as being user generated.

For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-06-23 14:02:07 +02:00
John Harrison 5b4a60c276 drm/i915: Add flag to i915_add_request() to skip the cache flush
In order to explicitly track all GPU work (and completely remove the outstanding
lazy request), it is necessary to add extra i915_add_request() calls to various
places. Some of these do not need the implicit cache flush done as part of the
standard batch buffer submission process.

This patch adds a flag to _add_request() to specify whether the flush is
required or not.

For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-06-23 14:02:04 +02:00
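
A small self-contained sketch of the idea of a flush flag on the add-request path; the signature and names here are illustrative, not the actual i915_add_request() prototype.

#include <stdbool.h>
#include <stdio.h>

struct request { int seqno; };

static void emit_cache_flush(struct request *rq)
{
        printf("request %d: cache flush emitted\n", rq->seqno);
}

static void add_request(struct request *rq, bool flush_caches)
{
        if (flush_caches)
                emit_cache_flush(rq);
        /* ...emit seqno write, move request onto tracking lists, etc. */
        printf("request %d: tracked\n", rq->seqno);
}

int main(void)
{
        struct request batch = { .seqno = 1 }, housekeeping = { .seqno = 2 };

        add_request(&batch, true);          /* normal batch: flush as before */
        add_request(&housekeeping, false);  /* tracking-only: skip the flush */
        return 0;
}
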
John Harrison 8a8edb5917 drm/i915: Update execbuffer_move_to_active() to take a request structure
The plan is to pass requests around as the basic submission tracking structure
rather than rings and contexts. This patch updates the
execbuffer_move_to_active() code path.

For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-06-23 14:02:03 +02:00
John Harrison 535fbe8233 drm/i915: Update move_to_gpu() to take a request structure
The plan is to pass requests around as the basic submission tracking structure
rather than rings and contexts. This patch updates the move_to_gpu() code paths.

For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-06-23 14:02:03 +02:00
John Harrison 95c24161cd drm/i915: Update the dispatch tracepoint to use params->request
Updated a couple of trace points to use the now cached request pointer rather
than extracting it from the ring.

For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-06-23 14:02:02 +02:00
John Harrison 217e46b576 drm/i915: Update alloc_request to return the allocated request
The alloc_request() function does not actually return the newly allocated
request. Instead, it must be pulled from ring->outstanding_lazy_request. This
patch fixes this so that code can create a request and start using it knowing
exactly which request it actually owns.

v2: Updated for new i915_gem_request_alloc() scheme.

For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-06-23 14:02:00 +02:00
John Harrison adeca76d8e drm/i915: Simplify i915_gem_execbuffer_retire_commands() parameters
Shrunk the parameter list of i915_gem_execbuffer_retire_commands() to a single
structure as everything it requires is available in the execbuff_params object.

For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-06-23 14:02:00 +02:00
John Harrison 5f19e2bffa drm/i915: Merged the many do_execbuf() parameters into a structure
The do_execbuf() function takes quite a few parameters. The actual set of
parameters is going to change with the conversion to passing requests around.
Further, it is due to grow massively with the arrival of the GPU scheduler.

This patch simplifies the prototype by passing a parameter structure instead.
Changing the parameter set in the future is then simply a matter of
adding/removing items to the structure.

Note that the structure does not contain absolutely everything that is passed
in. This is because the intention is to use this structure more extensively
later in this patch series and more especially in the GPU scheduler that is
coming soon. The latter requires hanging on to the structure as the final
hardware submission can be delayed until long after the execbuf IOCTL has
returned to user land. Thus it is unsafe to put anything in the structure that
is local to the IOCTL call itself - such as the 'args' parameter. All entries
must be copies of data or pointers to structures that are reference counted in
some way and guaranteed to exist for the duration of the batch buffer's life.

v2: Rebased to newer tree and updated for changes to the command parser.
Specifically, a code shuffle has required saving the batch start address in the
params structure.

For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-06-23 14:01:59 +02:00
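
A rough, declaration-only sketch of what such a parameter structure could look like; the struct and field names are illustrative (only dispatch_flags and the saved batch start address are mentioned above), and anything ioctl-local such as the 'args' pointer is deliberately left out.

#include <stdint.h>

struct engine;          /* long-lived, per-ring state                    */
struct context;         /* reference counted                             */
struct gem_object;      /* reference counted                             */
struct request;         /* owns the submission                           */

struct execbuffer_params {
        struct engine     *engine;          /* which ring to submit on        */
        struct context    *ctx;             /* held via its own reference     */
        struct request    *request;         /* tracks this piece of work      */
        struct gem_object *batch_obj;       /* the batch buffer itself        */
        uint64_t           batch_start;     /* saved for the command parser   */
        uint32_t           dispatch_flags;  /* I915_DISPATCH_* style flags    */
};

/* Callers now pass one pointer; adding or removing an item later does not
 * ripple through every prototype in the call chain. */
int do_execbuf(struct execbuffer_params *params);
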
John Harrison 40e895ceca drm/i915: Set context in request from creation even in legacy mode
In execlist mode, the context object pointer is written in to the request
structure (and reference counted) at the point of request creation. In legacy
mode, this only happens inside i915_add_request().

This patch updates the legacy code path to match the execlist version. This
allows all the intermediate code between request creation and request submission
to get at the context object given only a request structure, thus negating the
need to pass context pointers here, there and everywhere.

v2: Moved the context reference so it does not need to be undone if the
get_seqno() fails.

v3: Fixed execlist mode always hitting a warning about invalid last_contexts
(which don't exist in execlist mode).

v4: Updated for new i915_gem_request_alloc() scheme.

For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-06-23 14:01:58 +02:00
John Harrison bf7dc5b709 drm/i915: i915_add_request must not fail
The i915_add_request() function is called to keep track of work that has been
written to the ring buffer. It adds epilogue commands to track progress (seqno
updates and such), moves the request structure onto the right list and performs
other such housekeeping tasks. However, the work itself has already been written
to the ring and will get executed whether or not the add request call succeeds.
So no matter what goes wrong, there isn't a whole lot of point in failing the
call.

At the moment, this is fine(ish). If the add request does bail out early and not
do the housekeeping, the request will still float around in the
ring->outstanding_lazy_request field and be picked up next time. It means
multiple pieces of work will be tagged as the same request and the driver can't
actually wait for the first piece of work until something else has been
submitted. But it all sort of hangs together.

This patch series is all about removing the OLR and guaranteeing that each piece
of work gets its own personal request. That means that there is no more
'hoovering up of forgotten requests'. If the request does not get tracked then
it will be leaked. Thus the add request call _must_ not fail. The previous patch
should have already ensured that it _will_ not fail by removing the potential
for running out of ring space. This patch enforces the rule by actually removing
the early exit paths and the return code.

Note that if something does manage to fail and the epilogue commands don't get
written to the ring, the driver will still hang together. The request will be
added to the tracking lists. And as in the old case, any subsequent work will
generate a new seqno which will suffice for marking the old one as complete.

v2: Improved WARNings (Tomas Elf review request).

For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-06-23 14:01:57 +02:00
John Harrison 29b1b415fc drm/i915: Reserve ring buffer space for i915_add_request() commands
It is a bad idea for i915_add_request() to fail. The work will already have been
sent to the ring and will be processed, but there will not be any tracking or
management of that work.

The only way the add request call can fail is if it can't write its epilogue
commands to the ring (cache flushing, seqno updates, interrupt signalling). The
reasons for that are mostly down to running out of ring buffer space and the
problems associated with trying to get some more. This patch prevents that
situation from happening in the first place.

When a request is created, it marks sufficient space as reserved for the
epilogue commands, thus guaranteeing that by the time the epilogue is written,
there will be plenty of space for it. Note that a ring_begin() call is required
to actually reserve the space (and do any potential waiting). However, that is
not currently done at request creation time. This is because the ring_begin()
code can allocate a request. Hence calling begin() from the request allocation
code would lead to infinite recursion! Later patches in this series remove the
need for begin() to do the allocation. At that point, it becomes safe for the
allocation code to call begin() and really reserve the space.

Until then, there is a potential for insufficient space to be available at the
point of calling i915_add_request(). However, that would only be the case if
the request was created and immediately submitted without ever calling
ring_begin() and adding any work to that request, which should never happen. And
even if it did, a request that falls into that tiny window of opportunity for
failing due to being out of ring space was not doing anything in the first
place, so the failure hardly matters.

v2: Updated the 'reserved space too small' warning to include the offending
sizes. Added a 'cancel' operation to clean up when a request is abandoned. Added
re-initialisation of tracking state after a buffer wrap to keep the sanity
checks accurate.

v3: Incremented the reserved size to accommodate Ironlake (after finally
managing to run on an ILK system). Also fixed missing wrap code in LRC mode.

v4: Added extra comment and removed duplicate WARN (feedback from Tomas).

For: VIZ-5115
CC: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-06-23 14:01:56 +02:00
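
A toy, standalone model of the reservation scheme described above; the sizes, names and the simplified space accounting are all illustrative, not the driver's real ring bookkeeping.

#include <assert.h>
#include <stdio.h>

#define RING_SIZE        4096
#define EPILOGUE_RESERVE 128   /* worst case: flush + seqno write + interrupt */

struct ringbuf {
        int space;      /* bytes currently free                   */
        int reserved;   /* set aside for the request epilogue     */
};

static int request_create(struct ringbuf *rb)
{
        /* Mark the epilogue space as spoken for up front. */
        if (rb->space < EPILOGUE_RESERVE)
                return -1;              /* would need to wait/retire here */
        rb->reserved = EPILOGUE_RESERVE;
        return 0;
}

static int ring_begin(struct ringbuf *rb, int bytes)
{
        /* The payload may only eat into space beyond the reservation. */
        if (rb->space - rb->reserved < bytes)
                return -1;              /* wait for older requests to retire */
        rb->space -= bytes;
        return 0;
}

static void add_request_epilogue(struct ringbuf *rb)
{
        /* Guaranteed to fit: this is exactly what was reserved. */
        assert(rb->space >= rb->reserved);
        rb->space -= rb->reserved;
        rb->reserved = 0;
}

int main(void)
{
        struct ringbuf rb = { .space = RING_SIZE };

        if (request_create(&rb) == 0 && ring_begin(&rb, 1024) == 0)
                add_request_epilogue(&rb);
        printf("space left: %d\n", rb.space);
        return 0;
}
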
Arun Siluvery c82435bbe5 drm/i915/gen8: Add WaFlushCoherentL3CacheLinesAtContextSwitch workaround
In Indirect context w/a batch buffer,
+WaFlushCoherentL3CacheLinesAtContextSwitch:bdw

v2: Add LRI commands to set/reset bit that invalidates coherent lines,
update WA to include programming restrictions and exclude CHV as
it is not required (Ville)

v3: Avoid an unnecessary read when it can be done by reading the register once (Chris).

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Dave Gordon <david.s.gordon@intel.com>
Signed-off-by: Rafael Barbalho <rafael.barbalho@intel.com>
Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-06-23 14:01:41 +02:00
Arun Siluvery 7ad00d1ac1 drm/i915/gen8: Add WaDisableCtxRestoreArbitration workaround
In Indirect and Per context w/a batch buffer,
+WaDisableCtxRestoreArbitration

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Dave Gordon <david.s.gordon@intel.com>
Signed-off-by: Rafael Barbalho <rafael.barbalho@intel.com>
Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-06-23 14:01:41 +02:00
Arun Siluvery c4db759919 drm/i915/gen8: Re-order init pipe_control in lrc mode
Some of the WA applied using WA batch buffers perform writes to the scratch page.
In the current flow, WA are initialized before the scratch obj is allocated.
This patch reorders intel_init_pipe_control() to have a valid scratch obj
before we initialize WA.

v2: Check for valid scratch page before initializing WA as some of them
perform writes to it.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Dave Gordon <david.s.gordon@intel.com>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-06-23 14:01:40 +02:00
Arun Siluvery 17ee950df3 drm/i915/gen8: Add infrastructure to initialize WA batch buffers
Some of the WA are to be applied during context save but before restore and
some at the end of context save/restore but before executing the instructions
in the ring. WA batch buffers are created for this purpose because these WA
cannot be applied using normal means. Each context has two registers to load the
offsets of these batch buffers. If they are non-zero, the HW understands that it
needs to execute these batches.

v1: In this version two separate ring_buffer objects were used to load WA
instructions for indirect and per context batch buffers and they were part
of every context.

v2: Chris suggested including an additional page in the context and using it to
load these WA instead of creating separate objects. This will simplify a lot of
things as we need not explicitly pin/unpin them. Thomas Daniel further pointed
out that GuC is planning to use a similar setup to share data between GuC and
the driver, and WA batch buffers could probably share that page. However, after
discussions with Dave, who is implementing the GuC changes, he suggested using
an independent page for these reasons: the GuC area might grow, and these WA are
initialized only once and are not changed afterwards, so we can share them
across all contexts.

The page is updated with WA during render ring init. This has the advantage of
not adding more special cases to default_context.

We don't know upfront the number of WA we will be applying using these batch
buffers. For this reason the size was fixed earlier, but that is not a good
idea. To fix this, the functions that load instructions are modified to report
the number of commands inserted and the size is now calculated after the batch
is updated. A macro is introduced to add commands to these batch buffers; it
also checks for overflow and returns an error.
We have a full page dedicated for these WA, so that should be sufficient for a
good number of WA; anything more means we have major issues.
The list for Gen8 is small, and the same goes for Gen9; maybe a few more get
added going forward, but nothing close to filling the entire page. Chris
suggested a two-pass approach but we agreed to go with the single-page setup as
it is a one-off routine and simpler code wins.

One additional option is an offset field, which is helpful if we would like to
have multiple batches at different offsets within the page and select them
based on some criteria. This is not a requirement at this point but could
help in future (Dave).

Chris provided some helpful macros and suggestions which further simplified
the code, they will also help in reducing code duplication when WA for
other Gen are added. Add detailed comments explaining restrictions.
Use do {} while(0) for wa_ctx_emit() macro.

(Many thanks to Chris, Dave and Thomas for their reviews and inputs)

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Dave Gordon <david.s.gordon@intel.com>
Signed-off-by: Rafael Barbalho <rafael.barbalho@intel.com>
Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-06-23 14:01:39 +02:00
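
A simplified, standalone approximation of the overflow-checked emit macro described above (do {} while (0) form, bails out when the single WA page would overflow, and the caller reports the number of dwords actually inserted); it is a stand-in, not the driver's exact wa_ctx_emit().

#include <stdint.h>
#include <stdio.h>

#define PAGE_SIZE 4096
#define WA_BATCH_DWORDS (PAGE_SIZE / sizeof(uint32_t))

#define wa_ctx_emit(batch, index, cmd)                          \
        do {                                                    \
                if ((index) >= WA_BATCH_DWORDS) {               \
                        fprintf(stderr, "WA batch overflow\n"); \
                        return -1;  /* -ENOSPC in the driver */ \
                }                                               \
                (batch)[(index)++] = (cmd);                     \
        } while (0)

static int build_wa_batch(uint32_t *batch)
{
        uint32_t index = 0;

        wa_ctx_emit(batch, index, 0x00000000);  /* e.g. MI_NOOP              */
        wa_ctx_emit(batch, index, 0x11000001);  /* e.g. an LRI-style command */
        wa_ctx_emit(batch, index, 0xdeadbeef);  /* ...register/value pair    */

        return (int)index;  /* dword count, so size is computed after the fact */
}

int main(void)
{
        uint32_t batch[WA_BATCH_DWORDS] = { 0 };

        printf("emitted %d dwords\n", build_wa_batch(batch));
        return 0;
}
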
Michel Thierry d63f820f39 drm/i915: Remove unnecessary null check in execlists_context_unqueue
commit 53292cdb06 ("drm/i915: Workaround
to avoid lite restore with HEAD==TAIL") added a check for req0 != NULL,
which is unnecessary.

The only way req0 could be NULL is if the list was empty, and this is
already addressed at the beginning of execlists_context_unqueue().

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-05-27 13:20:51 +02:00
Chris Wilson 03ade51185 drm/i915: Inline check required for object syncing prior to execbuf
This trims a little overhead from the common case of not needing to
synchronize between rings.

v2: execlists is special and likes to duplicate code.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-05-21 15:11:43 +02:00
Chris Wilson b47161858b drm/i915: Implement inter-engine read-read optimisations
Currently, we only track the last request globally across all engines.
This prevents us from issuing concurrent read requests on e.g. the RCS
and BCS engines (or more likely the render and media engines). Without
semaphores, we incur costly stalls as we synchronise between rings -
greatly impacting the current performance of Broadwell versus Haswell in
certain workloads (like video decode). With the introduction of
reference counted requests, it is much easier to track the last request
per ring, as well as the last global write request so that we can
optimise inter-engine read-read requests (as well as better optimise
certain CPU waits).

v2: Fix inverted readonly condition for nonblocking waits.
v3: Handle non-contiguous engine array after waits
v4: Rebase, tidy, rewrite ring list debugging
v5: Use obj->active as a bitfield, it looks cool
v6: Micro-optimise, mostly involving moving code around
v7: Fix retire-requests-upto for execlists (and multiple rq->ringbuf)
v8: Rebase
v9: Refactor i915_gem_object_sync() to allow the compiler to better
optimise it.

Benchmark: igt/gem_read_read_speed
hsw:gt3e (with semaphores):
Before: Time to read-read 1024k:		275.794µs
After:  Time to read-read 1024k:		123.260µs

hsw:gt3e (w/o semaphores):
Before: Time to read-read 1024k:		230.433µs
After:  Time to read-read 1024k:		124.593µs

bdw-u (w/o semaphores):             Before          After
Time to read-read 1x1:            26.274µs       10.350µs
Time to read-read 128x128:        40.097µs       21.366µs
Time to read-read 256x256:        77.087µs       42.608µs
Time to read-read 512x512:       281.999µs      181.155µs
Time to read-read 1024x1024:    1196.141µs     1118.223µs
Time to read-read 2048x2048:    5639.072µs     5225.837µs
Time to read-read 4096x4096:   22401.662µs    21137.067µs
Time to read-read 8192x8192:   89617.735µs    85637.681µs

Testcase: igt/gem_concurrent_blit (read-read and friends)
Cc: Lionel Landwerlin <lionel.g.landwerlin@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> [v8]
[danvet: s/\<rq\>/req/g]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-05-21 15:11:42 +02:00
Peter Antoine 779949f4b1 drm/i915: Warn when execlists changes context without IRQs
If a batch ends while the IRQs are not turned on, the notification can
go missing and the GPU can hang. So generate a warning in this case.

Signed-off-by: Peter Antoine <peter.antoine@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-05-20 11:25:44 +02:00
Dan Carpenter 3126a660f3 drm/i915: checking IS_ERR() instead of NULL
We switched from calling i915_gem_alloc_context_obj() to calling
i915_gem_alloc_object() so the error handling needs to be updated to
check for NULL instead of IS_ERR().

Fixes: 149c86e74f ('drm/i915: Allocate context objects from stolen')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-05-08 13:03:23 +02:00
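
The two error-return conventions at issue, modelled in a small standalone program with userspace stand-ins for ERR_PTR()/IS_ERR()/PTR_ERR(); the point is simply that a NULL check and an IS_ERR() check are not interchangeable.

#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Minimal userspace model of the kernel's ERR_PTR()/IS_ERR() convention. */
#define MAX_ERRNO 4095
static void    *ERR_PTR(intptr_t err)   { return (void *)err; }
static intptr_t PTR_ERR(const void *p)  { return (intptr_t)p; }
static int      IS_ERR(const void *p)   { return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO; }

/* Allocator A: NULL-on-failure convention (like a plain malloc wrapper). */
static void *alloc_plain(size_t sz) { return malloc(sz); }

/* Allocator B: ERR_PTR-on-failure convention. */
static void *alloc_errptr(size_t sz)
{
        void *p = malloc(sz);
        return p ? p : ERR_PTR(-ENOMEM);
}

int main(void)
{
        void *a = alloc_plain(64);
        if (a == NULL)                   /* correct check for allocator A */
                return ENOMEM;

        void *b = alloc_errptr(64);
        if (IS_ERR(b))                   /* correct check for allocator B */
                return (int)-PTR_ERR(b);

        puts("both allocations checked with the matching convention");
        free(a);
        free(b);
        return 0;
}
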
Dave Airlie e1dee1973c Merge tag 'drm-intel-next-2015-04-23-fixed' of git://anongit.freedesktop.org/drm-intel into drm-next
drm-intel-next-2015-04-23:
- dither support for ns2501 dvo (Thomas Richter)
- some polish for the gtt code and fixes to finally enable the cmd parser on hsw
- first pile of bxt stage 1 enabling (too many different people to list ...)
- more psr fixes from Rodrigo
- skl rotation support from Chandra
- more atomic work from Ander and Matt
- pile of cleanups and micro-ops for execlist from Chris
drm-intel-next-2015-04-10:
- cdclk handling cleanup and fixes from Ville
- more prep patches for olr removal from John Harrison
- gmbus pin naming rework from Jani (prep for bxt)
- remove ->new_config from Ander (more atomic conversion work)
- rps (boost) tuning and unification with byt/bsw from Chris
- cmd parser batch bool tuning from Chris
- gen8 dynamic pte allocation (Michel Thierry, based on work from Ben Widawsky)
- execlist tuning (not yet all of it) from Chris
- add drm_plane_from_index (Chandra)
- various small things all over

* tag 'drm-intel-next-2015-04-23-fixed' of git://anongit.freedesktop.org/drm-intel: (204 commits)
  drm/i915/gtt: Allocate va range only if vma is not bound
  drm/i915: Enable cmd parser to do secure batch promotion for aliasing ppgtt
  drm/i915: fix intel_prepare_ddi
  drm/i915: factor out ddi_get_encoder_port
  drm/i915/hdmi: check port in ibx_infoframe_enabled
  drm/i915/hdmi: fix vlv infoframe port check
  drm/i915: Silence compiler warning in dvo
  drm/i915: Update DRIVER_DATE to 20150423
  drm/i915: Enable dithering on NatSemi DVO2501 for Fujitsu S6010
  rm/i915: Move i915_get_ggtt_vma_pages into ggtt_bind_vma
  drm/i915: Don't try to outsmart gcc in i915_gem_gtt.c
  drm/i915: Unduplicate i915_ggtt_unbind/bind_vma
  drm/i915: Move ppgtt_bind/unbind around
  drm/i915: move i915_gem_restore_gtt_mappings around
  drm/i915: Fix up the vma aliasing ppgtt binding
  drm/i915: Remove misleading comment around bind_to_vm
  drm/i915: Don't use atomics for pg_dirty_rings
  drm/i915: Don't look at pg_dirty_rings for aliasing ppgtt
  drm/i915/skl: Support Y tiling in MMIO flips
  drm/i915: Fixup kerneldoc for struct intel_context
  ...

Conflicts:
	drivers/gpu/drm/i915/i915_drv.c
2015-05-08 20:51:06 +10:00
Michel Thierry 53292cdb06 drm/i915: Workaround to avoid lite restore with HEAD==TAIL
WaIdleLiteRestore is an execlists-only workaround, and requires the driver
to ensure that any context always has HEAD!=TAIL when attempting lite
restore.

Add two extra MI_NOOP instructions at the end of each request, but keep
the request's tail pointing before the MI_NOOPs. We may not need to
execute them, and this is why request->tail is sampled before adding
these extra instructions.

If we submit a context to the ELSP which has previously been submitted,
move the tail pointer past the MI_NOOPs. This ensures HEAD!=TAIL.

v2: Move overallocation to gen8_emit_request, and added note about
sampling request->tail in commit message (Chris).

v3: Remove redundant request->tail assignment in __i915_add_request, in
lrc mode this is already set in execlists_context_queue.
Do not add wa implementation details inside gem (Chris).

v4: Apply the wa whenever the req has been resubmitted and update
comment (Chris).

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Thomas Daniel <thomas.daniel@intel.com>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2015-04-23 23:56:52 +03:00
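
A toy model of the tail handling described above: pad each request with two MI_NOOPs, keep request->tail pointing before them, and only advance past the padding when the same context is resubmitted; names and the dword-index bookkeeping are illustrative.

#include <stdio.h>

#define MI_NOOP 0u
#define WA_TAIL_DWORDS 2

struct ringbuf { unsigned int buf[1024]; unsigned int tail; };
struct request { struct ringbuf *rb; unsigned int tail; };

static void emit_request_end(struct request *rq)
{
        struct ringbuf *rb = rq->rb;

        rq->tail = rb->tail;                    /* sample tail first...      */
        for (int i = 0; i < WA_TAIL_DWORDS; i++)
                rb->buf[rb->tail++] = MI_NOOP;  /* ...then add the padding   */
}

static unsigned int tail_for_submit(const struct request *rq, int resubmitted)
{
        /* On a lite-restore resubmission, point past the MI_NOOPs so the
         * hardware never sees HEAD == TAIL for this context. */
        return resubmitted ? rq->tail + WA_TAIL_DWORDS : rq->tail;
}

int main(void)
{
        struct ringbuf rb = { .tail = 10 };
        struct request rq = { .rb = &rb };

        emit_request_end(&rq);
        printf("first submit tail: %u\n", tail_for_submit(&rq, 0));
        printf("resubmit tail:     %u\n", tail_for_submit(&rq, 1));
        return 0;
}
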
Daniel Vetter c5fe557dde Merge branch 'topic/bxt-stage1' into drm-intel-next-queued
Separate topic branch for bxt didn't work out since we needed to
refactor the gmbus code a bit to make it look decent. So backmerge.

Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
2015-04-14 14:00:56 +02:00
Imre Deak 9647ff36ae drm/i915/gen9: fix PIPE_CONTROL flush for VS_INVALIDATE
On GEN9+, per the specification, a NULL PIPE_CONTROL needs to be emitted
before any PIPE_CONTROL command with the VS_INVALIDATE flag set.

Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Nick Hoath <nicholas.hoath@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-04-14 13:55:20 +02:00
Michel Thierry a6631bc8d6 drm/i915: Remove unused variable from execlists_context_queue
After commit d7b9ca2f7a
("drm/i915: Remove request->uniq")

dev_priv is no longer needed.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-04-10 16:34:44 +02:00
Chris Wilson 149c86e74f drm/i915: Allocate context objects from stolen
As we never expose context objects directly to userspace, we can forgo
allocating a first-class GEM object for them and prefer to use the
limited resource of reserved/stolen memory for them. Note this means
that their initial contents are undefined.

However, a downside of using stolen objects for execlists is that we
cannot access the physical address directly (thanks MCH!) which prevents
their use.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-04-10 10:41:24 +02:00
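
A standalone sketch of the allocation preference described above (try the stolen/reserved pool first, fall back to a normal allocation since the initial contents do not matter); the helper names are illustrative, not the driver's.

#include <stdio.h>
#include <stdlib.h>

struct gem_object { const char *backing; };

static struct gem_object *alloc_from_stolen(size_t size)
{
        static int stolen_remaining = 1;        /* pretend the pool is tiny */
        (void)size;
        if (!stolen_remaining--)
                return NULL;                    /* pool exhausted           */
        struct gem_object *obj = malloc(sizeof(*obj));
        if (obj)
                obj->backing = "stolen";
        return obj;
}

static struct gem_object *alloc_regular(size_t size)
{
        (void)size;
        struct gem_object *obj = malloc(sizeof(*obj));
        if (obj)
                obj->backing = "shmem";
        return obj;
}

static struct gem_object *alloc_context_object(size_t size)
{
        struct gem_object *obj = alloc_from_stolen(size);
        if (obj == NULL)
                obj = alloc_regular(size);      /* graceful fallback        */
        return obj;
}

int main(void)
{
        for (int i = 0; i < 2; i++) {
                struct gem_object *obj = alloc_context_object(4096);
                printf("context %d backed by %s\n", i,
                       obj ? obj->backing : "nothing");
                free(obj);
        }
        return 0;
}
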
Chris Wilson d7b9ca2f7a drm/i915: Remove request->uniq
We already assign a unique identifier to every request: seqno. That
someone felt like adding a second one without even mentioning why and
tweaking ABI smells very fishy.

Fixes regression from
commit b3a38998f0
Author: Nick Hoath <nicholas.hoath@intel.com>
Date:   Thu Feb 19 16:30:47 2015 +0000

    drm/i915: Fix a use after free, and unbalanced refcounting

v2: Rebase

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Nick Hoath <nicholas.hoath@intel.com>
Cc: Thomas Daniel <thomas.daniel@intel.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Jani Nikula <jani.nikula@intel.com>
[danvet: Fixup because different merge order.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-04-10 10:41:11 +02:00
Chris Wilson a6111f7b66 drm/i915: Reduce locking in execlist command submission
This eliminates six needless spin lock/unlock pairs when writing out
ELSP.

v2: Respin with my preferred colour.
v3: Mostly back to the original colour

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> [v1]
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-04-10 10:31:44 +02:00
Daniel Vetter 19ee66af15 drm/i915: Remove unused variable in intel_lrc.c
Already tagged this one and 0-day builder is failing me.

Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
2015-04-10 10:18:47 +02:00
Chris Wilson 595e1eeb26 drm/i915: Remove vestigal DRI1 ring quiescing code
After the removal of DRI1, all access to the rings is through requests
and so we can always be sure that there is a request to wait upon to
free up available space. The fallback code only existed so that we could
quiesce the GPU following unmediated access by DRI1.

v2: Rebase

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-04-10 08:56:15 +02:00
Chris Wilson 4bb1bedb28 drm/i915: Use the global runtime-pm wakelock for a busy GPU for execlists
When we submit a request to the GPU, we first take the rpm wakelock, and
only release it once the GPU has been idle for a small period of time
after all requests have been complete. This means that we are sure no
new interrupt can arrive whilst we do not hold the rpm wakelock and so
can drop the individual get/put around every single request inside
execlists.

Note: to close one potential issue we should mark the GPU as busy
earlier in __i915_add_request.

To elaborate: The issue is that we emit the irq signalling sequence
before we grab the rpm reference, which means we could miss the
resulting interrupt (since that's not set up when suspended). The only
bad side effect is a missed interrupt; gt mmio writes automatically
wake up the hw itself. But otoh we have an umbrella rpm reference for
the entirety of execbuf, and as long as that's there we're covered.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: Explain a bit more about the add_request issue, which after
some irc chatting with Chris turns out to not be an issue really.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-04-10 08:56:14 +02:00
Chris Wilson b5eba37283 drm/i915: Use simpler form of spin_lock_irq(execlist_lock)
We can use the simpler spinlock form to disable interrupts as we are
always outside of an irq/softirq handler.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-04-10 08:56:14 +02:00
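
For reference, the two locking forms in question, shown as a schematic kernel-style fragment (not standalone code); spin_lock_irqsave()/spin_lock_irq() and their unlock counterparts are the real kernel primitives, while the surrounding engine/execlist_lock structure is illustrative.

/* Heavier form: preserves the caller's interrupt state, needed when the
 * code can be reached from irq/softirq context. */
static void queue_from_any_context(struct intel_engine *engine)
{
        unsigned long flags;

        spin_lock_irqsave(&engine->execlist_lock, flags);
        /* ...manipulate the execlist queue... */
        spin_unlock_irqrestore(&engine->execlist_lock, flags);
}

/* Simpler form: valid because this path always runs in process context
 * with interrupts enabled, outside of any irq/softirq handler. */
static void queue_from_process_context(struct intel_engine *engine)
{
        spin_lock_irq(&engine->execlist_lock);
        /* ...manipulate the execlist queue... */
        spin_unlock_irq(&engine->execlist_lock);
}
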
Michel Thierry d7b2633dba drm/i915/gen8: Dynamic page table allocations
This finishes off the dynamic page table allocations, in the legacy 3
level style that already exists. Most everything has already been set up
to this point; the patch finishes off the enabling by setting the
appropriate function pointers.

In LRC mode, contexts need to know the PDPs when they are populated. With
dynamic page table allocations, these PDPs may not exist yet. Check if
PDPs have been allocated and use the scratch page if they do not exist yet.

Before submission, update the PDPs in the logical ring context as PDPs
have been allocated.

v2: Update aliasing/true ppgtt allocate/teardown/clear functions for
gen 6 & 7.

v3: Rebase.

v4: Remove BUG() from ppgtt_unbind_vma, but keep checking that either
teardown_va_range or clear_range functions exist (Daniel).

v5: Similar to gen6, in init, gen8_ppgtt_clear_range call is only needed
for aliasing ppgtt. Zombie tracking was originally added for teardown
function and is no longer required.

v6: Update err_out case in gen8_alloc_va_range (missed from lastest
rebase).

v7: Rebase after s/page_tables/page_table/.

v8: Updated scratch_pt check after scratch flag was removed in previous
patch.

v9: Note that lrc mode needs to be updated to support init state without
any PDP.

v10: Unmap correct page_table in gen8_alloc_va_range's error case,  clean-up
gen8_aliasing_ppgtt_init (remove duplicated map), and initialize PTs
during page table allocation.

v11: Squashed LRC enabling commit, otherwise LRC mode would be left broken
until it was updated to handle the init case without any PDP.

v12: Do not overallocate new_pts bitmap, make alloc_gen8_temp_bitmaps
static and don't abuse of inline functions. (Mika)

Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2+)
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-04-10 08:56:13 +02:00
Michel Thierry e5815a2e05 drm/i915/gen8: Split out mappings
When we do dynamic page table allocations for gen8, we'll need to have
more control over how and when we map page tables, similar to gen6.
In particular, DMA mappings for page directories/tables occur at allocation
time.

This patch adds the functionality and calls it at init, which should
have no functional change.

The PDPEs are still a special case for now. We'll need a function for
that in the future as well.

v2: Handle renamed unmap_and_free_page functions.
v3: Updated after teardown_va logic was removed.
v4: Rebase after s/page_tables/page_table/.
v5: No longer allocate all PDPs in GEN8+ systems with less than 4GB of
memory, and update populate_lr_context to handle this new case (proper
tracking will be added later in the patch series).
v6: Assign lrc page directory pointer addresses using a macro. (Mika)

Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2+)
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-04-10 08:56:12 +02:00
Chris Wilson 06fbca713e drm/i915: Split the batch pool by engine
I woke up one morning and found 50k objects sitting in the batch pool
and every search seemed to iterate the entire list... Painting the
screen in oils would provide a more fluid display.

One issue with the current design is that we only check for retirements
on the current ring when preparing to submit a new batch. This means
that we can have thousands of "active" batches on another ring that we
have to walk over. The simplest way to avoid that is to split the pools
per ring and then our LRU execution ordering will also ensure that the
inactive buffers remain at the front.

v2: execlists still requires duplicate code.
v3: execlists requires more duplicate code

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by:  Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-04-10 08:56:04 +02:00
Arun Siluvery 51847fb99f drm/i915: Do not set L3-LLC Coherency bit in ctx descriptor
According to the spec, this is a reserved bit for Gen9+ and should not be set.

Change-Id: I0215fb7057b94139b7a2f90ecc7a0201c0c93ad4
Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-04-10 08:55:58 +02:00
John Harrison dbe4646d6e drm/i915: Fix for ringbuf space wait in LRC mode
The legacy and LRC code paths have an almost identical procedure for waiting for
space in the ring buffer. They both search for a request in the free list that
will advance the tail to a point where sufficient space is available. They then
wait for that request, retire it and recalculate the free space value.

Unfortunately, a bug in the LRC side meant that the resulting free space might
not be as large as expected and, indeed, might not be sufficient. This is because
it was testing against the value of request->tail not request->postfix, whereas,
when a request is retired, ringbuf->tail is updated to req->postfix not
req->tail.

Another significant difference between the two is that the LRC one did not trust
the wait for request to work! It redid the 'is there enough space available'
test and would fail the call if insufficient. Whereas the legacy version just
said 'return 0' - it assumed the preceding code works. This difference meant
that the LRC version still worked even with the bug - it just fell back to the
polling wait path.

For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Thomas Daniel <thomas.daniel@intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-04-01 07:54:43 +02:00
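
A toy, standalone model of the wait-for-space search and the tail-vs-postfix distinction: free space has to be measured against the point the ring head will actually reach when the request retires (its postfix), not against request->tail. The ring arithmetic and names are illustrative.

#include <stdio.h>

struct req { unsigned int tail; unsigned int postfix; };

struct ring {
        unsigned int size, tail;
        struct req *reqs;
        int nreqs;
};

static unsigned int space_after_retire(const struct ring *r, unsigned int new_head)
{
        /* Classic power-of-two ring arithmetic: bytes free once the head
         * reaches new_head. */
        return (new_head - r->tail - 1) & (r->size - 1);
}

/* Find the oldest request whose retirement yields at least `bytes` free. */
static const struct req *find_req_for_space(const struct ring *r, unsigned int bytes)
{
        for (int i = 0; i < r->nreqs; i++) {
                /* Correct: measure against postfix (where the head lands),
                 * not tail, which may under-report the free space. */
                if (space_after_retire(r, r->reqs[i].postfix) >= bytes)
                        return &r->reqs[i];
        }
        return NULL;
}

int main(void)
{
        struct req reqs[] = { { .tail = 100, .postfix = 160 },
                              { .tail = 300, .postfix = 360 } };
        struct ring r = { .size = 1024, .tail = 500, .reqs = reqs, .nreqs = 2 };

        const struct req *victim = find_req_for_space(&r, 700);
        printf("wait on request with postfix %u\n", victim ? victim->postfix : 0);
        return 0;
}
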
John Harrison 6689cb2b62 drm/i915: Move common request allocation code into a common function
The request allocation code is largely duplicated between legacy mode and
execlist mode. The actual difference between the two versions of the code is
pretty minimal.

This patch moves the common code out into a separate function. This is then
called by the execution specific version prior to setting up the one different
value.

For: VIZ-5190
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-04-01 07:54:30 +02:00
John Harrison bc0dce3fd0 drm/i915: Make intel_logical_ring_begin() static
The only usage of intel_logical_ring_begin() is within intel_lrc.c so it can be
made static. To avoid a forward declaration at the top of the file, it and a
bunch of other functions have been shuffled upwards.

For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-04-01 07:54:17 +02:00
Dave Airlie a8c6ecb3be Linux 4.0-rc3
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJU/NacAAoJEHm+PkMAQRiGdUcIAJU5dHclwd9HRc7LX5iOwYN6
 mN0aCsYjMD8Pjx2VcPCgJvkIoESQO5pkwYpFFWCwILup1bVEidqXfr8EPOdThzdh
 kcaT0FwUvd19K+0jcKVNCX1RjKBtlUfUKONk6sS2x4RrYZpv0Ur8Gh+yXV8iMWtf
 fAusNEYlxQJvEz5+NSKw86EZTr4VVcykKLNvj+/t/JrXEuue7IG8EyoAO/nLmNd2
 V/TUKKttqpE6aUVBiBDmcMQl2SUVAfp5e+KJAHmizdDpSE80nU59UC1uyV8VCYdM
 qwHXgttLhhKr8jBPOkvUxl4aSXW7S0QWO8TrMpNdEOeB3ZB8AKsiIuhe1JrK0ro=
 =Xkue
 -----END PGP SIGNATURE-----

Merge tag 'v4.0-rc3' into drm-next

Linux 4.0-rc3 backmerge to fix two i915 conflicts, and get
some mainline bug fixes needed for my testing box

Conflicts:
	drivers/gpu/drm/i915/i915_drv.h
	drivers/gpu/drm/i915/intel_display.c
2015-03-09 19:58:30 +10:00
Dave Airlie 8dd0eb3566 Merge tag 'drm-intel-next-2015-02-27' of git://anongit.freedesktop.org/drm-intel into drm-next
- Y tiling support for scanout from Tvrtko&Damien
- Remove more UMS support
- some small prep patches for OLR removal from John Harrison
- first few patches for dynamic pagetable allocation from Ben Widawsky, rebased
  by tons of other people
- DRRS support patches (Sonika&Vandana)
- fbc patches from Paulo
- make sure our vblank callbacks aren't called when the pipes are off
- various patches all over

* tag 'drm-intel-next-2015-02-27' of git://anongit.freedesktop.org/drm-intel: (61 commits)
  drm/i915: Update DRIVER_DATE to 20150227
  drm/i915: Clarify obj->map_and_fenceable
  drm/i915/skl: Allow Y (and Yf) frame buffer creation
  drm/i915/skl: Update watermarks for Y tiling
  drm/i915/skl: Updated watermark programming
  drm/i915/skl: Adjust get_plane_config() to support Yb/Yf tiling
  drm/i915/skl: Teach pin_and_fence_fb_obj() about Y tiling constraints
  drm/i915/skl: Adjust intel_fb_align_height() for Yb/Yf tiling
  drm/i915/skl: Allow scanning out Y and Yf fbs
  drm/i915/skl: Add new displayable tiling formats
  drm/i915: Remove DRIVER_MODESET checks from modeset code
  drm/i915: Remove regfile code&data for UMS suspend/resume
  drm/i915: Remove DRIVER_MODESET checks from gem code
  drm/i915: Remove DRIVER_MODESET checks in the gpu reset code
  drm/i915: Remove DRIVER_MODESET checks from suspend/resume code
  drm/i915: Remove DRIVER_MODESET checks in load/unload/close code
  drm/i915: fix a printk format
  drm/i915: Add media rc6 residency file to sysfs
  drm/i915: Add missing description to parameter in alloc_pt_range
  drm/i915: Removed the read of RP_STATE_CAP from sysfs/debugfs functions
  ...
2015-03-09 19:41:15 +10:00
Dave Airlie 7547af9186 Merge tag 'drm-intel-next-2015-02-14' of git://anongit.freedesktop.org/drm-intel into drm-next
- use the atomic helpers for plane_upate/disable hooks (Matt Roper)
- refactor the initial plane config code (Damien)
- ppgtt prep patches for dynamic pagetable alloc (Ben Widawsky, reworked and
  rebased by a lot of other people)
- framebuffer modifier support from Tvrtko Ursulin, drm core code from Rob Clark
- piles of workaround patches for skl from Damien and Nick Hoath
- vGPU support for xengt on the client side (Yu Zhang)
- and the usual smaller things all over

* tag 'drm-intel-next-2015-02-14' of git://anongit.freedesktop.org/drm-intel: (88 commits)
  drm/i915: Update DRIVER_DATE to 20150214
  drm/i915: Remove references to previously removed UMS config option
  drm/i915/skl: Use a LRI for WaDisableDgMirrorFixInHalfSliceChicken5
  drm/i915/skl: Fix always true comparison in a revision id check
  drm/i915/skl: Implement WaEnableLbsSlaRetryTimerDecrement
  drm/i915/skl: Implement WaSetDisablePixMaskCammingAndRhwoInCommonSliceChicken
  drm/i915: Add process identifier to requests
  drm/i915/skl: Implement WaBarrierPerformanceFixDisable
  drm/i915/skl: Implement WaCcsTlbPrefetchDisable:skl
  drm/i915/skl: Implement WaDisableChickenBitTSGBarrierAckForFFSliceCS
  drm/i915/skl: Implement WaDisableHDCInvalidation
  drm/i915/skl: Implement WaDisableLSQCROPERFforOCL
  drm/i915/skl: Implement WaDisablePartialResolveInVc
  drm/i915/skl: Introduce a SKL specific init_workarounds()
  drm/i915/skl: Document that we implement WaRsClearFWBitsAtReset
  drm/i915/skl: Implement WaSetGAPSunitClckGateDisable
  drm/i915/skl: Make the init clock gating function skylake specific
  drm/i915/skl: Provide a gen9 specific init_render_ring()
  drm/i915/skl: Document the WM read latency W/A with its name
  drm/i915/skl: Also detect eDRAM on SKL
  ...
2015-03-05 09:41:09 +10:00
John Harrison 98e1bd4ae6 drm/i915: Cache ringbuf pointer in request structure
In execlist mode, the ringbuf is a function of the ring and context whereas in
legacy mode, it is derived from the ring alone. Thus the calculation required to
determine the ringbuf pointer from the ring (and context) also needs to test
whether execlist mode is in use or not. This is messy.

Further, the request structure holds a pointer to both the ring and the context
for which it was created. Thus, given a request, it is possible to derive the
ringbuf in either legacy or execlist mode. Hence it is only necessary to pass
the request in to all the low level functions rather than some combination of
request, ring, context and ringbuf. However, rather than recalculating it each
time, it is much simpler to just cache the ringbuf pointer in the request
structure itself.

Caching the pointer means the calculation is done once at request creation time
and all further code can simply read it directly from the request structure.

OTC-Jira: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
[danvet: Drop contentless comment in lrc alloc request entirely. And
spelling fix in the commit message.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-02-25 22:53:10 +01:00
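
A minimal standalone sketch of the caching idea: resolve the ringbuf once, at request creation, and have all later code read it from the request; the struct layout and helper names are illustrative, not the driver's.

#include <stdio.h>

struct ringbuf { const char *name; };
struct engine  { struct ringbuf legacy_ringbuf; };
struct context { struct ringbuf *engine_ringbuf[2]; };  /* per engine, execlists */

struct request {
        struct engine  *engine;
        struct context *ctx;
        struct ringbuf *ringbuf;   /* cached at creation time */
};

static void request_init(struct request *rq, struct engine *e,
                         struct context *ctx, int engine_id, int execlists)
{
        rq->engine = e;
        rq->ctx = ctx;
        /* Do the mode-dependent lookup exactly once, here. */
        rq->ringbuf = execlists ? ctx->engine_ringbuf[engine_id]
                                : &e->legacy_ringbuf;
}

static void emit_commands(struct request *rq)
{
        /* Low-level code no longer cares which mode is in use. */
        printf("emitting into %s\n", rq->ringbuf->name);
}

int main(void)
{
        struct engine rcs = { .legacy_ringbuf = { "legacy ring" } };
        struct ringbuf lrc_ring = { "per-context LRC ring" };
        struct context ctx = { .engine_ringbuf = { &lrc_ring, NULL } };
        struct request rq;

        request_init(&rq, &rcs, &ctx, 0, 1);
        emit_commands(&rq);
        return 0;
}
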
John Harrison 5e4be7bda1 drm/i915: Add missing trace point to LRC execbuff code path
There is a trace point in the legacy execbuffer execution path that is missing
from the execlist path. Trace points are extremely useful for debugging and are
used by various automated validation tests. Hence, this patch adds the missing
trace point back in.

OTC-Jira: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-02-25 22:48:21 +01:00
John Harrison 8e004efc16 drm/i915: Rename 'flags' to 'dispatch_flags' for better code reading
There is a flags word that is passed through the execbuffer code path all the
way from initial decoding of the user parameters down to the very final dispatch
buffer call. It is simply called 'flags'. Unfortunately, there are many other
flags words floating around in the same blocks of code. Even more once the GPU
scheduler arrives.

This patch makes it more obvious exactly which flags word is which by renaming
'flags' to 'dispatch_flags'. Note that the bit definitions for this flags word
already have an 'I915_DISPATCH_' prefix on them and so are not quite so
ambiguous.

OTC-Jira: VIZ-1587
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
[danvet: Resolve conflict with Chris' rework of the bb parsing.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-02-25 22:43:29 +01:00
Ben Widawsky 06fda602db drm/i915: Create page table allocators
As we move toward dynamic page table allocation, it becomes much easier
to manage our data structures if we do things less coarsely by
breaking up all of our actions into individual tasks. This makes the
code easier to write, read, and verify.

Aside from the dissection of the allocation functions, the patch
statically allocates the page table structures without a page directory.
This remains the same for all platforms.

The patch itself should not have much functional difference. The primary
noticeable difference is the fact that page tables are no longer
allocated, but rather statically declared as part of the page directory.
This has non-zero overhead, but things gain additional complexity as a
result.

This patch exists for a few reasons:
1. Splitting out the functions allows easily combining GEN6 and GEN8
code. Page tables have no difference based on GEN8. As we'll see in a
future patch when we add the DMA mappings to the allocations, it
requires only one small change to make work, and error handling should
just fall into place.

2. Unless we always want to allocate all page tables under a given PDE,
we'll have to eventually break this up into an array of pointers (or
pointer to pointer).

3. Having the discrete functions is easier to review, and understand.
All allocations and frees now take place in just a couple of locations.
Reviewing, and catching leaks should be easy.

4. Less important: the GFP flags are confined to one location, which
makes playing around with such things trivial.

v2: Updated commit message to explain why this patch exists

v3: For lrc, s/pdp.page_directory[i].daddr/pdp.page_directory[i]->daddr/

v4: Renamed free_pt/pd_single functions to unmap_and_free_pt/pd (Daniel)

v5: Added additional safety checks in gen8 clear/free/unmap.

v6: Use WARN_ON and return -EINVAL in alloc_pt_range (Mika).

v7: Make err_out loop symmetrical to the way we allocate in
alloc_pt_range. Also s/page_tables/page_table and correct commit
message (Mika)

Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v3+)
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-02-25 16:53:43 +01:00
Ben Widawsky 7324cc0491 drm/i915: Complete page table structures
Move the remaining members over to the new page table structures.

This can be squashed with the previous commit if desired. The reasoning
is the same as for that patch. I simply felt it is easier to review if split.

v2: In lrc: s/ppgtt->pd_dma_addr[i]/ppgtt->pdp.page_directory[i].daddr/
v3: Rebase.
v4: Rebased after s/page_tables/page_table/.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2+)
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-02-25 16:53:07 +01:00
Nick Hoath b3a38998f0 drm/i915: Fix a use after free, and unbalanced refcounting
When converting from implicitly tracked execlist queue items to ref counted
requests, not all frees of requests were replaced with unrefs, and extraneous
refs/unrefs of contexts were added.
Correct the unbalanced refcount & replace the frees.
Remove a noisy warning when hitting the request creation path.

drm_i915_gem_request and intel_context are both kref reference counted
structures. Upon allocation, drm_i915_gem_request's ref count should be
bumped using kref_init. When a context is assigned to the request,
the context's reference count should be bumped using i915_gem_context_reference.
i915_gem_request_unreference will reduce the context reference count when
the request is freed.

Problem introduced in
commit 6d3d8274bc
Author:     Nick Hoath <nicholas.hoath@intel.com>
AuthorDate: Thu Jan 15 13:10:39 2015 +0000

     drm/i915: Subsume intel_ctx_submit_request in to drm_i915_gem_request

v2: Added comments explaining how the ctx pointer and the request object should
be ref-counted. Removed noisy warning.

v3: Cleaned up the language used in the commit & the header
description (Thanks David Gordon)

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88652
Signed-off-by: Nick Hoath <nicholas.hoath@intel.com>
Reviewed-by: Thomas Daniel <thomas.daniel@intel.com>
Reviewed-by: Daniel Vetter <daniel@ffwll.ch>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2015-02-24 15:18:37 +02:00
Thomas Daniel 3e5b6f05a2 drm/i915: Reset logical ring contexts' head and tail during GPU reset
Work was getting left behind in LRC contexts during reset.  This causes a hang
if the GPU is reset when HEAD==TAIL because the context's ringbuffer head and
tail don't get reset and retiring a request doesn't alter them, so the ring
still appears full.

Added a function intel_lr_context_reset() to reset head and tail on an LRC and
its ringbuffer.

Call intel_lr_context_reset() for each context in i915_gem_context_reset() when
in execlists mode.

Testcase: igt/pm_rps --run-subtest reset #bdw
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88096
Signed-off-by: Thomas Daniel <thomas.daniel@intel.com>
Reviewed-by: Dave Gordon <david.s.gordon@intel.com>
[danvet: Flatten control flow in the lrc reset code a notch.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-02-24 00:19:37 +01:00
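
A toy, standalone model of the fix: on reset, walk each context (in execlists mode only) and clear the per-engine logical ring's head and tail so stale values left over from the hang cannot make the ring appear full; all names are illustrative stand-ins for the driver's structures.

#include <stdio.h>

struct ringbuf { unsigned int head, tail; };
struct lr_context { struct ringbuf ring[2]; };            /* one per engine   */

static void lr_context_reset(struct lr_context *ctx, int nengines)
{
        for (int i = 0; i < nengines; i++) {
                ctx->ring[i].head = 0;
                ctx->ring[i].tail = 0;
        }
}

static void gem_context_reset(struct lr_context *contexts, int ncontexts,
                              int nengines, int execlists)
{
        if (!execlists)
                return;                                   /* legacy path unchanged */
        for (int c = 0; c < ncontexts; c++)
                lr_context_reset(&contexts[c], nengines);
}

int main(void)
{
        /* Stale head/tail values left behind by a hang. */
        struct lr_context ctxs[1] = { { .ring = { { 256, 256 }, { 64, 64 } } } };

        gem_context_reset(ctxs, 1, 2, 1);
        printf("ctx0 rcs head/tail after reset: %u/%u\n",
               ctxs[0].ring[0].head, ctxs[0].ring[0].tail);
        return 0;
}
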