linux

Commit Graph

Author	SHA1	Message	Date
Daniel Vetter	b45305fce5	drm/i915: Implement workaround for broken CS tlb on i830/845 Now that Chris Wilson demonstrated that the key for stability on early gen 2 is to simple _never_ exchange the physical backing storage of batch buffers I've tried a stab at a kernel solution. Doesn't look too nefarious imho, now that I don't try to be too clever for my own good any more. v2: After discussing the various techniques, we've decided to always blit batches on the suspect devices, but allow userspace to opt out of the kernel workaround assume full responsibility for providing coherent batches. The principal reason is that avoiding the blit does improve performance in a few key microbenchmarks and also in cairo-trace replays. Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> [danvet: - Drop the hunk which uses HAS_BROKEN_CS_TLB to implement the ring wrap w/a. Suggested by Chris Wilson. - Also add the ACTHD check from Chris Wilson for the error state dumping, so that we still catch batches when userspace opts out of the w/a.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-12-17 17:27:02 +01:00
Chris Wilson	9d7730914f	drm/i915: Preallocate next seqno before touching the ring Based on the work by Mika Kuoppala, we realised that we need to handle seqno wraparound prior to committing our changes to the ring. The most obvious point then is to grab the seqno inside intel_ring_begin(), and then to reuse that seqno for all ring operations until the next request. As intel_ring_begin() can fail, the callers must already be prepared to handle such failure and so we can safely add further checks. This patch looks like it should be split up into the interface changes and the tweaks to move seqno wrapping from the execbuffer into the core seqno increment. However, I found no easy way to break it into incremental steps without introducing further broken behaviour. v2: Mika found a silly mistake and a subtle error in the existing code; inside i915_gem_retire_requests() we were resetting the sync_seqno of the target ring based on the seqno from this ring - which are only related by the order of their allocation, not retirement. Hence we were applying the optimisation that the rings were synchronised too early, fortunately the only real casualty there is the handling of seqno wrapping. v3: Do not forget to reset the sync_seqno upon module reinitialisation, ala resume. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=863861 Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> [v2] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-11-29 11:43:52 +01:00
Chris Wilson	be7cb6347e	drm/i915: Remove bogus test for a present execbuffer The intention of checking obj->gtt_offset!=0 is to verify that the target object was listed in the execbuffer and had been bound into the GTT. This is guarranteed by the earlier rearrangement to split the execbuffer operation into reserve and relocation phases and then verified by the check that the target handle had been processed during the reservation phase. However, the actual checking of obj->gtt_offset==0 is bogus as we can indeed reference an object at offset 0. For instance, the framebuffer installed by the BIOS often resides at offset 0 - causing EINVAL as we legimately try to render using the stolen fb. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Eric Anholt <eric@anholt.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-11-21 17:45:03 +01:00
Ben Widawsky	e76e9aebcd	drm/i915: Stop using AGP layer for GEN6+ As a quick hack we make the old intel_gtt structure mutable so we can fool a bunch of the existing code which depends on elements in that data structure. We can/should try to remove this in a subsequent patch. This should preserve the old gtt init behavior which upon writing these patches seems incorrect. The next patch will fix these things. The one exception is VLV which doesn't have the preserved flush control write behavior. Since we want to do that for all GEN6+ stuff, we'll handle that in a later patch. Mainstream VLV support doesn't actually exist yet anyway. v2: Update the comment to remove the "voodoo" Check that the last pte written matches what we readback v3: actually kill cache_level_to_agp_type since most of the flags will disappear in an upcoming patch v4: v3 was actually not what we wanted (Daniel) Make the ggtt bind assertions better and stricter (Chris) Fix some uncaught errors at gtt init (Chris) Some other random stuff that Chris wanted v5: check for i==0 in gen6_ggtt_bind_object to shut up gcc (Ben) Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Reviewed-by [v4]: Chris Wilson <chris@chris-wilson.co.uk> [danvet: Make the cache_level -> agp_flags conversion for pre-gen6 a tad more robust by mapping everything != CACHE_NONE to the cached agp flag - we have a 1:1 uncached mapping, but different modes of cacheable (at least on later generations). Suggested by Chris Wilson.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-11-11 23:51:42 +01:00
Daniel Vetter	c2fb791692	Linux 3.7-rc2 -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (GNU/Linux) iQEcBAABAgAGBQJQgvdwAAoJEHm+PkMAQRiG+3AH/i2XsqqN3VctL0nnbWfvds+Q aKulfIdJTjKiVAsawPUtRqReZ8ijiebrgA/53lZLlrFOoPPQ5+LHmnSyQF6gErOY NuAE1lijXDRM1pwBlhvOBbAj26wUobGjqONFJ9OkKr758Ue8ds/Q7UdxyEgmYgmg tvVMzfRcICzryUV3PcqL+3cNPpCUdT6wGGRJ9DCv/jvGiWKExWhOle5oltrmxk+D NsqRcws5pEubfHE4J8BvNWr8lE1kHfYVhrJETiLJUiN2XAJcbI4Jy7rU/3EGteNS 0HMZdaPPjV874lohdM70X2225SbYrCVkAYB5hnZCTeC3tYyCawBBPMQoyAiOcmU= =+861 -----END PGP SIGNATURE----- Merge tag 'v3.7-rc2' into drm-intel-next-queued Linux 3.7-rc2 Backmerge to solve two ugly conflicts: - uapi. We've already added new ioctl definitions for -next. Do I need to say more? - wc support gtt ptes. We've had to revert this for snb+ for 3.7 and also fix a few other things in the code. Now we know how to make it work on snb+, but to avoid losing the other fixes do the backmerge first before re-enabling wc gtt ptes on snb+. And a few other minor things, among them git getting confused in intel_dp.c and seemingly causing a conflict out of nothing ... Conflicts: drivers/gpu/drm/i915/i915_reg.h drivers/gpu/drm/i915/intel_display.c drivers/gpu/drm/i915/intel_dp.c drivers/gpu/drm/i915/intel_modes.c include/drm/i915_drm.h Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-10-22 14:34:51 +02:00
Chris Wilson	d7d4eeddb8	drm/i915: Allow DRM_ROOT_ONLY\|DRM_MASTER to submit privileged batchbuffers With the introduction of per-process GTT space, the hardware designers thought it wise to also limit the ability to write to MMIO space to only a "secure" batch buffer. The ability to rewrite registers is the only way to program the hardware to perform certain operations like scanline waits (required for tear-free windowed updates). So we either have a choice of adding an interface to perform those synchronized updates inside the kernel, or we permit certain processes the ability to write to the "safe" registers from within its command stream. This patch exposes the ability to submit a SECURE batch buffer to DRM_ROOT_ONLY\|DRM_MASTER processes. v2: Haswell split up bit8 into a ppgtt bit (still bit8) and a security bit (bit 13, accidentally not set). Also add a comment explaining why secure batches need a global gtt binding. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> (v1) [danvet: added hsw fixup.] Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-10-17 21:06:59 +02:00
Linus Torvalds	612a9aab56	Merge branch 'drm-next' of git://people.freedesktop.org/~airlied/linux Pull drm merge (part 1) from Dave Airlie: "So first of all my tree and uapi stuff has a conflict mess, its my fault as the nouveau stuff didn't hit -next as were trying to rebase regressions out of it before we merged. Highlights: - SH mobile modesetting driver and associated helpers - some DRM core documentation - i915 modesetting rework, haswell hdmi, haswell and vlv fixes, write combined pte writing, ilk rc6 support, - nouveau: major driver rework into a hw core driver, makes features like SLI a lot saner to implement, - psb: add eDP/DP support for Cedarview - radeon: 2 layer page tables, async VM pte updates, better PLL selection for > 2 screens, better ACPI interactions The rest is general grab bag of fixes. So why part 1? well I have the exynos pull req which came in a bit late but was waiting for me to do something they shouldn't have and it looks fairly safe, and David Howells has some more header cleanups he'd like me to pull, that seem like a good idea, but I'd like to get this merge out of the way so -next dosen't get blocked." Tons of conflicts mostly due to silly include line changes, but mostly mindless. A few other small semantic conflicts too, noted from Dave's pre-merged branch. * 'drm-next' of git://people.freedesktop.org/~airlied/linux: (447 commits) drm/nv98/crypt: fix fuc build with latest envyas drm/nouveau/devinit: fixup various issues with subdev ctor/init ordering drm/nv41/vm: fix and enable use of "real" pciegart drm/nv44/vm: fix and enable use of "real" pciegart drm/nv04/dmaobj: fixup vm target handling in preparation for nv4x pcie drm/nouveau: store supported dma mask in vmmgr drm/nvc0/ibus: initial implementation of subdev drm/nouveau/therm: add support for fan-control modes drm/nouveau/hwmon: rename pwm0* to pmw1* to follow hwmon's rules drm/nouveau/therm: calculate the pwm divisor on nv50+ drm/nouveau/fan: rewrite the fan tachometer driver to get more precision, faster drm/nouveau/therm: move thermal-related functions to the therm subdev drm/nouveau/bios: parse the pwm divisor from the perf table drm/nouveau/therm: use the EXTDEV table to detect i2c monitoring devices drm/nouveau/therm: rework thermal table parsing drm/nouveau/gpio: expose the PWM/TOGGLE parameter found in the gpio vbios table drm/nouveau: fix pm initialization order drm/nouveau/bios: check that fixed tvdac gpio data is valid before using it drm/nouveau: log channel debug/error messages from client object rather than drm client drm/nouveau: have drm debugging macros build on top of core macros ...	2012-10-03 23:29:23 -07:00
David Howells	760285e7e7	UAPI: (Scripted) Convert #include "..." to #include <path/...> in drivers/gpu/ Convert #include "..." to #include <path/...> in drivers/gpu/. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Dave Airlie <airlied@redhat.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Dave Jones <davej@redhat.com>	2012-10-02 18:01:07 +01:00
David Howells	4126d5d61f	UAPI: (Scripted) Remove redundant DRM UAPI header #inclusions from drivers/gpu/. Remove redundant DRM UAPI header #inclusions from drivers/gpu/. Remove redundant #inclusions of core DRM UAPI headers (drm.h, drm_mode.h and drm_sarea.h). They are now #included via drmP.h and drm_crtc.h via a preceding patch. Without this patch and the patch to make include the UAPI headers from the core headers, after the UAPI split, the DRM C sources cannot find these UAPI headers because the DRM code relies on specific -I flags to make #include "..." work on headers in include/drm/ - but that does not work after the UAPI split without adding more -I flags. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Dave Airlie <airlied@redhat.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Dave Jones <davej@redhat.com>	2012-10-02 18:01:05 +01:00
Chris Wilson	41783eea1a	drm/i915: Assert that the exec object lookup table is a power-of-two As we make the simplification of using a power-of-two size for the execbuffer handle-to-object TLB, we should validate that this is actually true and so clarify that premise. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-09-20 14:23:11 +02:00
Chris Wilson	ba7a64587b	drm/i915: Drop the misleading cast to the wrong user pointer type The exec_list is of type drm_i915_gem_exec_object2 and so casting it to a drm_i915_gem_relocation_entry is very confusing! Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-09-20 14:23:05 +02:00
Chris Wilson	9da3da660d	drm/i915: Replace the array of pages with a scatterlist Rather than have multiple data structures for describing our page layout in conjunction with the array of pages, we can migrate all users over to a scatterlist. One major advantage, other than unifying the page tracking structures, this offers is that we replace the vmalloc'ed array (which can be up to a megabyte in size) with a chain of individual pages which helps reduce memory pressure. The disadvantage is that we then do not have a simple array to iterate, or to access randomly. The common case for this is in the relocation processing, which will typically fit within a single scatterlist page and so be almost the same cost as the simple array. For iterating over the array, the extra function call could be optimised away, but in reality is an insignificant cost of either binding the pages, or performing the pwrite/pread. v2: Fix drm_clflush_sg() to not invoke wbinvd as well! And fix the trivial compile error from rebasing. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-09-20 14:22:57 +02:00
Chris Wilson	7788a76520	drm/i915: Avoid unbinding due to an interrupted pin_and_fence during execbuffer If we need to stall in order to complete the pin_and_fence operation during execbuffer reservation, there is a high likelihood that the operation will be interrupted by a signal (thanks X!). In order to simplify the cleanup along that error path, the object was unconditionally unbound and the error propagated. However, being interrupted here is far more common than I would like and so we can strive to avoid the extra work by eliminating the forced unbind. v2: In discussion over the indecent colour of the new functions and unwind path, we realised that we can use the new unreserve function to clean up the code even further. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-08-24 21:02:51 +02:00
Chris Wilson	504c7267a1	drm/i915: Use cpu relocations if the object is in the GTT but not mappable This prevents the case of unbinding the object in order to process the relocations through the GTT and then rebinding it only to then proceed to use cpu relocations as the object is now in the CPU write domain. By choosing to use cpu relocations up front, we can therefore avoid the rebind penalty. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-08-24 02:29:16 +02:00
Chris Wilson	86a1ee26bb	drm/i915: Only pwrite through the GTT if there is space in the aperture Avoid stalling and waiting for the GPU by checking to see if there is sufficient inactive space in the aperture for us to bind the buffer prior to writing through the GTT. If there is inadequate space we will have to stall waiting for the GPU, and incur overheads moving objects about. Instead, only incur the clflush overhead on the target object by writing through shmem. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-08-24 02:03:33 +02:00
Chris Wilson	6c085a728c	drm/i915: Track unbound pages When dealing with a working set larger than the GATT, or even the mappable aperture when touching through the GTT, we end up with evicting objects only to rebind them at a new offset again later. Moving an object into and out of the GTT requires clflushing the pages, thus causing a double-clflush penalty for rebinding. To avoid having to clflush on rebinding, we can track the pages as they are evicted from the GTT and only relinquish those pages on memory pressure. As usual, if it were not for the handling of out-of-memory condition and having to manually shrink our own bo caches, it would be a net reduction of code. Alas. Note: The patch also contains a few changes to the last-hope evict_everything logic in i916_gem_execbuffer.c - we no longer try to only evict the purgeable stuff in a first try (since that's superflous and only helps in OOM corner-cases, not fragmented-gtt trashing situations). Also, the extraction of the get_pages retry loop from bind_to_gtt (and other callsites) to get_pages should imo have been a separate patch. v2: Ditch the newly added put_pages (for unbound objects only) in i915_gem_reset. A quick irc discussion hasn't revealed any important reason for this, so if we need this, I'd like to have a git blame'able explanation for it. v3: Undo the s/drm_malloc_ab/kmalloc/ in get_pages that Chris noticed. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> [danvet: Split out code movements and rant a bit in the commit message with a few Notes. Done v2] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-08-21 14:34:11 +02:00
Daniel Vetter	a22ddff8be	Linux 3.6-rc2 -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.18 (GNU/Linux) iQEcBAABAgAGBQJQLWtvAAoJEHm+PkMAQRiG/DYH+wd0FqfEuYkYk4KPyAPuhKpX zX7HYfLvyJE/ZYIdrhjq1E6Xm2KNr7gtX7/Rdzi2W38M9sjbYzwG1UGIw51qnxWy yZJH9BGkfyQgQPeuDGohfB6DkDy2JWr2eqMDvakjOwgBsIzji0PQD/f3UvndhtUa c+tTj/kjavHE1Yr2Wy6OnRZz3Uc0hIMn/Q0JqtbCs3LUgEV1KA4OEAe56XNz4Ku4 WE+FFaGFPvtriQsQON+ohPS5IC8jzQGK/0vbrJ4lWjFnZy4gvZXnborTOwD0WSQG fbsNuxp1AaM2/pqfMwXm1w0ADvwOITHNiwwXf9id6DoK81QwTFpUdvKpn6yB6gQ= =rurr -----END PGP SIGNATURE----- Merge tag 'v3.6-rc2' into drm-intel-next Backmerge Linux 3.6-rc2 to resolve a few funny conflicts before we put even more madness on top: - drivers/gpu/drm/i915/i915_irq.c: Just a spurious WARN removed in -fixes, that has been changed in a variable-rename in -next, too. - drivers/gpu/drm/i915/intel_ringbuffer.c: -next remove scratch_addr (since all their users have been extracted in another fucntion), -fixes added another user for a hw workaroudn. Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-08-17 09:01:08 +02:00
Eric Anholt	e844b990b1	drm/i915: Don't forget to apply SNB PIPE_CONTROL GTT workaround. If a buffer that was the target of a PIPE_CONTROL from userland was a reused one that hadn't been evicted which had not previously had this workaround applied, then the early return for a correct presumed_offset in this function meant we would not bind it into the GTT and the write would land somewhere else. Fixes reproducible failures with GL_EXT_timer_query usage in apitrace, and I also expect it to fix the intermittent OQ issues on snb that danvet's been working on. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=48019 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=52932 Signed-off-by: Eric Anholt <eric@anholt.net> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Carl Worth <cworth@cworth.org> Tested-by: Carl Worth <cworth@cworth.org> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-08-05 21:45:01 +02:00
Chris Wilson	f047e395dd	drm/i915: Avoid concurrent access when marking the device as idle/busy As suggested by Daniel, rip out the independent timers for device and crtc busyness and integrate the manual powermanagement of the display engine into the GEM core and its request tracking. The benefits are that the code is a lot smaller, fewer moving parts and should fit more neatly into the overall activity tracking of the driver. v2: Complete overhaul and removal of the racy timers and workers. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-07-25 18:23:56 +02:00
Chris Wilson	a7b9761d0a	drm/i915: Split i915_gem_flush_ring() into seperate invalidate/flush funcs By moving the function to intel_ringbuffer and currying the appropriate parameter, hopefully we make the callsites easier to read and understand. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-07-25 18:23:55 +02:00
Chris Wilson	016fd0c1ae	drm/i915: Clear the pending_gpu_fenced_access flag at the start of execbuffer Otherwise once we use the buffer with a BLT command on gen2/3, we will always regard future command submissions as continuing the fenced access. However, now that we flush/invalidate between every batch we can drop this pessimism. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-07-25 18:23:55 +02:00
Daniel Vetter	6ac42f4148	drm/i915: Replace the complex flushing logic with simple invalidate/flush all Now that we unconditionally flush and invalidate between every batch buffer, we no longer need the complex logic to decide which domains require flushing. Remove it and rejoice. v2 (danvet): Keep around the flip waiting logic. It's gross and broken, I know, but we can't just kill that thing ... even if we just keep it around as a reminder that things are broken. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-07-25 18:23:54 +02:00
Chris Wilson	69c2fc8913	drm/i915: Remove the per-ring write list This is now handled by a global flag to ensure we emit a flush before the next serialisation point (if we failed to queue one previously). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-07-25 18:23:53 +02:00
Chris Wilson	0201f1ecf4	drm/i915: Replace the pending_gpu_write flag with an explicit seqno As we always flush the GPU cache prior to emitting the breadcrumb, we no longer have to worry about the deferred flush causing the pending_gpu_write to be delayed. So we can instead utilize the known last_write_seqno to hopefully minimise the wait times. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-07-25 18:23:52 +02:00
Chris Wilson	3bb73aba1e	drm/i915: Allow late allocation of request for i915_add_request() Request preallocation was added to i915_add_request() in order to support the overlay. However, not all users care and can quite happily ignore the failure to allocate the request as they will simply repeat the request in the future. By pushing the allocation down into i915_add_request(), we can then remove some rather ugly error handling in the callers. v2: Nullify request->file_priv otherwise we chase a garbage pointer when retiring requests. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-07-25 18:23:51 +02:00
Eric Anholt	0da5cec1de	drm/i915: Set the context before setting up regs for the context. Fixes failures in transform feedback on gen7 because our SOL_RESET flag was setting the transform feedback offsets in the old context (occasionally happened to be ours) instead of the new context. Signed-off-by: Eric Anholt <eric@anholt.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-07-25 10:39:59 +02:00
Chris Wilson	09cf7c9a12	drm/i915: Insert a flush between batches if the breadcrumb was dropped If we drop the breadcrumb request after a batch due to a signal for example we aim to fix it up at the next opportunity. In this case we emit a second batchbuffer with no waits upon the first and so no opportunity to insert the missing request, so we need to emit the missing flush for coherency. (Note that that invalidating the render cache is the same as flushing it, so there should have been no observable corruption.) Note that beside simply adding the missing flush, avoiding potential render corruption, this will also fix at least parts of the problem introduced by some funny interaction of these two commits: commit `de2b998552` Author: Daniel Vetter <daniel.vetter@ffwll.ch> Date: Wed Jul 4 22:52:50 2012 +0200 drm/i915: don't return a spurious -EIO from intel_ring_begin which allowed intel_ring_begin to return -ERESTARTSYS and commit `cc889e0f6c` Author: Daniel Vetter <daniel.vetter@ffwll.ch> Date: Wed Jun 13 20:45:19 2012 +0200 drm/i915: disable flushing_list/gpu_write_list which essentially disabled the flushing list. The issue happens when we submit a batch & emit it, but get interrupted (thanks to the first patch) while trying to emit the flush. On the next batch we still assume that the full gpu domain handling is in effect and hence compute the invalidate&flushing domains. But thanks to the 2nd patch we totally ignore these and only invalidate all gpu domains, presuming that any required flushes have been issued already. Which is wrong and eventually results in us updating the new write_domain values with the computed pending_write_domain values, which leaves an object with write_domain == 0 on the gpu_write_list. As soon as we try to unbind that object, things blow up. Fix this by emitting the missing flush according to the new ring->gpu_caches_dirty flag. Note that this does _not_ fix all the current cases where we end up with an object on the flushing_list that can't be flushed. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=52040 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> [danvet: Add bug explanation to commit message.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-07-20 12:21:40 +02:00
Daniel Vetter	cc889e0f6c	drm/i915: disable flushing_list/gpu_write_list This is just the minimal patch to disable all this code so that we can do decent amounts of QA before we rip it all out. The complicating thing is that we need to flush the gpu caches after the batchbuffer is emitted. Which is past the point of no return where execbuffer can't fail any more (otherwise we risk submitting the same batch multiple times). Hence we need to add a flag to track whether any caches associated with that ring are dirty. And emit the flush in add_request if that's the case. Note that this has a quite a few behaviour changes: - Caches get flushed/invalidated unconditionally. - Invalidation now happens after potential inter-ring sync. I've bantered around a bit with Chris on irc whether this fixes anything, and it might or might not. The only thing clear is that with these changes it's much easier to reason about correctness. Also rip out a lone get_next_request_seqno in the execbuffer retire_commands function. I've dug around and I couldn't figure out why that is still there, with the outstanding lazy request stuff it shouldn't be necessary. v2: Chris Wilson complained that I also invalidate the read caches when flushing after a batchbuffer. Now optimized. v3: Added some comments to explain the new flushing behaviour. Cc: Eric Anholt <eric@anholt.net> Cc: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-06-20 13:54:28 +02:00
Ben Widawsky	6e0a69dbc8	drm/i915/context: switch contexts with execbuf2 Use the rsvd1 field in execbuf2 to specify the context ID associated with the workload. This will allow the driver to do the proper context switch when/if needed. v2: Add checks for context switches on rings not supporting contexts. Before the code would silently ignore such requests. Signed-off-by: Ben Widawsky <ben@bwidawsk.net>	2012-06-14 17:36:21 +02:00
Chris Wilson	a15817cf16	drm/i915: Check whether the ring is initialised prior to dispatch Rather than use the magic feature tests HAS_BLT/HAS_BSD just check whether the ring we are about to dispatch the execbuffer on is initialised. v2: Use intel_ring_initialized() Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-05-19 22:39:53 +02:00
Chris Wilson	acb87dfb4b	drm/i915: Limit calling mark-busy only for potential scanouts The principle of intel_mark_busy() is that we want to spot the transition of when the display engine is being used in order to bump powersaving modes and increase display clocks. As such it is only important when the display is changing, i.e. when rendering to the scanout or other sprite/plane, and these are characterised by being pinned. v2: Mark the whole device as busy on execbuffer and pageflips as well and rebase against dinq for the minor bug fix to be immediately applicable. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> [danvet: fix compile fail.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-05-08 15:10:34 +02:00
Daniel Vetter	5e13a0c5ec	Merge remote-tracking branch 'airlied/drm-core-next' into drm-intel-next-queued Backmerge of drm-next to resolve a few ugly conflicts and to get a few fixes from 3.4-rc6 (which drm-next has already merged). Note that this merge also restricts the stencil cache lra evict policy workaround to snb (as it should) - I had to frob the code anyway because the CM0_MASK_SHIFT define died in the masked bit cleanups. We need the backmerge to get Paulo Zanoni's infoframe regression fix for gm45 - further bugfixes from him touch the same area and would needlessly conflict. Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-05-08 13:39:59 +02:00
Daniel Vetter	dc257cf154	Linux 3.4-rc6 -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.18 (GNU/Linux) iQEcBAABAgAGBQJPpvY9AAoJEHm+PkMAQRiGpEoIAJgbu+Y8gITnBK/wh9O6zy3S 5jie5KK4YWdbJsvO58WbNr3CyVIwGIqQ2dUZLiU59aBVLarlGw8xor0MmW+cZwhp 6fBHaf0qDYAV0MZjD+mnnExOiCRyISa2lPmsfu9dAWywh5KGe6/oAP6/qcXIyok3 KZyl3qQf4ENpaZPHwZPXCEkUvtuyHgNiszN+QXEadA3s19Ot4VGe9A3VGw+GNrSm JqFIq3acQAbKa5BYaqf7TQC02v2FI7//eqt6QHxTqbE6a7LGbTvLfX3HlJ2mnfqa 1R6QHhM4y4OZDHbaMT2raHZ8WuLXzhehJzhP8Co7AHFOKwVKOb5XbcUr2RrukMU= =HkMd -----END PGP SIGNATURE----- Merge tag 'v3.4-rc6' into drm-intel-next Conflicts: drivers/gpu/drm/i915/intel_display.c Ok, this is a fun story of git totally messing things up. There /shouldn't/ be any conflict in here, because the fixes in -rc6 do only touch functions that have not been changed in -next. The offending commits in drm-next are 14415745b2..1fa611065 which simply move a few functions from intel_display.c to intel_pm.c. The problem seems to be that git diff gets completely confused: $ git diff 14415745b2..1fa611065 is a nice mess in intel_display.c, and the diff leaks into totally unrelated functions, whereas $git diff --minimal 14415745b2..1fa611065 is exactly what we want. Unfortunately there seems to be no way to teach similar smarts to the merge diff and conflict generation code, because with the minimal diff there really shouldn't be any conflicts. For added hilarity, every time something in that area changes the + and - lines in the diff move around like crazy, again resulting in new conflicts. So I fear this mess will stay with us for a little longer (and might result in another backmerge down the road). Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-05-07 14:02:14 +02:00
Daniel Vetter	6ebebc9206	drm/i915: disallow clip rects on gen5+ Unfortunately there has been dri1 userspace that used gem to manage the gtt and hence also needed cliprects in the execbuf ioctl. So we can't ever remove that code without breaking the ioctl abi. But at least we can disable it on gen5+, because these horrible versions of mesa have not supported these chips. Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-05-03 11:18:29 +02:00
Ben Widawsky	b2da9fe5d5	drm/i915: remove do_retire from i915_wait_request This originates from a hack by me to quickly fix a bug in an earlier patch where we needed control over whether or not waiting on a seqno actually did any retire list processing. Since the two operations aren't clearly related, we should pull the parameter out of the wait function, and make the caller responsible for retiring if the action is desired. The only function call site which did not get an explicit retire_request call (on purpose) is i915_gem_inactive_shrink(). That code was already calling retire_request a second time. v2: don't modify any behavior excepit i915_gem_inactive_shrink(Daniel) Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-05-03 11:18:20 +02:00
Xi Wang	44afb3a043	drm/i915: fix integer overflow in i915_gem_do_execbuffer() On 32-bit systems, a large args->num_cliprects from userspace via ioctl may overflow the allocation size, leading to out-of-bounds access. This vulnerability was introduced in commit `432e58ed` ("drm/i915: Avoid allocation for execbuffer object list"). Signed-off-by: Xi Wang <xi.wang@gmail.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: stable@vger.kernel.org Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-04-23 22:32:15 +02:00
Xi Wang	ed8cd3b2cd	drm/i915: fix integer overflow in i915_gem_execbuffer2() On 32-bit systems, a large args->buffer_count from userspace via ioctl may overflow the allocation size, leading to out-of-bounds access. This vulnerability was introduced in commit `8408c282` ("drm/i915: First try a normal large kmalloc for the temporary exec buffers"). Signed-off-by: Xi Wang <xi.wang@gmail.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: stable@vger.kernel.org Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-04-23 22:32:02 +02:00
Chris Wilson	06d9813157	drm/i915: Remove the pipelined parameter from get_fence() We never succeeded in getting pipelined fencing to work (unresolved spurious GPU hangs), so begin the process of dismantling and removal the broken code. Step 1 is the removal of the pipeline parameter to get_fence(). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-04-18 13:15:43 +02:00
Chris Wilson	7b09638f45	drm/i915: Always flush tiling changes before accessing through the GTT As we defer updating the fence register from set-tiling to the point of use, we need to declare every access through the GTT as either fenced or unfenced. This patches fixes an old bug in the execbuffer relocation processing which could conceivably be hit by a pathological userspace. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-04-18 10:48:38 +02:00
Ben Widawsky	2911a35b2e	drm/i915: use semaphores for the display plane In theory this will have performance and power improvements. Performance because we don't need to stall when the scanout BO is busy, and power because we don't have to stall when the BO is busy (and the ring can even go to sleep if the HW supports it). v2: squash 2 patches into 1 (me) un-inline the enable_semaphores function (Daniel) remove comment about SNB hangs from i915_gem_object_sync (Chris) rename intel_enable_semaphores to i915_semaphore_is_enabled (me) removed page flip comment; "no why" (Chris) To address other comments from Daniel (irc): update the comment to say 'vt-d is crap, don't enable semaphores' - I think you misinterpreted Chris' comment, it already exists. checking out whether we can pageflip on the render ring on ivb (didn't work on early silicon) - We don't want to enable workarounds for early silicon unless we have to. - I can't find any references in the docs about this. optionally use it if the fb is already busy on the render ring - This should be how the code already worked, unless I am misunderstanding your meaning. Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-04-12 21:14:05 +02:00
Chris Wilson	9a5a53b392	drm/i915: Reorganise rules for get_fence/put_fence By simplifying the rules to calling get_fence when writing to the through the GTT in a tiled manner, and calling put_fence before writing to the object through the GTT in a linear manner, the code becomes clearer and there is less chance of making a mistake. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> [danvet: fixed up conflict with ppgtt code and spelling in a new comment.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-04-12 21:14:04 +02:00
Dave Airlie	effbc4fd8e	Merge branch 'drm-intel-next' of git://people.freedesktop.org/~danvet/drm-intel into drm-core-next Daniel Vetter wrote First pull request for 3.5-next, slightly large than usual because new things kept coming in since the last pull for 3.4. Highlights: - first batch of hw enablement for vlv (Jesse et al) and hsw (Eugeni). pci ids are not yet added, and there's still quite a few patches to merge (mostly modesetting). To make QA easier I've decided to merge this stuff in pieces. - loads of cleanups and prep patches spurred by the above. Especially vlv is a real frankenstein chip, but also hsw is stretching our driver's code design. Expect more to come in this area for 3.5. - more gmbus fixes, cleanups and improvements by Daniel Kurtz. Again, there are more patches needed (and some already queued up), but I wanted to split this a bit for better testing. - pwrite/pread rework and retuning. This series has been in the works for a few months already and a lot of i-g-t tests have been created for it. Now it's finally ready to be merged. Note that one patch in this series touches include/pagemap.h, that patch is acked-by akpm. - reduce mappable pressure and relocation throughput improvements from Chris. - mmap offset exhaustion mitigation by Chris Wilson. - a start at figuring out which codepaths in our messy dri1/ums+gem/kms driver we actually need to support by bailing out of unsupported case. The driver now refuses to load without kms on gen6+ and disallows a few ioctls that userspace never used in certain cases. More of this will definitely come. - More decoupling of global gtt and ppgtt. - Improved dual-link lvds detection by Takashi Iwai. - Shut up the compiler + plus fix the fallout (Ben) - Inverted panel brightness handling (mostly Acer manages to break things in this way). - Small fixlets and adjustements and some minor things to help debugging. Regression-wise QA reported quite a few issues on ivb, but all of them turned out to be hw stability issues which are already fixed in drm-intel-fixes (QA runs the nightly regression tests on -next alone, without -fixes automatically merged in). There's still one issue open on snb, it looks like occlusion query writes are not quite as cache coherent as we've expected. With some of the pwrite adjustements we can now reliably hit this. Kernel workaround for it is in the works." * 'drm-intel-next' of git://people.freedesktop.org/~danvet/drm-intel: (101 commits) drm/i915: VCS is not the last ring drm/i915: Add a dual link lvds quirk for MacBook Pro 8,2 drm/i915: make quirks more verbose drm/i915: dump the DMA fetch addr register on pre-gen6 drm/i915/sdvo: Include YRPB as an additional TV output type drm/i915: disallow gem init ioctl on ilk drm/i915: refuse to load on gen6+ without kms drm/i915: extract gt interrupt handler drm/i915: use render gen to switch ring irq functions drm/i915: rip out old HWSTAM missed irq WA for vlv drm/i915: open code gen6+ ring irqs drm/i915: ring irq cleanups drm/i915: add SFUSE_STRAP registers for digital port detection drm/i915: add WM_LINETIME registers drm/i915: add WRPLL clocks drm/i915: add LCPLL control registers drm/i915: add SSC offsets for SBI access drm/i915: add port clock selection support for HSW drm/i915: add S PLL control drm/i915: add PIXCLK_GATE register ... Conflicts: drivers/char/agp/intel-agp.h drivers/char/agp/intel-gtt.c drivers/gpu/drm/i915/i915_debugfs.c	2012-04-12 10:27:01 +01:00
Chris Wilson	7dd4906586	drm/i915: Mark untiled BLT commands as fenced on gen2/3 The BLT commands on gen2/3 utilize the fence registers and so we cannot modify any fences for the object whilst those commands are in flight. Currently we marked tiled commands as occupying a fence, but forgot to restrict the untiled commands from preventing a fence being assigned before they were completed. One side-effect is that we ten have to double check that a fence was allocated for a fenced buffer during move-to-active. Reported-by: Jiri Slaby <jirislaby@gmail.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=43427 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=47990 Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Testcase: i-g-t/tests/gem_tiled_after_untiled_blt Tested-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: stable@kernel.org Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-04-01 12:26:05 +02:00
Daniel Vetter	f56f821feb	mm: extend prefault helpers to fault in more than PAGE_SIZE drm/i915 wants to read/write more than one page in its fastpath and hence needs to prefault more than PAGE_SIZE bytes. Add new functions in filemap.h to make that possible. Also kill a copy&pasted spurious space in both functions while at it. v2: As suggested by Andrew Morton, add a multipage parameter to both functions to avoid the additional branch for the pagemap.c hotpath. My gcc 4.6 here seems to dtrt and indeed reap these branches where not needed. v3: Becaus I couldn't find a way around adding a uaddr += PAGE_SIZE to the filemap.c hotpaths (that the compiler couldn't remove again), let's go with separate new functions for the multipage use-case. v4: Adjust comment to CodingStlye and fix spelling. Acked-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-03-27 13:36:30 +02:00
Chris Wilson	dabdfe021a	drm/i915: Avoid using mappable space for relocation processing through the CPU We try to avoid writing the relocations through the uncached GTT, if the buffer is currently in the CPU write domain and so will be flushed out to main memory afterwards anyway. Also on SandyBridge we can safely write to the pages in cacheable memory, so long as the buffer is LLC mapped. In either of these cases, we therefore do not need to force the reallocation of the buffer into the mappable region of the GTT, reducing the aperture pressure. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-03-27 13:16:17 +02:00
Chris Wilson	1d83f4426f	drm/i915: Batch copy_from_user for relocation processing Originally the code tried to allocate a large enough array to perform the copy using vmalloc, performance wasn't great and throughput was improved by processing each individual relocation entry separately. This too is not as efficient as one would desire. A compromise would be to allocate a single page, or to allocate a few entries on the stack, and process the copy in batches. The latter gives simpler code and more consistent performance due to a lack of heuristic. x11perf -copywinwin10: n450/pnv i3-330m i5-2520m (cpu) before: 249000 785000 1280000 (80%) page: 264000 896000 1280000 (65%) on-stack: 264000 902000 1280000 (67%) v2: Use 512-bytes of stack for batching rather than allocate a page. v3: Tidy the code slightly with more descriptive variable names Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-03-26 09:59:05 +02:00
Daniel Vetter	149c84077f	drm/i915: implement SNB workaround for lazy global gtt PIPE_CONTROL on snb needs global gtt mappings in place to workaround a hw gotcha. No other commands need such a workaround. Luckily we can detect a PIPE_CONTROL commands easily because they have a write_domain = I915_GEM_DOMAIN_INSTRUCTION (and nothing else has that). v2: Binding the target of such a reloc into the global gtt actually works instead of binding the source, which is rather pointless ... v3: Kill a superflous has_global_gtt_mapping assignement noticed by Chris Wilson. Reviewed-and-tested-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-03-20 21:53:55 +01:00
Daniel Vetter	9edd576d89	Merge remote-tracking branch 'airlied/drm-fixes' into drm-intel-next-queued Back-merge from drm-fixes into drm-intel-next to sort out two things: - interlaced support: -fixes contains a bugfix to correctly clear interlaced configuration bits in case the bios sets up an interlaced mode and we want to set up the progressive mode (current kernels don't support interlaced). The actual feature work to support interlaced depends upon (and conflicts with) this bugfix. - forcewake voodoo to workaround missed IRQ issues: -fixes only enabled this for ivybridge, but some recent bug reports indicate that we need this on Sandybridge, too. But in a slightly different flavour and with other fixes and reworks on top. Additionally there are some forcewake cleanup patches heading to -next that would conflict with currrent -fixes. Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-02-10 17:14:49 +01:00
Daniel Vetter	7bddb01fb9	drm/i915: ppgtt binding/unbinding support This adds support to bind/unbind objects and wires it up. Objects are only put into the ppgtt when necessary, i.e. at execbuf time. Objects are still unconditionally put into the global gtt. v2: Kill the quick hack and explicitly pass cache_level to ppgtt_bind like for the global gtt function. Noticed by Chris Wilson. Reviewed-by: Ben Widawsky <ben@bwidawsk.net> Tested-by: Chris Wilson <chris@chris-wilson.co.uk> Tested-by: Eugeni Dodonov <eugeni.dodonov@intel.com> Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-02-09 21:25:23 +01:00
Daniel Vetter	ff240199b6	drm/i915: s/DRM_ERROR/DRM_DEBUG in i915_gem_execbuffer.c These are all user-trigerable, so tune down their loudness a notch. For some of these we have i-g-t tests (because they prevent newly-discovered bugs), without this patches running the test suite leaves behind a dirty dmesg. Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-02-09 11:10:32 +01:00

1 2

94 Commits