linux

Commit Graph

Author	SHA1	Message	Date
Chris Wilson	f047e395dd	drm/i915: Avoid concurrent access when marking the device as idle/busy As suggested by Daniel, rip out the independent timers for device and crtc busyness and integrate the manual powermanagement of the display engine into the GEM core and its request tracking. The benefits are that the code is a lot smaller, fewer moving parts and should fit more neatly into the overall activity tracking of the driver. v2: Complete overhaul and removal of the racy timers and workers. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-07-25 18:23:56 +02:00
Chris Wilson	a7b9761d0a	drm/i915: Split i915_gem_flush_ring() into seperate invalidate/flush funcs By moving the function to intel_ringbuffer and currying the appropriate parameter, hopefully we make the callsites easier to read and understand. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-07-25 18:23:55 +02:00
Chris Wilson	26b9c4a57f	drm/i915: Remove the explicit flush of the GPU write domain Rely instead on the insertion of the implicit flush before the seqno breadcrumb. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-07-25 18:23:54 +02:00
Chris Wilson	86d5bc3782	drm/i915: Remove explicit flush from i915_gem_object_flush_fence() As the flush is either performed explicitly immediately after the execbuffer dispatch, or before the serialisation of last_fenced_seqno we can forgo the explicit i915_gem_flush_ring(). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-07-25 18:23:53 +02:00
Chris Wilson	69c2fc8913	drm/i915: Remove the per-ring write list This is now handled by a global flag to ensure we emit a flush before the next serialisation point (if we failed to queue one previously). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-07-25 18:23:53 +02:00
Chris Wilson	65ce302741	drm/i915: Remove the defunct flushing list As we guarantee to emit a flush before emitting the breadcrumb or the next batchbuffer, there is no further need for the flushing list. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-07-25 18:23:52 +02:00
Chris Wilson	0201f1ecf4	drm/i915: Replace the pending_gpu_write flag with an explicit seqno As we always flush the GPU cache prior to emitting the breadcrumb, we no longer have to worry about the deferred flush causing the pending_gpu_write to be delayed. So we can instead utilize the known last_write_seqno to hopefully minimise the wait times. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-07-25 18:23:52 +02:00
Chris Wilson	e5f1d962a8	drm/i915: Remove assertion over write domain after i915_gem_object_sync() As we move to lazily clearing the GPU write domain only when the buffer becomes inactive, this leaves a window of opportunity for i915_gem_object_pin_to_display_plane() to detect a seemingly inconsistent value. This function is special as it tries to pipeline the operation to avoid the stall and so may not retire the buffer and we may not get the opportunity to clear the write domain. However, we know all is good, so drop the assertion. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-07-25 18:23:51 +02:00
Chris Wilson	3bb73aba1e	drm/i915: Allow late allocation of request for i915_add_request() Request preallocation was added to i915_add_request() in order to support the overlay. However, not all users care and can quite happily ignore the failure to allocate the request as they will simply repeat the request in the future. By pushing the allocation down into i915_add_request(), we can then remove some rather ugly error handling in the callers. v2: Nullify request->file_priv otherwise we chase a garbage pointer when retiring requests. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-07-25 18:23:51 +02:00
Chris Wilson	e9808edd98	drm/i915: Return a mask of the active rings in the high word of busy_ioctl The intention is to help select which engine to use for copies with interoperating clients - such as a GL client making a request to the X server to perform a SwapBuffers, which may require copying from the active GL back buffer to the X front buffer. We choose to report a mask of the active rings to future proof the interface against any changes which may allow for the object to reside upon multiple rings. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> [danvet: bikeshed away the write ring mask and add the explanation Chris sent in a follow-up mail why we decided to use masks.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-07-25 18:23:50 +02:00
Chris Wilson	eeef9b3874	drm/i915: Add -EIO to the list of known errors for __wait_seqno This prevents a WARN introduced with commit `de2b998552` Author: Daniel Vetter <daniel.vetter@ffwll.ch> Date: Wed Jul 4 22:52:50 2012 +0200 drm/i915: don't return a spurious -EIO from intel_ring_begin Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-07-25 10:39:57 +02:00
Chris Wilson	67b1b57182	drm/i915: Disable the BLT on pre-production SNB hardware It never quite worked despite the numerous workarounds, yet I still see people trying to use this hardware and filing bug reports. As we no longer even try to implement the workarounds, since `6a233c7887` (drm/i915/ringbuffer: kill snb blt workaround), simply disable the ring. v2: Add a message to inform the user about the limited capabilities of their pre-production hardware. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Eric Anholt <eric@anholt.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-07-20 12:21:37 +02:00
Chris Wilson	6b9d89b436	drm: Add colouring to the range allocator In order to support snoopable memory on non-LLC architectures (so that we can bind vgem objects into the i915 GATT for example), we have to avoid the prefetcher on the GPU from crossing memory domains and so prevent allocation of a snoopable PTE immediately following an uncached PTE. To do that, we need to extend the range allocator with support for tracking and segregating different node colours. This will be used by i915 to segregate memory domains within the GTT. v2: Now with more drm_mm helpers and less driver interference. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Dave Airlie <airlied@redhat.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Ben Skeggs <bskeggs@redhat.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Dave Airlie <airlied@gmail.com>	2012-07-16 05:59:37 +10:00
Daniel Vetter	a9340ccab5	drm/i915: properly SIGBUS on I/O errors ... instead of looping endlessly with no hope of ever serving that page-fault. We only need to break out of this loop when the gpu died, to run the reset work (and hopefully resurrect it). To clarify questions Chris raised on irc: This is about handling I/O errors not from our own code, but e.g. when the disk died when trying to swap in a gem bo. So this patch remedies the issue that the current handling only handles gpu-death-induced cases of -EIO. Admittedly, dying disks are much rarer than hanging gpus ... This seems to have been lost in: commit `d9bc7e9f32` Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Mon Feb 7 13:09:31 2011 +0000 drm/i915: Fix infinite loop regression from `21dd3734` Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Tested-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-07-05 10:03:01 +02:00
Daniel Vetter	0a6759c6ba	drm/i915: don't hang userspace when the gpu reset is stuck With the gpu reset no longer using a trylock we've increased the chances of userspace getting stuck quite a bit. To make that (hopefully) rare case more palatable, time out when waiting for the gpu reset code to complete and signal this little issue to the caller by returning -EIO. This should help userspace to somewhat gracefully fall back and hopefully allow the user to grab some logs and reboot the machine (instead of staring at a frozen X screen in agony). Suggested by Chris Wilson because I've been stubborn about allowing the gpu reset code not to fail, ever (by removing the trylock). Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Tested-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-07-05 10:02:24 +02:00
Daniel Vetter	d6b2c790a4	drm/i915: non-interruptible sleeps can't handle -EAGAIN So don't return -EAGAIN, even in the case of a gpu hang. Remap it to -EIO instead. Note that this isn't really an issue with interruptibility, but more that we have quite a few codepaths (mostly around kms stuff) that simply can't handle any errors and hence not even -EAGAIN. Instead of adding proper failure paths so that we could restart these ioctls we've opted for the cheap way out of sleeping non-interruptibly. Which works everywhere but when the gpu dies, which this patch fixes. So essentially interruptible == false means 'wait for the gpu or die trying'. This patch is a bit ugly because intel_ring_begin is all non-interruptible and hence only returns -EIO. But as the comment in there says, auditing all the callsites would be a pain. To avoid duplicating code, reuse i915_gem_check_wedge in __wait_seqno and intel_wait_ring_buffer. Also use the opportunity to clarify the different cases in i915_gem_check_wedge a bit with comments. v2: Don't access dev_priv->mm.interruptible from check_wedge - we might not hold dev->struct_mutex, making this racy. Instead pass interruptible in as a parameter. I've noticed this because I've hit a BUG_ON(!mutex_is_locked) at the top of check_wedge. This has been added in commit `b4aca0106c` Author: Ben Widawsky <ben@bwidawsk.net> Date: Wed Apr 25 20:50:12 2012 -0700 drm/i915: extract some common olr+wedge code although that commit is missing any justification for this. I guess it's just copy&paste, because the same commit adds the same BUG_ON check to check_olr, where it indeed makes sense. But in check_wedge everything we access is protected by other means, so this is superfluous. And because it now gets in the way (we add a new caller in __wait_seqno, which can be called without dev->struct_mutex) let's just remove it. v3: Group all the i915_gem_check_wedge refactoring into this patch, so that this patch here is all about not returning -EAGAIN to callsites that can't handle syscall restarting. v4: Add clarification what interruptible == false means in our code, requested by Ben Widawsky. v5: Fix EAGAIN misspelling noticed by Chris Wilson. Reviewed-by: Ben Widawsky <ben@bwidawsk.net> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Tested-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-07-05 10:01:14 +02:00
Daniel Vetter	cc889e0f6c	drm/i915: disable flushing_list/gpu_write_list This is just the minimal patch to disable all this code so that we can do decent amounts of QA before we rip it all out. The complicating thing is that we need to flush the gpu caches after the batchbuffer is emitted. Which is past the point of no return where execbuffer can't fail any more (otherwise we risk submitting the same batch multiple times). Hence we need to add a flag to track whether any caches associated with that ring are dirty. And emit the flush in add_request if that's the case. Note that this has a quite a few behaviour changes: - Caches get flushed/invalidated unconditionally. - Invalidation now happens after potential inter-ring sync. I've bantered around a bit with Chris on irc whether this fixes anything, and it might or might not. The only thing clear is that with these changes it's much easier to reason about correctness. Also rip out a lone get_next_request_seqno in the execbuffer retire_commands function. I've dug around and I couldn't figure out why that is still there, with the outstanding lazy request stuff it shouldn't be necessary. v2: Chris Wilson complained that I also invalidate the read caches when flushing after a batchbuffer. Now optimized. v3: Added some comments to explain the new flushing behaviour. Cc: Eric Anholt <eric@anholt.net> Cc: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-06-20 13:54:28 +02:00
Ben Widawsky	f2ef6eb145	drm/i915: switch to default context on idle To keep things as sane as possible, switch to the default context before idling. This should help free context objects, as well as put things in a more well defined state before suspending. v2: remove seqno from context switch call (daniel) return error on failed context switch instead of WARN+continue (daniel) v3: move idling to i915_gpu idle (from i915_gem_idle) (Chris) Signed-off-by: Ben Widawsky <ben@bwidawsk.net>	2012-06-14 17:36:20 +02:00
Ben Widawsky	254f965c39	drm/i915: preliminary context support Very basic code for context setup/destruction in the driver. Adds the file i915_gem_context.c. This file implements HW context support. On gen5+ a HW context consists of an opaque GPU object which is referenced at times of context saves and restores. With RC6 enabled, the context is also referenced as the GPU enters and exits RC6 (GPU has its own internal power context, except on gen5). Though something like a context does exist for the media ring, the code only supports contexts for the render ring. In software, there is a distinction between contexts created by the user, and the default HW context. The default HW context is used by GPU clients that do not request setup of their own hardware context. The default context's state is never restored to help prevent programming errors. This would happen if a client ran and piggy-backed off another client's GPU state. The default context only exists to give the GPU some offset to load as the current to invoke a save of the context we actually care about. In fact, the code could likely be constructed, albeit in a more complicated fashion, to never use the default context, though that limits the driver's ability to swap out, and/or destroy other contexts. All other contexts are created as a request by the GPU client. These contexts store GPU state, and thus allow GPU clients to not re-emit state (and potentially query certain state) at any time. The kernel driver makes certain that the appropriate commands are inserted. There are 4 entry points into the contexts, init, fini, open, close. The names are self-explanatory except that init can be called during reset, and also during pm thaw/resume. As we expect our context to be preserved across these events, we do not reinitialize in this case. As Adam Jackson pointed out, the cutoff of 1MB where a HW context is considered too big is arbitrary. The reason for this is even though context sizes are increasing with every generation, they have yet to eclipse even 32k. If we somehow read back way more than that, it probably means BIOS has done something strange, or we're running on a platform that wasn't designed for this. v2: rename load/unload to init/fini (daniel) remove ILK support for get_size() (indirectly daniel) add HAS_HW_CONTEXTS macro to clarify supported platforms (daniel) added comments (Ben) Signed-off-by: Ben Widawsky <ben@bwidawsk.net>	2012-06-14 17:36:16 +02:00
Daniel Vetter	8ecd1a6615	drm/i915: call intel_enable_gtt When drm/i915 is in control of the gtt, we need to call the enable function at all the relevant places ourselves. Reviewed-by: Jani Nikula <jani.nikula@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-06-12 22:21:07 +02:00
Daniel Vetter	dd2757f8b5	drm/i915: stop using dev->agp->base For that to work we need to export the base address of the gtt mmio window from intel-gtt. Also replace all other uses of dev->agp by values we already have at hand. Reviewed-by: Jani Nikula <jani.nikula@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-06-12 22:18:06 +02:00
Ben Widawsky	eac1f14fd1	drm/i915: Inifite timeout for wait ioctl Change the ns_timeout parameter of the wait ioctl to a signed value. Doing this allows the kernel to provide an infinite wait when a timeout of less than 0 is provided. This mimics select/poll. Initially the parameter was meant to match up with the GL spec 1:1, but after being made aware of how much 2^64 - 1 nanoseconds actually is, I do not think anyone will ever notice the loss of 1 bit. The infinite timeout on waiting is similar to the existing i915 userspace interface with the exception that struct_mutex is dropped while doing the wait in this ioctl. Cc: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-06-06 12:25:46 +02:00
Daniel Vetter	30dfebf34b	drm/i915: extract object active state flushing code Both busy_ioctl and the new wait_ioctl need to do the same dance (or at least should). Some slight changes: - busy_ioctl now unconditionally checks for olr. Before, emitting a required flush would have prevented the olr check and hence required a second call to the busy ioctl to really emit the request. - the timeout wait now also retires requests. Not really required for abi-reasons, but makes a notch more sense imo. I've tested this by pimping the i-g-t test some more and also checking the polling behaviour of the wait_rendering_timeout ioctl versus what busy_ioctl returns. v2: Too many people complained about unplug, new color is flush_active. v3: Kill the comment about the unplug moniker. v4: s/un-active/inactive/ Reviewed-by: Ben Widawsky <ben@bwidawsk.net> Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-06-02 20:51:03 +02:00
Daniel Vetter	e269f90f3d	Merge remote-tracking branch 'airlied/drm-prime-vmap' into drm-intel-next-queued We need the latest dma-buf code from Dave Airlie so that we can pimp the backing storage handling code in drm/i915 with Chris Wilson's unbound tracking and stolen mem backed gem object code. Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-06-01 10:52:54 +02:00
Ben Widawsky	b9524a1e1c	drm/i915: remap l3 on hw init If any l3 rows have been previously remapped, we must remap them after GPU reset/resume too. v2: Just return (no warn) on remapping init if not IVB (Jesse) Move the check of schizo userspace to i915_gem_l3_remap (Jesse) Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-05-31 12:11:29 +02:00
Dave Airlie	a21f976094	Merge branch 'drm-intel-fixes' of git://people.freedesktop.org/~danvet/drm-intel into drm-fixes * 'drm-intel-fixes' of git://people.freedesktop.org/~danvet/drm-intel: drm/i915: tune down the noise of the RP irq limit fail drm/i915: Remove the error message for unbinding pinned buffers drm/i915: Limit page allocations to lowmem (dma32) for i965 drm/i915: always use RPNSWREQ for turbo change requests drm/i915: reject doubleclocked cea modes on dp drm/i915: Adding TV Out Missing modes. drm/i915: wait for a vblank to pass after tv detect drm/i915: no lvds quirk for HP t5740e Thin Client drm/i915: enable vdd when switching off the eDP panel drm/i915: Fix PCH PLL assertions to not assume CRTC:PLL relationship drm/i915: Always update RPS interrupts thresholds along with frequency drm/i915: properly handle interlaced bit for sdvo dtd conversion drm/i915: fix module unload since error_state rework drm/i915: be more careful when returning -ENXIO in gmbus transfer	2012-05-29 11:09:06 +01:00
Ben Widawsky	199b2bc25b	drm/i915: s/i915_wait_request/i915_wait_seqno/g Wait request is poorly named IMO. After working with these functions for some time, I feel it's much clearer to name the functions more appropriately. Of course we must update the callers to use the new name as well. This leaves room within our namespace for a real wait request function at some point. Note to maintainer: this patch is optional. Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-05-25 14:18:42 +02:00
Ben Widawsky	23ba4fd0a4	drm/i915: wait render timeout ioctl This helps implement GL_ARB_sync but stops short of allowing full blown sync objects. Finally we can use the new timed seqno waiting function to allow userspace to wait on a buffer object with a timeout. This implements that interface. The IOCTL will take as input a buffer object handle, and a timeout in nanoseconds (flags is currently optional but will likely be used for permutations of flush operations). Users may specify 0 nanoseconds to instantly check. The wait ioctl with a timeout of 0 reimplements the busy ioctl. With any non-zero timeout parameter the wait ioctl will wait for the given number of nanoseconds on an object becoming unbusy. Since the wait itself does so holding struct_mutex the object may become re-busied before this completes. A similar but shorter race condition exists in the busy ioctl. v2: ETIME/ERESTARTSYS instead of changing to EBUSY, and EGAIN (Chris) Flush the object from the gpu write domain (Chris + Daniel) Fix leaked refcount in good case (Chris) Naturally align ioctl struct (Chris) v3: Drop lock after getting seqno to avoid ugly dance (Chris) v4: check for 0 timeout after olr check to allow polling (Chris) v5: Updated the comment. (Chris) v6: Return -ETIME instead of -EBUSY when timeout_ns is 0 (Daniel) Fix the commit message comment to be less ugly (Ben) Add a warning to check the return timespec (Ben) v7: Use DRM_AUTH for the ioctl. (Eugeni) Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-05-25 14:15:46 +02:00
Chris Wilson	31d8d651eb	drm/i915: Remove the error message for unbinding pinned buffers This is now used intentionally to prevent proliferation of is-pinned checks upon the inactive list following: commit `1b50247a8d` Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Tue Apr 24 15:47:30 2012 +0100 drm/i915: Remove the list of pinned inactive objects Reported-and-tested-by: guang.a.yang@intel.com Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=50075 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-05-25 10:10:40 +02:00
Chris Wilson	bed1ea95a3	drm/i915: Limit page allocations to lowmem (dma32) for i965 Broadwater and Crestline share a limitation that prevent it from relocating general surface state above 4GiB. The only recourse we have since any buffer object may be used as a relocation target is then to limit all object allocations on 965g[m] to DMA32. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-05-25 10:07:06 +02:00
Ben Widawsky	5c81fe85da	drm/i915: timeout parameter for seqno wait Insert a wait parameter in the code so we can possibly timeout on a seqno wait if need be. The code should be functionally the same as before because all the callers will continue to retry if an arbitrary timeout elapses. We'd like to have nanosecond granularity, but the only way to do this is with hrtimer, and that doesn't fit well with the needs of this code. v2: Fix rebase error (Chris) Return proper time even in wedged + signal case (Chris + Ben) Use timespec constructs (Ben) Didn't take Daniel's advice regarding the Frankenstein-ness of the function. I did try his advice, but in the end I liked the way the original code looked, better. v3: Make wakeups far less frequent for infinite waits (Chris) v4: Remove dummy_wait variable (Daniel) Use raw monotonic time instead of jiffies (made the code a bit cleaner) (Ben) Added a couple of warnings (Ben) v5: Remove warnings (Daniel) Use more accurate time diff for default case (Daniel) Bikeshed for setting the return timespec in timeout case (Daniel) s/jiffies/time in one of the comments Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-05-25 09:55:08 +02:00
Daniel Vetter	1286ff7397	i915: add dmabuf/prime buffer sharing support. This adds handle->fd and fd->handle support to i915, this is to allow for offloading of rendering in one direction and outputs in the other. v2 from Daniel Vetter: - fixup conflicts with the prepare/finish gtt prep work. - implement ppgtt binding support. Note that we have squat i-g-t testcoverage for any of the lifetime and access rules dma_buf/prime support brings along. And there are quite a few intricate situations here. Also note that the integration with the existing code is a bit hackish, especially around get_gtt_pages and put_gtt_pages. It imo would be easier with the prep code from Chris Wilson's unbound series, but that is for 3.6. Also note that I didn't bother to put the new prepare/finish gtt hooks to good use by moving the dma_buf_map/unmap_attachment calls in there (like we've originally planned for). Last but not least this patch is only compile-tested, but I've changed very little compared to Dave Airlie's version. So there's a decent chance v2 on drm-next works as well as v1 on 3.4-rc. v3: Right when I've hit send I've noticed that I've screwed up one obj->sg_list (for dmar support) and obj->sg_table (for prime support) distinction. We should be able to merge these 2 paths, but that's material for another patch. v4: fix the error reporting bugs pointed out by ickle. v5: fix another error, and stop non-gtt mmaps on shared objects stop pread/pwrite on imported objects, add fake kmap Signed-off-by: Dave Airlie <airlied@redhat.com> Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-05-23 10:47:10 +01:00
Chris Wilson	b4519513e8	drm/i915: Introduce for_each_ring() macro In many places we wish to iterate over the rings associated with the GPU, so refactor them to use a common macro. Along the way, there are a few code removals that should be side-effect free and some rearrangement which should only have a cosmetic impact, such as error-state. Note that this slightly changes the semantics in the hangcheck code: We now always cycle through all enabled rings instead of short-circuiting the logic. v2: Pull in a couple of suggestions from Ben and Daniel for intel_ring_initialized() and not removing the warning (just moving them to a new home, closer to the error). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Ben Widawsky <ben@bwidawsk.net> [danvet: Added note to commit message about the small behaviour change, suggested by Ben Widawsky.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-05-19 22:39:53 +02:00
Daniel Vetter	5e13a0c5ec	Merge remote-tracking branch 'airlied/drm-core-next' into drm-intel-next-queued Backmerge of drm-next to resolve a few ugly conflicts and to get a few fixes from 3.4-rc6 (which drm-next has already merged). Note that this merge also restricts the stencil cache lra evict policy workaround to snb (as it should) - I had to frob the code anyway because the CM0_MASK_SHIFT define died in the masked bit cleanups. We need the backmerge to get Paulo Zanoni's infoframe regression fix for gm45 - further bugfixes from him touch the same area and would needlessly conflict. Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-05-08 13:39:59 +02:00
Daniel Vetter	dc257cf154	Linux 3.4-rc6 Merge tag 'v3.4-rc6' into drm-intel-next Conflicts: drivers/gpu/drm/i915/intel_display.c Ok, this is a fun story of git totally messing things up. There /shouldn't/ be any conflict in here, because the fixes in -rc6 do only touch functions that have not been changed in -next. The offending commits in drm-next are 14415745b2..1fa611065 which simply move a few functions from intel_display.c to intel_pm.c. The problem seems to be that git diff gets completely confused: $ git diff 14415745b2..1fa611065 is a nice mess in intel_display.c, and the diff leaks into totally unrelated functions, whereas $ git diff --minimal 14415745b2..1fa611065 is exactly what we want. Unfortunately there seems to be no way to teach similar smarts to the merge diff and conflict generation code, because with the minimal diff there really shouldn't be any conflicts. For added hilarity, every time something in that area changes the + and - lines in the diff move around like crazy, again resulting in new conflicts. So I fear this mess will stay with us for a little longer (and might result in another backmerge down the road). Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-05-07 14:02:14 +02:00
Ben Widawsky	b4aca0106c	drm/i915: extract some common olr+wedge code The new wait_rendering ioctl also needs to check for an outstanding lazy request, and we already duplicate that logic at three places. So extract it. While at it, also extract the code to check the gpu wedging state to improve code flow. v2: Don't use seqno as an outparam (Chris) v3 by danvet: Kill stale comment and pimp commit message Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-05-03 11:18:32 +02:00
Daniel Vetter	53ca26cab8	drm/i915 disallow physical batchbuffers for KMS Even the horrible gen3 XvMC code has learned to do this right by the time xf86-video-intel releases learned to do kernel modesetting. So we can just disallow this. Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-05-03 11:18:25 +02:00
Daniel Vetter	8781342df7	drm/i915: create dev_priv->dri1 dragon dungeon^W^W sub-struct ... and shove allow_batchbuffer in there. More dragons will follow suit. There's the curious case that we allow this for KMS ... Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-05-03 11:18:25 +02:00
Ben Widawsky	3b88cc0dd7	drm/i915: use __wait_seqno for ring throttle It turns out throttle had an almost identical bit of code to do the wait. Now we can call the new helper directly. This is just a bonus, and not needed for the overall series. v2: remove irq_get/put which is now in __wait_seqno (Ben) Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-05-03 11:18:23 +02:00
Ben Widawsky	4146b08d76	drm/i915: remove polled wait from throttle It's about to go away anyway. Just here to help bisection. Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-05-03 11:18:22 +02:00
Ben Widawsky	604dd3ec75	drm/i915: extract __wait_seqno from i915_wait_request i915_wait_request is actually a fairly large function encapsulating quite a few different operations. Because being able to wait on seqnos in various conditions is useful, extracting that bit of code to a helper function seems useful v2: pull the irq_get/put as well (Ben) Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-05-03 11:18:22 +02:00
Ben Widawsky	c58cf4f108	drm/i915: drop polled waits from i915_wait_request The only time irq_get should fail is during unload or suspend. Both of these points should try to quiesce the GPU before disabling interrupts and so the atomic polling should never occur. This was recommended by Chris Wilson as a way of reducing added complexity to the polled wait which I introduced in an RFC patch. 09:57 < ickle_> it's only there as a fudge for waiting after irqs after uninstalled during s&r, we aren't actually meant to hit it 09:57 < ickle_> so maybe we should just kill the code there and fix the breakage v2: return -ENODEV instead of -EBUSY when irq_get fails Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-05-03 11:18:22 +02:00
Ben Widawsky	9574b3fe29	drm/i915: kill waiting_seqno The waiting_seqno is not terribly useful, and as such we can remove it so that we'll be able to extract lockless code. v2: Keep the information for error_state (Chris) Check if ring is initialized in hangcheck (Chris) Capture the waiting ring (Chris) Signed-off-by: Ben Widawsky <ben@bwidawsk.net> [danvet: add some bikeshed to clarify a comment.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-05-03 11:18:21 +02:00
Ben Widawsky	be998e2e39	drm/i915: move vbetool invoked ier stuff This extra bit of interrupt enabling code doesn't belong in the wait seqno function. If anything we should pull it out to a helper so the throttle code can also use it. The history is a bit vague, but I am going to attempt to just dump it, unless someone can argue otherwise. Removing this allows for a shared lock free wait seqno function. To keep tabs on this issue though, the IER value is stored on error capture (recommended by Chris Wilson) v2: fixed typo EIR->IER (Ben) Fix some white space (Ben) Move IER capture to globally instead of per ring (Ben) Signed-off-by: Ben Widawsky <ben@bwidawsk.net> [danvet: ier is a 16 bit reg on gen2!] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-05-03 11:18:21 +02:00
Ben Widawsky	b2da9fe5d5	drm/i915: remove do_retire from i915_wait_request This originates from a hack by me to quickly fix a bug in an earlier patch where we needed control over whether or not waiting on a seqno actually did any retire list processing. Since the two operations aren't clearly related, we should pull the parameter out of the wait function, and make the caller responsible for retiring if the action is desired. The only function call site which did not get an explicit retire_request call (on purpose) is i915_gem_inactive_shrink(). That code was already calling retire_request a second time. v2: don't modify any behavior except i915_gem_inactive_shrink() (Daniel) Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-05-03 11:18:20 +02:00
Daniel Vetter	507432986c	drm/i915: use the new masked bit macro some more I've missed this one. v2: Chris Wilson noticed another register. v3: Color choice improvements. Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-05-03 11:18:20 +02:00
Daniel Vetter	63ed2cb2d1	drm/i915: rip out GEM drm feature checks We always set it so there's no point in checking. We could instead add a bit that tells us whether gem is actually initialized (i.e. either kms or gem_init_ioctl called), but that's imho not worth it. So just rip it out. There's a little change in the wait_ring timeout, but we've never run with anything else than the 60 second timeout, even on dri1 userspace. Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-05-03 11:18:14 +02:00
Daniel Vetter	7bb6fb8dd9	drm/i915: disallow gem ums init ioctl for kms This ioctl used in a kms driver is only useful to create massive havoc. v2: Bail out with -ENODEV as suggested by Chris Wilson. Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-05-03 11:18:13 +02:00
Chris Wilson	1070a42b6b	drm/i915: Move GEM initialisation from i915_dma.c to i915_gem.c Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-05-03 11:18:12 +02:00
Chris Wilson	1488fc08c1	drm/i915: Remove the deferred-free list The use of the mm_list by deferred-free breaks the following patches to extend the range of objects tracked. We can simplify things if we just make the unbind during free uninterruptible. Note that unbinding should never fail, because we hold an additional reference on every active object. Only the ilk vt-d workaround breaks this, but already takes care of not failing by waiting non-interruptibly for the gpu to quiesce. But the existence of the deferred free list cast some doubt on this theory, hence WARN if the unbind fails and only then retry non-interruptible. We can kill this additional code after a release in case the theory is indeed right and no one has hit that WARN. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-05-03 11:18:11 +02:00
Chris Wilson	1b50247a8d	drm/i915: Remove the list of pinned inactive objects Simplify object tracking by removing the inactive but pinned list. The only place where this was used is for counting the available memory, which is just as easily performed by checking all objects on the rare occasions it is required (application startup). For ease of debugging, we keep the reporting of pinned objects through the error-state and debugfs. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-05-03 11:18:11 +02:00
Chris Wilson	a39d7efc62	drm/i915: Remove i915_gem_evict_inactive() This was only used by one external caller who would just be as happy with evict-everything, so perform the replacement and make the function private. In the process we note that unbinding the inactive list should not fail, and make it a warning instead. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-05-03 11:18:10 +02:00
Chris Wilson	8325a09dd0	drm/i915: Bump the inactive LRU on set-to-GTT-domain Currently, we only bump the inactive LRU of an object when we bind into the GTT for a page-fault. As the object may be used many times before its mapping is zapped, we do not mark it as active as frequently as we should. Userspace should be calling set-to-GTT-domain before each pointer dereference (for synchronous access) and so is a good place to mark the buffer as active. Marking the buffer as recently used places it at the end of the inactive eviction queue, though still before anything with outstanding rendering. This reduces the likelihood of evicting a buffer that is going to be used again by the CPU in the near future. This way we can hopefully avoid kicking out upload buffers right before we use them on the gpu. Note that we need to check that the object is not active or pinned, for otherwise we create havoc on the active/pinned lists, which also use obj->mm_list. The active lists are sorted by and evicted in last GPU rendering order, access by the CPU to a still active buffer therefore does not affect its eviction ordering. Pinned objects are currently excluded from eviction, therefore the only list that we need to bump for GTT access by the CPU is the inactive list. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> [danvet: Added further explanations to the commit message as discussed on irc.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-05-03 11:18:10 +02:00
Daniel Vetter	6b26c86d61	drm/i915: create macros to handle masked bits ... and put them to so good use. Note that there's functional change in vlv clock gating code, we now no longer spuriously read back the current value of the bit. According to Bspec the high bits should always read zero, so ORing this in should have no effect. Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com> Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-05-03 11:18:08 +02:00
Chris Wilson	5d82e3e642	drm/i915: Clarify the semantics of tiling_changed Rename obj->tiling_changed to obj->fence_dirty so that it is clear that it flags when the parameters for an active fence (including the no-fence) register are changed. Also, do not set this flag when the object does not have a fence register allocated currently and the gpu does not depend upon the unfence. This case works exactly like when a tiled object lost its fence and hence does not need additional handling for the tiling change in the code. v2: Use fence_dirty to better express what the flag tracks and add a few more details to the comments to serve as a reminder of how the GPU also uses the unfenced register slot. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> [danvet: Add some bikeshed to the commit message about the stricter use of fence_dirty.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-05-03 11:18:06 +02:00
Ben Widawsky	4f0c7cfbb4	drm/i915: [sparse] __iomem fixes for gem As with one of the earlier patches in the series, we're forced to cast for copy_[to\|from]_user. Again because of the nature of the GEN x86 exclusivity, this should be safe. Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com> [danvet: Added some bikeshed.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-05-03 11:18:01 +02:00
Linus Torvalds	6be5ceb02e	VM: add "vm_mmap()" helper function This continues the theme started with vm_brk() and vm_munmap(): vm_mmap() does the same thing as do_mmap(), but additionally does the required VM locking. This uninlines (and rewrites it to be clearer) do_mmap(), which sadly duplicates it in mm/mmap.c and mm/nommu.c. But that way we don't have to export our internal do_mmap_pgoff() function. Some day we hopefully don't have to export do_mmap() either, if all modular users can become the simpler vm_mmap() instead. We're actually very close to that already, with the notable exception of the (broken) use in i810, and a couple of stragglers in binfmt_elf. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-04-20 17:29:13 -07:00
Chris Wilson	14415745b2	drm/i915: Refactor get_fence() to use the common fence writing routine We can also take advantage of the new 'no retire' mode for seqno waiting to avoid having to take a reference on the old fence object whilst flushing an existing fence. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-04-18 13:40:51 +02:00
Chris Wilson	ada726c734	drm/i915: Refactor fence clearing to use the common fence writing routine Now that we have a routine that is able to clear the fences as well as set up the register for a tiled object, remove the surplus routines to clear the fences. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-04-18 13:34:53 +02:00
Chris Wilson	61050808bb	drm/i915: Refactor put_fence() to use the common fence writing routine One clarification that we make is to the existing semantics of obj->tiling_changed to only mean that we need to update an associated fence register (including the NO_FENCE when executing an untiled but fenced GPU command). If we do not have a fence register or pending fenced GPU access for the object (after put_fence() for example), then we can clear the tiling_changed flag as any fence will necessarily be rewritten upon acquisition. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-04-18 13:34:30 +02:00
Chris Wilson	9ce079e481	drm/i915: Prepare to consolidate fence writing Update the existing architecture specific fence writing routines to either update the fence to point to a tiled object or to clear them in preparation to remove the other fence writing routes. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-04-18 13:24:32 +02:00
Chris Wilson	1899184547	drm/i915: Remove the unsightly "optimisation" from flush_fence() As i915_wait_request() will first check for an already passed seqno, doing it also in the caller is a waste of space for a cold path. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-04-18 13:23:17 +02:00
Chris Wilson	8fe301add5	drm/i915: Simplify fence finding As the fences are stored in LRU order, we can simply reuse the oldest if we do not have an unused register. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-04-18 13:20:35 +02:00
Chris Wilson	1c293ea3b1	drm/i915: Discard the unused obj->last_fenced_ring As we now never pipeline a fence update, obj->last_fenced_ring is always the same as the obj->ring whenever obj->last_fenced_seqno is active, so remove it. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-04-18 13:19:51 +02:00
Chris Wilson	69963e7c76	drm/i915: Remove unused ring->setup_seqno As we now no longer track a pipelined fence change, we never use ring->setup_seqno and can kill it. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-04-18 13:18:52 +02:00
Chris Wilson	a360bb1a83	drm/i915: Remove fence pipelining Step 2 is then to replace the pipelined parameter with NULL and perform constant folding to remove dead code. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-04-18 13:18:25 +02:00
Chris Wilson	06d9813157	drm/i915: Remove the pipelined parameter from get_fence() We never succeeded in getting pipelined fencing to work (unresolved spurious GPU hangs), so begin the process of dismantling and removing the broken code. Step 1 is the removal of the pipeline parameter to get_fence(). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-04-18 13:15:43 +02:00
Daniel Vetter	48ecfa1090	drm/i915: properly set ppgtt cacheability on snb For some reason snb has 2 fields to set ppgtt cacheability. This one here does not exist on gen7. This might explain why ppgtt wasn't a win on snb like on ivb - not enough pte caching. v2: Fixup rebase fail. Reviewed-by: Ben Widawsky <ben@bwidawsk.net> Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-04-17 11:19:59 +02:00
Daniel Vetter	be901a5a1b	drm/i915: set w/a bit for snb pagefaults Bspec says that we need to set this: vol1c.3 "Blitter Command Streamer", Section 1.1.2.1 "GAB_CTL_REG - GAB Unit Control Register". We don't really rely on pagefaults, but who knows what this all affects. Reviewed-by: Ben Widawsky <ben@bwidawsk.net> Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-04-17 11:19:56 +02:00
Daniel Vetter	767878908e	Linux 3.4-rc3 -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.18 (GNU/Linux) iQEbBAABAgAGBQJPi3XOAAoJEHm+PkMAQRiGnsUH9RjHwH4YFVyuP/DKtKa6zs74 wqkpT15yITQ5WWMog4JaJFFg5rJCUd8QZr7AS/HSn0ijDyZX5VU7Rcs9cMudDzNR H/5K/AscS4fjb0HwWVqoltTWHRb9QGSwVN3+E3VCDLt9P89YJ0o3QztkkuEX5dkZ jc7reVXTfRnCcILEa9jleOzrn+OLM3j/jAjQ2hGunl8EDLzD4b17HHPoli4jEZ/5 5ibpSVsPD+AqzN+glbXvYjVItl12D0IQos/JdOwfuZriCVWLxysSSwHZTbPCyvBZ LHH4HR5T+XLSXbjJeNkUFHLzqU+d5gVRadIoWtJCxqxFjKbOs2YtzJ5Ai0nDiw== =kTkC -----END PGP SIGNATURE----- Merge tag 'v3.4-rc3' into drm-intel-next-queued Backmerge Linux 3.4-rc3 into drm-intel-next to resolve a few things that conflict/depend upon patches in -rc3: - Second part of the Sandybridge workaround series - it changes some of the same registers. - Preparation for Chris Wilson's fencing cleanup - we need the fix from -rc3 merged before we can move around all that code. - Resolve the gmbus conflict - gmbus has been disabled in 3.4 again, but should be enabled on all generations in 3.5. Conflicts: drivers/gpu/drm/i915/intel_i2c.c Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-04-17 11:16:20 +02:00
Daniel Vetter	c07496fa61	drm/i915: don't pwrite tiled objects through the gtt ... we will botch up the bit17 swizzling. Furthermore tiled pwrite is a (now) unused slowpath, so no one really cares. This fixes the last swizzling issues I have with i-g-t on my bit17 swizzling i915G. No regression, it's been broken since the dawn of gem, but it's nice for regression tracking when really _all_ i-g-t tests work. Actually this is not true, Chris Wilson noticed while reviewing this patch that the commit commit `d9e86c0ee6` Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Wed Nov 10 16:40:20 2010 +0000 drm/i915: Pipelined fencing [infrastructure] contained a functional change that broke things. Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-04-15 19:37:42 +02:00
Ben Widawsky	1500f7ea06	drm/i915: hide (seqno-1) in ringbuffer code Waiting for seqno-1 in our object synchronization code is an implementation detail given how we've decided to do the waits within the rest of our code. Requested-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-04-12 21:14:14 +02:00
Ben Widawsky	e3a5a2250a	drm/i915: fix for when semaphore updates fail This fixes a long standing issue where emitting the semaphore updates may have failed, but we've already updated our internal data structure. Reported-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-04-12 21:14:13 +02:00
Ben Widawsky	5816d648d5	drm/i915: i915_gem_object_sync must handle NULL When I extracted the synchronization code for implementing semaphorified pageflips (74f5f6e0), I neglected the non pipelined case which also calls this code. The modesetting code wants to make sure the object has finished rendering to the frame before configuring the scanout (ie. non-pipelined case). As a result of a follow on discussion on IRC, I've decided to add a comment about the function itself which received much inspiration from Chris as well. So really, this patch was ghost-written by Chris :). Reported-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Tested-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-04-12 21:14:13 +02:00
Chris Wilson	f84131905b	drm/i915: Allow concurrent read access between CPU and GPU domain Similar to allowing a buffer to be simultaneously read by the GPU and through the GTT, we wish to allow readback of the pages through the CPU domain whilst they are also being read by the GPU. Domain coherency is managed by allowing multiple readers, but only a single writer. This is used by mesa for its program cache which it may search for every new program every frame and then renews should it need to add. During renewal, mesa copies the program bo currently executing through a CPU mapping onto the new bo. This patch allows the search and that copy to proceed without causing a stall on the current batch. Testcase: i-g-t/tests/gem_cpu_concurrent_blit Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-04-12 21:14:10 +02:00
Ben Widawsky	2911a35b2e	drm/i915: use semaphores for the display plane In theory this will have performance and power improvements. Performance because we don't need to stall when the scanout BO is busy, and power because we don't have to stall when the BO is busy (and the ring can even go to sleep if the HW supports it). v2: squash 2 patches into 1 (me) un-inline the enable_semaphores function (Daniel) remove comment about SNB hangs from i915_gem_object_sync (Chris) rename intel_enable_semaphores to i915_semaphore_is_enabled (me) removed page flip comment; "no why" (Chris) To address other comments from Daniel (irc): update the comment to say 'vt-d is crap, don't enable semaphores' - I think you misinterpreted Chris' comment, it already exists. checking out whether we can pageflip on the render ring on ivb (didn't work on early silicon) - We don't want to enable workarounds for early silicon unless we have to. - I can't find any references in the docs about this. optionally use it if the fb is already busy on the render ring - This should be how the code already worked, unless I am misunderstanding your meaning. Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-04-12 21:14:05 +02:00
Chris Wilson	9a5a53b392	drm/i915: Reorganise rules for get_fence/put_fence By simplifying the rules to calling get_fence when writing to the object through the GTT in a tiled manner, and calling put_fence before writing to the object through the GTT in a linear manner, the code becomes clearer and there is less chance of making a mistake. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> [danvet: fixed up conflict with ppgtt code and spelling in a new comment.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-04-12 21:14:04 +02:00
Dave Airlie	effbc4fd8e	Merge branch 'drm-intel-next' of git://people.freedesktop.org/~danvet/drm-intel into drm-core-next Daniel Vetter wrote First pull request for 3.5-next, slightly large than usual because new things kept coming in since the last pull for 3.4. Highlights: - first batch of hw enablement for vlv (Jesse et al) and hsw (Eugeni). pci ids are not yet added, and there's still quite a few patches to merge (mostly modesetting). To make QA easier I've decided to merge this stuff in pieces. - loads of cleanups and prep patches spurred by the above. Especially vlv is a real frankenstein chip, but also hsw is stretching our driver's code design. Expect more to come in this area for 3.5. - more gmbus fixes, cleanups and improvements by Daniel Kurtz. Again, there are more patches needed (and some already queued up), but I wanted to split this a bit for better testing. - pwrite/pread rework and retuning. This series has been in the works for a few months already and a lot of i-g-t tests have been created for it. Now it's finally ready to be merged. Note that one patch in this series touches include/pagemap.h, that patch is acked-by akpm. - reduce mappable pressure and relocation throughput improvements from Chris. - mmap offset exhaustion mitigation by Chris Wilson. - a start at figuring out which codepaths in our messy dri1/ums+gem/kms driver we actually need to support by bailing out of unsupported case. The driver now refuses to load without kms on gen6+ and disallows a few ioctls that userspace never used in certain cases. More of this will definitely come. - More decoupling of global gtt and ppgtt. - Improved dual-link lvds detection by Takashi Iwai. - Shut up the compiler + plus fix the fallout (Ben) - Inverted panel brightness handling (mostly Acer manages to break things in this way). - Small fixlets and adjustements and some minor things to help debugging. 
Regression-wise QA reported quite a few issues on ivb, but all of them turned out to be hw stability issues which are already fixed in drm-intel-fixes (QA runs the nightly regression tests on -next alone, without -fixes automatically merged in). There's still one issue open on snb, it looks like occlusion query writes are not quite as cache coherent as we've expected. With some of the pwrite adjustments we can now reliably hit this. Kernel workaround for it is in the works." * 'drm-intel-next' of git://people.freedesktop.org/~danvet/drm-intel: (101 commits) drm/i915: VCS is not the last ring drm/i915: Add a dual link lvds quirk for MacBook Pro 8,2 drm/i915: make quirks more verbose drm/i915: dump the DMA fetch addr register on pre-gen6 drm/i915/sdvo: Include YRPB as an additional TV output type drm/i915: disallow gem init ioctl on ilk drm/i915: refuse to load on gen6+ without kms drm/i915: extract gt interrupt handler drm/i915: use render gen to switch ring irq functions drm/i915: rip out old HWSTAM missed irq WA for vlv drm/i915: open code gen6+ ring irqs drm/i915: ring irq cleanups drm/i915: add SFUSE_STRAP registers for digital port detection drm/i915: add WM_LINETIME registers drm/i915: add WRPLL clocks drm/i915: add LCPLL control registers drm/i915: add SSC offsets for SBI access drm/i915: add port clock selection support for HSW drm/i915: add S PLL control drm/i915: add PIXCLK_GATE register ... Conflicts: drivers/char/agp/intel-agp.h drivers/char/agp/intel-gtt.c drivers/gpu/drm/i915/i915_debugfs.c	2012-04-12 10:27:01 +01:00
Daniel Vetter	15a13bbdff	drm/i915: clear fencing tracking state when retiring requests This fixes a resume regression introduced in commit `7dd4906586` Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Wed Mar 21 10:48:18 2012 +0000 drm/i915: Mark untiled BLT commands as fenced on gen2/3 which fixed fencing tracking for untiled blt commands. A side effect of that patch was that now also untiled objects have a non-zero obj->last_fenced_seqno to track when a fence can be set up after a pipelined tiling change. Unfortunately this was only cleared by the fence setup and teardown code, resulting in tons of untiled but inactive objects with non-zero last_fenced_seqno. Now after resume we completely reset the seqno tracking, both on the driver side (by setting dev_priv->next_seqno = 1) and on the hw side (by allocating a new hws page, which contains the seqnos). Hilarity and indefinite waits ensued from the stale seqnos in obj->last_fenced_seqno from before the suspend. The fix is to properly clear the fencing tracking state like we already do for the normal gpu rendering while moving objects off the active list. Reported-and-tested-by: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Jiri Slaby <jslaby@suse.cz> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-04-12 09:02:37 +02:00
Daniel Vetter	f534bc0b22	drm/i915: disallow gem init ioctl on ilk Ums is already disabled, but on ilk we can additionally disable gem initialization when using user mode setting. Upstream never supported ilk without kernel modesetting and not even the RHEL ilk ums backport needs gem - that driver is based on xf86-video-intel version 2.2, which is pre-gem. Reviewed-by: Adam Jackson <ajax@redhat.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-04-09 18:04:08 +02:00
Chris Wilson	7dd4906586	drm/i915: Mark untiled BLT commands as fenced on gen2/3 The BLT commands on gen2/3 utilize the fence registers and so we cannot modify any fences for the object whilst those commands are in flight. Currently we marked tiled commands as occupying a fence, but forgot to restrict the untiled commands from preventing a fence being assigned before they were completed. One side-effect is that we then have to double check that a fence was allocated for a fenced buffer during move-to-active. Reported-by: Jiri Slaby <jirislaby@gmail.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=43427 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=47990 Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Testcase: i-g-t/tests/gem_tiled_after_untiled_blt Tested-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: stable@kernel.org Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-04-01 12:26:05 +02:00
Daniel Vetter	55a254ac63	drm/i915: properly restore the ppgtt page directory on resume The ppgtt page directory lives in a snatched part of the gtt pte range. Which naturally gets cleared on hibernate when we pull the power. Suspend to ram (which is what I've tested) works because despite the fact that this is a mmio region, it is actually backed by system ram. Fix this by moving the page directory setup code to the ppgtt init code (which gets called on resume). This fixes hibernate on my ivb and snb. Reviewed-by: Ben Widawsky <ben@bwidawsk.net> Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-04-01 12:25:29 +02:00
Jesse Barnes	23e3f9b37e	drm/i915: check for disabled interrupts on ValleyView Haven't seen this yet, but it doesn't hurt. Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-03-29 00:11:46 +02:00
Daniel Vetter	e7e58eb5c0	drm/i915: mark pwrite/pread slowpaths with unlikely Besides helping the compiler untangle this maze they double-up as documentation for which parts of the code aren't performance-critical but just around to keep old (but already dead-slow) userspace from breaking. Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-03-27 13:41:41 +02:00
Daniel Vetter	23c18c71da	drm/i915: fixup in-line clflushing on bit17 swizzled bos The issue is that with inline clflushing the clflushing isn't properly swizzled. Fix this by - always clflushing entire 128 byte chunks and - unconditionally flush before writes when swizzling a given page. We could be clever and check whether we pwrite a partial 128 byte chunk instead of a partial cacheline, but I've figured that's not worth it. Now the usual approach is to fold this into the original patch series, but I've opted against this because - this fixes a corner case only very old userspace relies on and - I'd like to not invalidate all the testing the pwrite rewrite has gotten. This fixes the regression noticed by tests/gem_tiled_partial_prite_pread from i-g-t. Unfortunately it doesn't fix the issues with partial pwrites to tiled buffers on bit17 swizzling machines. But that is also broken without the pwrite patches, so likely a different issue (or a problem with the testcase). v2: Simplify the patch by dropping the overly clever partial write logic for swizzled pages. Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-03-27 13:40:57 +02:00
Daniel Vetter	f56f821feb	mm: extend prefault helpers to fault in more than PAGE_SIZE drm/i915 wants to read/write more than one page in its fastpath and hence needs to prefault more than PAGE_SIZE bytes. Add new functions in filemap.h to make that possible. Also kill a copy&pasted spurious space in both functions while at it. v2: As suggested by Andrew Morton, add a multipage parameter to both functions to avoid the additional branch for the pagemap.c hotpath. My gcc 4.6 here seems to dtrt and indeed reap these branches where not needed. v3: Because I couldn't find a way around adding a uaddr += PAGE_SIZE to the filemap.c hotpaths (that the compiler couldn't remove again), let's go with separate new functions for the multipage use-case. v4: Adjust comment to CodingStyle and fix spelling. Acked-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-03-27 13:36:30 +02:00
Daniel Vetter	d174bd6472	drm/i915: extract copy helpers from shmem_pread\|pwrite While moving around things, these two functions slowly grew out of any sane bounds. So extract a few lines that do the copying and clflushing. Also add a few comments to explain what's going on. v2: Again do s/needs_clflush/needs_clflush_after/ in the write paths as suggested by Chris Wilson. Tested-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-03-27 13:30:33 +02:00
Daniel Vetter	117babcdd5	drm/i915: use uncached writes in pwrite It's around 20% faster. Tested-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-03-27 13:29:38 +02:00
Daniel Vetter	ffc62976d2	drm/i915: fall back to shmem pwrite when the buffer is not accessible It's too expensive to move it around just for that pwrite, especially when we're thrashing on the mappable gtt part like crazy. Tested-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-03-27 13:29:08 +02:00
Daniel Vetter	586428852a	drm/i915: implement inline clflush for pwrite In micro-benchmarking of the usual pwrite use-pattern of alternating pwrites with gtt domain reads from the gpu, this yields around 30% improvement of pwrite throughput across all buffer sizes. The trick is that we can avoid clflushing cachelines that we will overwrite completely anyway. Furthermore for partial pwrites it gives a proportional speedup on top of the 30% because we only clflush back the part of the buffer we're actually writing. v2: Simplify the clflush-before-write logic, as suggested by Chris Wilson. v3: Finishing touches suggested by Chris Wilson: - add comment to needs_clflush_before and only set this if the bo is uncached. - s/needs_clflush/needs_clflush_after/ in the write paths to clearly differentiate it from needs_clflush_before. Tested-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-03-27 13:28:45 +02:00
Daniel Vetter	96d79b5270	drm/i915: don't clobber userspace memory before committing to the pread The pagemap.h prefault helpers do the prefaulting by simply writing some data into every page. Hence we should not prefault when we're not yet committed to actually writing data to userspace. The problem is now that - we can't prefault while holding dev->struct_mutex for we could deadlock with our own pagefault handler - we need to grab dev->struct_mutex before copying to sync up with any outstanding gpu writes. Therefore only prefault when we're dropping the lock the first time in the pread slowpath - at that point we're committed to the write, don't wait on the gpu anymore and hence won't return early (with e.g. -EINTR). Tested-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-03-27 13:28:32 +02:00
Daniel Vetter	935aaa692e	drm/i915: drop gtt slowpath With the proper prefault, it's extremely unlikely that we fall back to the gtt slowpath. So just kill it and use the shmem_pwrite path as fallback. To further clean up the code, move the preparatory gem calls into the respective pwrite functions. This way the gtt_fast->shmem fallback is much more obvious. Tested-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-03-27 13:27:21 +02:00
Daniel Vetter	692a576b9d	drm/i915: don't call shmem_read_mapping unnecessarily This speeds up pwrite and pread from ~120 µs to ~100 µs for reading/writing 1mb on my snb (if the backing storage pages are already pinned, of course). v2: Chris Wilson pointed out a glaring page reference bug - I've unconditionally dropped the reference. With that fixed (and the associated reduction of dirt in dmesg) it's now even a notch faster. v3: Unconditionally grab a page reference when dropping dev->struct_mutex to simplify the code-flow. Tested-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-03-27 13:27:03 +02:00
Daniel Vetter	3ae5378330	drm/i915: don't use gtt_pwrite on LLC cached objects ~120 µs instead of ~210 µs to write 1mb on my snb. I like this. Tested-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-03-27 13:25:45 +02:00
Daniel Vetter	a0356fc373	drm/i915: kill ranged cpu read domain support No longer needed. Tested-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-03-27 13:25:32 +02:00
Daniel Vetter	8489731c9b	drm/i915: move clflushing into shmem_pread This is obviously gonna slow down pread. But for a half-way realistic micro-benchmark, it doesn't matter: Non-broken userspace reads back data from the gpu once before the gpu again dirties it. So all this ranged clflush tracking is just a waste of time. No pread performance change (neglecting the dumb benchmark of constantly reading the same data) measured. As an added bonus, this avoids clflush on read on coherent objects. Which means that partial preads on snb are now roughly 4x as fast. This will be useful for e.g. the libva encoder - when I finally get around to fixing that up. v2: Properly sync with the gpu on LLC machines. Tested-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-03-27 13:20:01 +02:00
Daniel Vetter	dbf7bff074	drm/i915: merge shmem_pread slow&fast-path With the previous rewrite, they've become essentially identical. v2: Simplify the page_do_bit17_swizzling logic as suggested by Chris Wilson. Tested-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-03-27 13:19:11 +02:00
Daniel Vetter	e244a443bf	drm/i915: merge shmem_pwrite slow&fast-path With the previous rewrite, they've become essentially identical. v2: Simplify the page_do_bit17_swizzling logic as suggested by Chris Wilson. Tested-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-03-27 13:18:58 +02:00
Chris Wilson	dabdfe021a	drm/i915: Avoid using mappable space for relocation processing through the CPU We try to avoid writing the relocations through the uncached GTT, if the buffer is currently in the CPU write domain and so will be flushed out to main memory afterwards anyway. Also on SandyBridge we can safely write to the pages in cacheable memory, so long as the buffer is LLC mapped. In either of these cases, we therefore do not need to force the reallocation of the buffer into the mappable region of the GTT, reducing the aperture pressure. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-03-27 13:16:17 +02:00
Daniel Vetter	644ec02b5d	drm/i915: s/i915_gem_do_init/i915_gem_init_global_gtt ... because this is what it actually does now that we have the global gtt vs. ppgtt split. Also move it to the other global gtt functions in i915_gem_gtt.c Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-03-27 13:14:59 +02:00

654 Commits