Commit Graph

201 Commits

Author SHA1 Message Date
Daniel Vetter ae6c480692 drm/i915: Only track real ppgtt for a context
There's a bit a confusion since we track the global gtt,
the aliasing and real ppgtt in the ctx->vm pointer. And not
all callers really bother to check for the different cases and just
presume that it points to a real ppgtt.

Now looking closely we don't actually need ->vm to always point at an
address space - the only place that cares actually has fixup code
already to decide whether to look at the per-proces or the global
address space.

So switch to just tracking the ppgtt directly and ditch all the
extraneous code.

v2: Fixup the ppgtt debugfs file to not oops on a NULL ctx->ppgtt.
Also drop the early exit - without aliasing ppgtt we want to dump all
the ppgtts of the contexts if we have full ppgtt.

v3: Actually git add the compile fix.

Reviewed-by: Michel Thierry <michel.thierry@intel.com>
Cc: "Thierry, Michel" <michel.thierry@intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
OTC-Jira: VIZ-3724
[danvet: Resolve conflicts with execlist patches while applying.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-08-13 14:23:33 +02:00
Oscar Mateo ba8b7ccb19 drm/i915/bdw: Workload submission mechanism for Execlists
This is what i915_gem_do_execbuffer calls when it wants to execute some
worload in an Execlists world.

v2: Check arguments before doing stuff in intel_execlists_submission. Also,
get rel_constants parsing right.

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
[danvet: Drop the chipset flush, that's pre-gen6. And appease
checkpatch a bit .... again!]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-08-11 23:18:38 +02:00
Oscar Mateo a83014d3f8 drm/i915: Abstract the legacy workload submission mechanism away
As suggested by Daniel Vetter. The idea, in subsequent patches, is to
provide an alternative to these vfuncs for the Execlists submission
mechanism.

v2: Splitted into two and reordered to illustrate our intentions, instead
of showing it off. Also, remove the add_request vfunc and added the
stop_ring one.

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
[danvet:
- Make checkpatch happy.
- Be grumpy about the excessive vtable.
- Ditch gt->is_ring_initialized.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-08-11 16:40:32 +02:00
Oscar Mateo ec3e9963a6 drm/i915/bdw: Deferred creation of user-created LRCs
The backing objects and ringbuffers for contexts created via open
fd are actually empty until the user starts sending execbuffers to
them. At that point, we allocate & populate them. We do this because,
at create time, we really don't know which engine is going to be used
with the context later on (and we don't want to waste memory on
objects that we might never use).

v2: As contexts created via ioctl can only be used with the render
ring, we have enough information to allocate & populate them right
away.

v3: Defer the creation always, even with ioctl-created contexts, as
requested by Daniel Vetter.

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-08-11 16:25:58 +02:00
Chris Wilson 906843c3a1 drm/i915: Simplify relocate_entry_gtt() and make 64-bit safe
Even though we should not try to use 4+GiB GTTs on 32-bit systems, by
using a local variable we can future proof the code whilst making it
easier to read.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: Appease checkpatch a bit.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-08-11 14:16:04 +02:00
Chris Wilson 060e82c6f4 drm/i915: Remove redundant list_empty(eb->vmas) tests in execbuffer
Part of the pre-validation for an execbuffer call is that there is at
least one object in the execlist. As we bail if we fail to lookup any
object, we can be sure that after the eb_lookup_vma() there is at least
one object in the vma list and so we do not need to assert.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ben Widawsky <benjamin.widawsky@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-08-11 14:15:03 +02:00
Chris Wilson ad19f10bc2 drm/i915: Pre-validate the NEED_GTTS flag for execbuffer
We have an implementation requirement that precludes the user from
requesting a ggtt entry when the device is operating in ppgtt mode. Move
the current check from inside the execbuffer object collation to the
prevalidation phase.

v2: Roll both invalid flags checks into one

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-08-11 14:15:02 +02:00
Daniel Vetter da51a1e7e3 drm/i915: Fix secure dispatch with full ppgtt
Based upon a hunk from a patch from Chris Wilson, but augmented to:
- Process the batch in the full ppgtt vm so that self-relocations
  match again with userspace's expectations..
- Add a comment why plain pin for the global gtt binding is safe at
  that point.

v2: Drop local bind_vm variable (Chris).

v3: Explain why this works despite the lack of proper active tracking
for the ggtt batch vma.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ben Widawsky <benjamin.widawsky@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-08-11 13:49:57 +02:00
Chris Wilson 82b6b6d786 drm/i915: Remove fenced_gpu_access and pending_fenced_gpu_access
This migrates the fence tracking onto the existing seqno
infrastructure so that the later conversion to tracking via requests is
simplified.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-08-11 12:20:25 +02:00
Chris Wilson e6a844687c drm/i915: Force CPU relocations if not GTT mapped
Move the decision on whether we need to have a mappable object during
execbuffer to the fore and then reuse that decision by propagating the
flag through to reservation. As a corollary, before doing the actual
relocation through the GTT, we can make sure that we do have a GTT
mapping through which to operate.

Note that the key to make this work is to ditch the
obj->map_and_fenceable unbind optimization - with full ppgtt it
doesn't make a lot of sense any more anyway.

v2: Revamp and resend to ease future patches.
v3: Refresh patch rationale

References: https://bugs.freedesktop.org/show_bug.cgi?id=81094
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ben Widawsky <benjamin.widawsky@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
[danvet: Explain why obj->map_and_fenceable is key and split out the
secure batch fix.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-08-11 12:01:29 +02:00
Oscar Mateo 78382593e9 drm/i915: Extract the actual workload submission mechanism from execbuffer
So that we isolate the legacy ringbuffer submission mechanism, which becomes
a good candidate to be abstracted away. This is prep-work for Execlists (which
will its own workload submission mechanism).

No functional changes.

Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-07-08 12:31:06 +02:00
Oscar Mateo 821d66dd7c drm/i915: Emphasize that ctx->id is merely a user handle
This is an Execlists preparatory patch, since they make context ID become an
overloaded term:

- In the software, it was used to distinguish which context userspace was
  trying to use.
- In the BSpec, the term is used to describe the 20-bits long field the
  hardware uses to it to discriminate the contexts that are submitted to
  the ELSP and inform the driver about their current status (via Context
  Switch Interrupts and Context Status Buffers).

Initially, I tried to make the different meanings converge, but it proved
impossible:

- The software ctx->id is per-filp, while the hardware one needs to be
  globally unique.
- Also, we multiplex several backing states objects per intel_context,
  and all of them need unique HW IDs.
- I tried adding a per-filp ID and then composing the HW context ID as:
  ctx->id + file_priv->id + ring->id, but the fact that the hardware only
  uses 20-bits means we have to artificially limit the number of filps or
  contexts the userspace can create.

The ctx->user_handle renaming bits are done with this Cocci patch (plus
manual frobbing of the struct declaration):

    @@
    struct intel_context c;
    @@
    - (c).id
    + c.user_handle

    @@
    struct intel_context *c;
    @@
    - (c)->id
    + c->user_handle

Also, while we are at it, s/DEFAULT_CONTEXT_ID/DEFAULT_CONTEXT_HANDLE and
change the type to unsigned 32 bits.

v2: s/handle/user_handle and change the type to uint32_t as suggested by
Chris Wilson.

Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org> (v1)
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-07-08 12:30:41 +02:00
Daniel Vetter f99d70690e drm/i915: Track frontbuffer invalidation/flushing
So these are the guts of the new beast. This tracks when a frontbuffer
gets invalidated (due to frontbuffer rendering) and hence should be
constantly scaned out, and when it's flushed again and can be
compressed/one-shot-upload.

Rules for flushing are simple: The frontbuffer needs one more full
upload starting from the next vblank. Which means that the flushing
can _only_ be called once the frontbuffer update has been latched.

But this poses a problem for pageflips: We can't just delay the
flushing until the pageflip is latched, since that would pose the risk
that we override frontbuffer rendering that has been scheduled
in-between the pageflip ioctl and the actual latching.

To handle this track asynchronous invalidations (and also pageflip)
state per-ring and delay any in-between flushing until the rendering
has completed. And also cancel any delayed flushing if we get a new
invalidation request (whether delayed or not).

Also call intel_mark_fb_busy in both cases in all cases to make sure
that we keep the screen at the highest refresh rate both on flips,
synchronous plane updates and for frontbuffer rendering.

v2: Lots of improvements

Suggestions from Chris:
- Move invalidate/flush in flush_*_domain and set_to_*_domain.
- Drop the flush in busy_ioctl since it's redundant. Was a leftover
  from an earlier concept to track flips/delayed flushes.
- Don't forget about the initial modeset enable/final disable.
  Suggested by Chris.

Track flips accurately, too. Since flips complete independently of
rendering we need to track pending flips in a separate mask. Again if
an invalidate happens we need to cancel the evenutal flush to avoid
races.

v3:
Provide correct header declarations for flip functions. Currently not
needed outside of intel_display.c, but part of the proper interface.

v4: Add proper domain management to fbcon so that the fbcon buffer is
also tracked correctly.

v5: Fixup locking around the fbcon set_to_gtt_domain call.

v6: More comments from Chris:
- Split out fbcon changes.
- Drop superflous checks for potential scanout before calling intel_fb
  functions - we can micro-optimize this later.
- s/intel_fb_/intel_fb_obj_/ to make it clear that this deals in gem
  object. We already have precedence for fb_obj in the pin_and_fence
  functions.

v7: Clarify the semantics of the flip flush handling by renaming
things a bit:
- Don't go through a gem object but take the relevant frontbuffer bits
  directly. These functions center on the plane, the actual object is
  irrelevant - even a flip to the same object as already active should
  cause a flush.
- Add a new intel_frontbuffer_flip for synchronous plane updates. It
  currently just calls intel_frontbuffer_flush since the implemenation
  differs.

This way we achieve a clear split between one-shot update events on
one side and frontbuffer rendering with potentially a very long delay
between the invalidate and flush.

Chris and I also had some discussions about mark_busy and whether it
is appropriate to call from flush. But mark busy is a state which
should be derived from the 3 events (invalidate, flush, flip) we now
have by the users, like psr does by tracking relevant information in
psr.busy_frontbuffer_bits. DRRS (the only real use of mark_busy for
frontbuffer) needs to have similar logic. With that the overall
mark_busy in the core could be removed.

v8: Only when retiring gpu buffers only flush frontbuffer bits we
actually invalidated in a batch. Just for safety since before any
additional usage/invalidate we should always retire current rendering.
Suggested by Chris Wilson.

v9: Actually use intel_frontbuffer_flip in all appropriate places.
Spotted by Chris.

v10: Address more comments from Chris:
- Don't call _flip in set_base when the crtc is inactive, avoids redunancy
  in the modeset case with the initial enabling of all planes.
- Add comments explaining that the initial/final plane enable/disable
  still has work left to do before it's fully generic.

v11: Only invalidate for gtt/cpu access when writing. Spotted by Chris.

v12: s/_flush/_flip/ in intel_overlay.c per Chris' comment.

Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-06-19 18:14:47 +02:00
Ville Syrjälä d593d992a9 drm/i915: Fix __user sparse warning
CHECK   linux/drivers/gpu/drm/i915/i915_gem_execbuffer.c
linux/drivers/gpu/drm/i915/i915_gem_execbuffer.c:1529:47: warning: incorrect type in initializer (different address spaces)
linux/drivers/gpu/drm/i915/i915_gem_execbuffer.c:1529:47:    expected struct drm_i915_gem_exec_object2 *user_exec_list
linux/drivers/gpu/drm/i915/i915_gem_execbuffer.c:1529:47:    got void [noderef] <asn:1>*
linux/drivers/gpu/drm/i915/i915_gem_execbuffer.c:1533:61: warning: incorrect type in argument 1 (different address spaces)
linux/drivers/gpu/drm/i915/i915_gem_execbuffer.c:1533:61:    expected void [noderef] <asn:1>*dst
linux/drivers/gpu/drm/i915/i915_gem_execbuffer.c:1533:61:    got unsigned long long *<noident>

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-06-13 17:45:29 +02:00
Dave Airlie 8d4ad9d4bb Merge commit '9e9a928eed8796a0a1aaed7e0b676db86ba84594' into drm-next
Merge drm-fixes into drm-next.

Both i915 and radeon need this done for later patches.

Conflicts:
	drivers/gpu/drm/drm_crtc_helper.c
	drivers/gpu/drm/i915/i915_drv.h
	drivers/gpu/drm/i915/i915_gem.c
	drivers/gpu/drm/i915/i915_gem_execbuffer.c
	drivers/gpu/drm/i915/i915_gem_gtt.c
2014-06-05 20:28:59 +10:00
Chris Wilson d23db88c3a drm/i915: Prevent negative relocation deltas from wrapping
This is pure evil. Userspace, I'm looking at you SNA, repacks batch
buffers on the fly after generation as they are being passed to the
kernel for execution. These batches also contain self-referenced
relocations as a single buffer encompasses the state commands, kernels,
vertices and sampler. During generation the buffers are placed at known
offsets within the full batch, and then the relocation deltas (as passed
to the kernel) are tweaked as the batch is repacked into a smaller buffer.
This means that userspace is passing negative relocations deltas, which
subsequently wrap to large values if the batch is at a low address. The
GPU hangs when it then tries to use the large value as a base for its
address offsets, rather than wrapping back to the real value (as one
would hope). As the GPU uses positive offsets from the base, we can
treat the relocation address as the minimum address read by the GPU.
For the upper bound, we trust that userspace will not read beyond the
end of the buffer.

So, how do we fix negative relocations from wrapping? We can either
check that every relocation looks valid when we write it, and then
position each object such that we prevent the offset wraparound, or we
just special-case the self-referential behaviour of SNA and force all
batches to be above 256k. Daniel prefers the latter approach.

This fixes a GPU hang when it tries to use an address (relocation +
offset) greater than the GTT size. The issue would occur quite easily
with full-ppgtt as each fd gets its own VM space, so low offsets would
often be handed out. However, with the rearrangement of the low GTT due
to capturing the BIOS framebuffer, it is already affecting kernels 3.15
onwards. I think only IVB+ is susceptible to this bug, but the workaround
should only kick in rarely, so it seems sensible to always apply it.

v3: Use a bias for batch buffers to prevent small negative delta relocations
from wrapping.

v4 from Daniel:
- s/BIAS/BATCH_OFFSET_BIAS/
- Extract eb_vma_misplaced/i915_vma_misplaced since the conditions
  were growing rather cumbersome.
- Add a comment to eb_get_batch explaining why we do this.
- Apply the batch offset bias everywhere but mention that we've only
  observed it on gen7 gpus.
- Drop PIN_OFFSET_FIX for now, that slipped in from a feature patch.

v5: Add static to eb_get_batch, spotted by 0-day tester.

Testcase: igt/gem_bad_reloc
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=78533
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> (v3)
Cc: stable@vger.kernel.org
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-05-27 11:18:40 +03:00
Chris Wilson 9aab8bff7a drm/i915: Only copy back the modified fields to userspace from execbuffer
We only want to modifiy a single field in the userspace view of the
execbuffer command buffer, so explicitly change that rather than copy
everything back again.

This serves two purposes:

1. The single fields are much cheaper to copy (constant size so the
copy uses special case code) and much smaller than the whole array.

2. We modify the array for internal use that need to be masked from
the user.

Note: We need this backported since without it the next bugfix will
blow up when userspace recycles batchbuffers and relocations.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: stable@vger.kernel.org
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-05-27 11:18:39 +03:00
Oscar Mateo 273497e5cd drm/i915: s/i915_hw_context/intel_context
Up until now, contexts had one (and only one) backing object that was
used by the hardware to save/restore render ring contexts (via the
MI_SET_CONTEXT command). Other rings did not have or need this, so
our i915_hw_context struct had a 1:1 relationship with a a real HW
context.

With Logical Ring Contexts and Execlists, this is not possible anymore:
all rings need a backing object, and it cannot be reused. To prepare
for that, rename our contexts to the more generic term intel_context.

No functional changes.

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-05-22 23:41:17 +02:00
Oscar Mateo a4872ba6d0 drm/i915: s/intel_ring_buffer/intel_engine_cs
In the upcoming patches we plan to break the correlation between
engine command streamers (a.k.a. rings) and ringbuffers, so it
makes sense to refactor the code and make the change obvious.

No functional changes.

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-05-22 23:01:05 +02:00
Daniel Vetter bdf1e7e3db drm/i915: move bsd dispatch index somewhere better
Adding stuff at the bottom is really no how this should be done, since
that's the place for ums/dri dungeons.

This was added in

commit a8ebba75b3
Author: Zhao Yakui <yakui.zhao@intel.com>
Date:   Thu Apr 17 10:37:40 2014 +0800

    drm/i915: Use the coarse ping-pong mechanism based on drm fd to dispatch the BSD command on BDW GT3

Also add a note to prevent this from happening again - people really
should be less lazy and take more time to look for a good home of
their new driver-global state.

Cc: Imre Deak <imre.deak@intel.com>
Cc: Zhao Yakui <yakui.zhao@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-05-22 15:06:31 +02:00
Chris Wilson 227f782e46 drm/i915: Retire requests before creating a new one
More fallout from

commit c8725f3dc0
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Mon Mar 17 12:21:55 2014 +0000

    drm/i915: Do not call retire_requests from wait_for_rendering

is that we can completely fill all of memory using small objects, such
that we exhaust the filp space, and spend all of our time evicting
objects from the aperture. As such, we never fill the ring, and never
trigger the last resort flushing in

commit 1cf0ba1474
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Mon May 5 09:07:33 2014 +0100

    drm/i915: Flush request queue when waiting for ring space

and so all the requests are left active and the objects keep that last
active reference. Eventually the system comes to a halt as it runs out
of memory.

The impact is mainly limited to test cases as regular userspace will
trigger retirement by manually checking whether an object is active.

Testcase: igt/gem_lut_handle
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=78724
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Tested-by: Guo Jinxian <jinxianx.guo@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-05-19 15:30:56 +02:00
Daniel Vetter ffd93f2480 drm/i915: Work-around garbage DR4 from UXA
Somehow UXA submits a completely bogus DR4 value since essentially
forever. It was originally introduced in

commit bade7d7d2505a10a8a7d24b084aff9742e2d6d64
Author: Eric Anholt <eric@anholt.net>
Date:   Fri Jun 6 14:03:25 2008 -0700

    Use the DRM for submitting batchbuffers when available.

and dutifully copied around ever since. Since we want to keep the
general dirt catching around just special case the UXA value.

This regression was introduced in

commit 9cb346648d
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Thu Apr 24 08:09:11 2014 +0200

    drm/i915: Catch dirt in unused execbuffer fields

Comment from Chris' review:

"To be fair, it is a sensible value if one supposes a Region style API to
cliprects. Under that API, DR[14] define the extents of the clip region,
and ((0,0), (0,0)) [DR1==DR4==0] would mean all clipped, do not draw
anything."

v2: Pimp commit message a bit and remove the double space.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=78494
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Jörg Otte <jrg.otte@gmail.com>
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-05-13 17:16:13 +02:00
Ben Widawsky d9ceb957fd drm/i915: Support 64b relocations
All the rest of the code to enable this is in my branch. Without my
branch, hitting > 32b offsets is impossible. The code has always
"supported" 64b, but it's never actually been run of tested. This change
doesn't actually fix anything. [1] I am not sure why X won't work yet. I
do not get hangs or obvious errors.

There are 3 fixes grouped together here. First is to remove the
hardcoded 0 for the upper dword of the relocation. The next fix is to
use a 64b value for target_offset. The final fix is to not directly
apply target_offset to reloc->delta. reloc->delta is part of ABI, and so
we cannot change it. As it stands, 32b is enough to represent everything
we're interested in representing anyway. The main problem is, we cannot
add greater than 32b values to it directly.

[1] Almost all of intel-gpu-tools is not yet ready to test 64b
relocations. There are a few places that expect 32b values for offsets
and these all won't work.

Cc: Rafael Barbalho <rafael.barbalho@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-05-05 16:04:23 +02:00
Ben Widawsky 9bcb144c83 drm/i915: Support 64b execbuf
Previously, our code only had a 32b offset value for where the
batchbuffer starts. With full PPGTT, and 64b canonical GPU address
space, that is an insufficient value. The code to expand is pretty
straight forward, and only one platform needs to do anything with the
extra bits.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Rafael Barbalho <rafael.barbalho@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-05-05 16:01:58 +02:00
Chris Wilson c8725f3dc0 drm/i915: Do not call retire_requests from wait_for_rendering
A common issue we have is that retiring requests causes recursion
through GTT manipulation or page table manipulation which we can only
handle at very specific points. However, to maintain internal
consistency (enforced through our sanity checks on write_domain at
various points in the GEM object lifecycle) we do need to retire the
object prior to marking it with a new write_domain, and also clear the
write_domain for the implicit flush following a batch.

Note that this then allows the unbound objects to still be on the active
lists, and so care must be taken when removing objects from unbound lists
(similar to the caveats we face processing the bound lists).

v2: Fix i915_gem_shrink_all() to handle updated object lifetime rules,
by refactoring it to call into __i915_gem_shrink().

v3: Missed an object-retire prior to changing cache domains in
i915_gem_object_set_cache_leve()

v4: Rebase

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Tested-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Brad Volkin <bradley.d.volkin@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-05-05 09:09:15 +02:00
Zhao Yakui a8ebba75b3 drm/i915: Use the coarse ping-pong mechanism based on drm fd to dispatch the BSD command on BDW GT3
The BDW GT3 has two independent BSD rings, which can be used to process the
video commands. To be simpler, it is transparent to user-space driver/middle.
Instead the kernel driver will decide which ring is to dispatch the BSD video
command.

As every BSD ring is powerful, it is enough to dispatch the BSD video command
based on the drm fd. In such case it can play back video stream while encoding
another video stream. The coarse ping-pong mechanism is used to determine
which BSD ring is used to dispatch the BSD video command.

V1->V2: Follow Daniel's comment and use the simple ping-pong mechanism.
This is only to add the support of dual BSD rings on BDW GT3 machine.
The further optimization will be considered in another patch set.

V2->V3: Follow Daniel's comment to use the struct_mutext instead of
atomic_t during determining which ring can be used to dispatch Video command.

Reviewed-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Zhao Yakui <yakui.zhao@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-05-05 09:08:49 +02:00
Zhao Yakui b1a93306ed drm/i915: Update the restrict check to filter out wrong Ring ID passed by user-space
Signed-off-by: Zhao Yakui <yakui.zhao@intel.com>
Reviewed-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-05-05 09:08:44 +02:00
Daniel Vetter 9cb346648d drm/i915: Catch dirt in unused execbuffer fields
We need to make sure that userspace keeps on following the contract,
otherwise we won't be able to use the reserved fields at all.

v2: Add DRM_DEBUG (Chris)

Testcase: igt/gem_exec_params/*-dirt
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-05-05 09:08:43 +02:00
Daniel Vetter c0f5b82cd1 drm/i915: Catch abuse of I915_EXEC_CONSTANTS_*
A bit tricky since 0 is also a valid constant ...

v2: Add DRM_DEBUG (Chris)

Testcase: igt/gem_exec_params/rel-constants-*
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-05-05 09:08:42 +02:00
Daniel Vetter 9d662da8b6 drm/i915: Catch abuse of I915_EXEC_GEN7_SOL_RESET
Currently we catch it, but silently succeed. Our userspace is
better than this.

v2: Add DRM_DEBUG (Chris)

Testcase: igt/gem_exec_params/sol-reset-*
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-05-05 09:08:41 +02:00
Dave Airlie 885ac04ab3 Merge tag 'drm-intel-next-2014-04-16' of git://anongit.freedesktop.org/drm-intel into drm-next
drm-intel-next-2014-04-16:
- vlv infoframe fixes from Jesse
- dsi/mipi fixes from Shobhit
- gen8 pageflip fixes for LRI/SRM from Damien
- cmd parser fixes from Brad Volkin
- some prep patches for CHV, DRRS, ...
- and tons of little things all over
drm-intel-next-2014-04-04:
- cmd parser for gen7 but only in enforcing and not yet granting mode - the
  batch copying stuff is still missing. Also performance is a bit ... rough
  (Brad Volkin + OACONTROL fix from Ken).
- deprecate UMS harder (i.e. CONFIG_BROKEN)
- interrupt rework from Paulo Zanoni
- runtime PM support for bdw and snb, again from Paulo
- a pile of refactorings from various people all over the place to prep for new
  stuff (irq reworks, power domain polish, ...)

drm-intel-next-2014-04-04:
- cmd parser for gen7 but only in enforcing and not yet granting mode - the
  batch copying stuff is still missing. Also performance is a bit ... rough
  (Brad Volkin + OACONTROL fix from Ken).
- deprecate UMS harder (i.e. CONFIG_BROKEN)
- interrupt rework from Paulo Zanoni
- runtime PM support for bdw and snb, again from Paulo
- a pile of refactorings from various people all over the place to prep for new
  stuff (irq reworks, power domain polish, ...)

Conflicts:
	drivers/gpu/drm/i915/i915_gem_context.c
2014-05-01 09:11:37 +10:00
Chris Wilson 691e6415c8 drm/i915: Always use kref tracking for all contexts.
If we always initialize kref for the context, even if we are using fake
contexts for hangstats when there is no hw support, we can forgo the
dance to dereference the ctx->obj and inspect whether we are permitted
to use kref inside i915_gem_context_reference() and _unreference().

My ulterior motive here is to improve the debugging of a use-after-free
of ctx->obj. This patch avoids the dereference here and instead forces
the assertion checks associated with kref.

v2: Refactor the fake contexts to being even more like the real
contexts, so that there is much less duplicated and special case code.

v3: Tweaks.
v4: Tweaks, minor.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=76671
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Tested-by: lu hua <huax.lu@intel.com>
Cc: Ben Widawsky <benjamin.widawsky@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
[Jani: tiny change to backport to drm-intel-fixes.]
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2014-04-11 13:29:51 +03:00
Ben Widawsky 935f38d694 drm/i915: Unref context on failed eb_create
I opted to do this instead of grabbing the context reference after
eb_create since eb_create can potentially call the shrinker, and that
makes things very complicated. This simple patch balances the ref count
without requiring a great deal of review to make sure the shrinker path
is safe.

Theoretically (by design) the shrinker can end up destroying a context,
which enforces the reasoning for doing the fix this way instead of
moving the reference to later in the function.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-04-09 14:37:11 +02:00
Jani Nikula 50227e1cae drm/i915: prefer struct drm_i915_private to drm_i915_private_t
Remove the rest of the references to drm_i915_private_t. No functional
changes.

Signed-off-by: Jani Nikula <jani.nikula@intel.com>
[danvet: Drop hunk in i915_cmd_parser.c]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-03-31 15:34:21 +02:00
Brad Volkin 351e3db2b3 drm/i915: Implement command buffer parsing logic
The command parser scans batch buffers submitted via execbuffer ioctls before
the driver submits them to hardware. At a high level, it looks for several
things:

1) Commands which are explicitly defined as privileged or which should only be
   used by the kernel driver. The parser generally rejects such commands, with
   the provision that it may allow some from the drm master process.
2) Commands which access registers. To support correct/enhanced userspace
   functionality, particularly certain OpenGL extensions, the parser provides a
   whitelist of registers which userspace may safely access (for both normal and
   drm master processes).
3) Commands which access privileged memory (i.e. GGTT, HWS page, etc). The
   parser always rejects such commands.

See the overview comment in the source for more details.

This patch only implements the logic. Subsequent patches will build the tables
that drive the parser.

v2: Don't set the secure bit if the parser succeeds
Fail harder during init
Makefile cleanup
Kerneldoc cleanup
Clarify module param description
Convert ints to bools in a few places
Move client/subclient defs to i915_reg.h
Remove the bits_count field

OTC-Tracker: AXIA-4631
Change-Id: I50b98c71c6655893291c78a2d1b8954577b37a30
Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
[danvet: Appease checkpatch.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-03-07 22:37:00 +01:00
Daniel Vetter 8ea99c9287 drm/i915: Only bind each object rather than for every execbuffer
One side-effect of the introduction of ppgtt was that we needed to
rebind the object into the appropriate vm (and global gtt in some
peculiar cases). For simplicity this was done twice for every object on
every call to execbuffer. However, that adds a tremendous amount of CPU
overhead (rewriting all the PTE for all objects into WC memory) per
draw. The fix is to push all the decision about which vm to bind into
and when down into the low-level bind routines through hints rather than
performing the bind unconditionally in the execbuffer routine.

Note that this is a regression introduced in the full ppgtt feature
branch, before this we've only done re-bound objects when the relevant
has_(aliasing_ppgtt|global_gtt)_mapping flag was clear. But since
that's per-object and not per-vma that optimization broke.

v2: Split out prep work and unrelated changes.

v3: Bring back functional change around PIN_GLOBAL that I've
accidentally split out.

v4: Remove the temporary hack for the old binding logic to avoid
bisection issues.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=72906
Tested-by: jianx.zhou@intel.com
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> (v1)
Cc: Ben Widawsky <benjamin.widawsky@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Acked-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-02-14 14:18:38 +01:00
Daniel Vetter bf3d149b25 drm/i915: split PIN_GLOBAL out from PIN_MAPPABLE
With abitrary pin flags it makes sense to split out a "please bind
this into global gtt" from the "please allocate in the mappable
range".

Use this unconditionally in our global gtt pin helper since this is
what its callers want. Later patches will drop PIN_MAPPABLE where it's
not strictly needed.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-02-14 14:17:27 +01:00
Daniel Vetter 1ec9e26dda drm/i915: Consolidate binding parameters into flags
Anything more than just one bool parameter is just a pain to read,
symbolic constants are much better.

Split out from Chris' vma-binding rework patch.

v2: Undo the behaviour change in object_pin that Chris spotted.

v3: Split out misplaced hunk to handle set_cache_level errors,
spotted by Jani.

v4: Keep the current over-zealous binding logic in the execbuffer code
working with a quick hack while the overall binding code gets shuffled
around.

v5: Reorder the PIN_ flags for more natural patch splitup.

v6: Pull out the PIN_GLOBAL split-up again.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ben Widawsky <benjamin.widawsky@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-02-14 14:16:58 +01:00
Jani Nikula d330a9530c drm/i915: move module parameters into a struct, in a new file
With 20+ module parameters, I think referring to them via a struct
improves clarity over just having a bunch of globals. While at it, move
the parameter initialization and definitions into a new file
i915_params.c to reduce clutter in i915_drv.c.

Apart from the ill-named i915_enable_rc6, i915_enable_fbc and
i915_enable_ppgtt parameters, for which we lose the "i915_" prefix
internally, the module parameters now look the same both on the kernel
command line and in code. For example, "i915.modeset".

The downsides of the change are losing static on a couple of variables
and not having the initialization and module_param_named() right next to
each other. On the other hand, all module parameters are now defined in
one place at i915_params.c. Plus you can do this to find all module
parameter references:

$ git grep "i915\." -- drivers/gpu/drm/i915

v2:
- move the definitions into a new file
- s/i915_params/i915/
- make i915_try_reset i915.reset, for consistency

Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-01-27 17:16:45 +01:00
Rodrigo Vivi a25eebb0af drm: dp helper: Add DP test sink CRC definition.
This address will be used to verify panel CRC for test and
validation purposes.

Signed-off-by: Rodrigo Vivi <rodrigo.vivi@gmail.com>
[danvet: Fix whitespace fail.]
Acked-by: Dave Airlie <airlied@gmail.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-01-27 09:55:23 +01:00
Daniel Vetter 0e5539b923 Merge branch 'topic/ppgtt' into drm-intel-next-queued
Because whatever.*

* This should contain a fairly long list of issues and still
unresolved resgressions, but I didn't really get a vote.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-01-25 21:14:57 +01:00
Ben Widawsky 8b78f0e588 drm/i915: Clarify relocation errnos
While trying to find a random -EINVAL from a failing test, I noticed we
had a few hard to follow return values.

The first two hunks in this patch replace completely useless
initialization of ret. The last several hunks help to distinguish
between altering 'return ret' and 'return <ERROR>'

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-01-22 09:58:25 +01:00
Daniel Vetter 0d9d349d87 Merge commit origin/master into drm-intel-next
Conflicts are getting out of hand, and now we have to shuffle even
more in -next which was also shuffled in -fixes (the call for
drm_mode_config_reset needs to move yet again).

So do a proper backmerge. I wanted to wait with this for the 3.13
relaese, but alas let's just do this now.

Conflicts:
	drivers/gpu/drm/i915/i915_reg.h
	drivers/gpu/drm/i915/intel_ddi.c
	drivers/gpu/drm/i915/intel_display.c
	drivers/gpu/drm/i915/intel_pm.c

Besides the conflict around the forcewake get/put (where we chaged the
called function in -fixes and added a new parameter in -next) code all
the current conflicts are of the adjacent lines changed type.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-01-16 22:06:30 +01:00
Ben Widawsky 72ad5c45f0 drm/i915/ppgtt: Fix ioctl errno for "no such context"
Without this fix the ioctls silently succeeded (but actually did
nothing).

It makes all the code which calls into this function way too confusing.

v2: Fix destroy IOCTL as well

v3: Clarify the other two callers of i915_gem_context_get() to never
check for NULL. (Mika)

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=72903
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Testcase: igt/gem_ctx_exec/basic
[danvet: Fix up the commit message and actually bother to mention the
testcase this fixes.]
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-01-07 08:50:11 +01:00
Daniel Vetter a7c1d426ef drm/i915: Don't check for NEEDS_GTT when deciding the address space
This means something different and is only relevant for gen6 and the
reason why we cant use anything else than aliasing ppgtt there.

Note that the currently implemented logic for secure batches is
broken: Userspace wants the buffer both in ppgtt (for self-referencing
relocations) and in ggtt (for priveledge operations).

This is the same issue the command parser is also facing.
Unfortunately our coverage for corner-cases of self-referencing
batches is spotty.

Note that this will break vsync'ed Xv and DRI2 copies.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-12-18 17:50:40 +01:00
Daniel Vetter 2c9f8d56a1 drm/i915: Reject NEEDS_GTT relocations with full ppgtt
Doesn't make sense. Spotted while fixing an issue Chris
noticed in the same area.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-12-18 17:50:39 +01:00
Daniel Vetter 7c9c4b8f5d drm/i915: Reject non-default contexts on non-render again
This reverts the abi-change from

commit 67e3d2979b
Author: Ben Widawsky <ben@bwidawsk.net>
Date:   Fri Dec 6 14:11:01 2013 -0800

    drm/i915: Permit contexts on all rings

We don't actually need this, only the internal changes to allow
contexts on all rings for the purpose of ppgtt switching are required.
And I'm not sure whether this is the right thing to do given some of
the hw features in the pipeline.

Also, new abi needs userspace patches as a proof-of-need, which is
completely lacking here.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-12-18 17:50:38 +01:00
Ben Widawsky 7e0d96bc03 drm/i915: Use multiple VMs -- the point of no return
As with processes which run on the CPU, the goal of multiple VMs is to
provide process isolation. Specific to GEN, there is also the ability to
map more objects per process (2GB each instead of 2Gb-2k total).

For the most part, all the pipes have been laid, and all we need to do
is remove asserts and actually start changing address spaces with the
context switch. Since prior to this we've converted the setting of the
page tables to a streamed version, this is quite easy.

One important thing to point out (since it'd been hotly contested) is
that with this patch, every context created will have it's own address
space (provided the HW can do it).

v2: Disable BDW on rebase

NOTE: I tried to make this commit as small as possible. I needed one
place where I could "turn everything on" and that is here. It could be
split into finer commits, but I didn't really see much point.

Cc: Eric Anholt <eric@anholt.net>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-12-18 16:24:52 +01:00
Daniel Vetter 3d7f0f9dcc Merge commit drm-intel-fixes into topic/ppgtt
I need the tricky do_switch fix before I can merge the final piece of
the ppgtt enabling puzzle. Otherwise the conflict will be a real pain
to resolve since the do_switch hunk from -fixes must be placed at the
exact right place within a hunk in the next patch.

Conflicts:
	drivers/gpu/drm/i915/i915_gem_context.c
	drivers/gpu/drm/i915/i915_gem_execbuffer.c
	drivers/gpu/drm/i915/intel_display.c

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-12-18 16:23:37 +01:00
Ben Widawsky 41bde5535a drm/i915: Get context early in execbuf
We need to have the address space when reserving space for the objects.
Since the address space and context are tied together, and reserve
occurs before context switch (for good reason), we must lookup our
context earlier in the process.

This leaves some room for optimizations where we no longer need to use
ctx_id in certain places. This will be addressed in a subsequent patch.

Important tricky bit:
Because slow relocations during execbuffer drop struct_mutex

Perhaps it would be best to acquire the reference when we get the
context, but I'll save that for another day (note I have written the
patch before, and I found the changes required to be uglier than this).

Note that since we currently access everything via context id, and not
the data structure this is fine, though not desirable. The next change
attempts to get the context only once via the context ID idr lookup, and
as such, the following can happen:

CTX-A is created, refcount = 1
CTX-A execbuf, mutex dropped
close IOCTL called on CTX-A, refcount = 0
CTX-A resumes in execbuf.

v2: Rebased on top of
commit b6359918b8
Author: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Date:   Wed Oct 30 15:44:16 2013 +0200

    drm/i915: add i915_get_reset_stats_ioctl

v3: Rebased on top of
commit 25b3dfc87b
Author: Mika Westerberg <mika.westerberg@linux.intel.com>
Date:   Tue Nov 12 11:57:30 2013 +0200

Author: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Date:   Tue Nov 26 16:14:33 2013 +0200

    drm/i915: check context reset stats before relocations

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-12-18 15:52:42 +01:00