Commit Graph

929 Commits

Author SHA1 Message Date
Ben Widawsky bdf4fd7ea0 drm/i915: Do aliasing PPGTT init with contexts
We have a default context which suits the aliasing PPGTT well. Tie them
together so it looks like any other context/PPGTT pair. This makes the
code cleaner as it won't have to special case aliasing as often.

The patch has one slightly tricky part in the default context creation
function. In the future (and on aliased setup) we create a new VM for a
context (potentially). However, if we have aliasing PPGTT, which occurs
at this point in time for all platforms GEN6+, we can simply manage the
refcounting to allow things to behave as normal. Now is a good time to
recall that the aliasing_ppgtt doesn't have a real VM, it uses the GGTT
drm_mm.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-12-18 15:32:14 +01:00
Ben Widawsky a3d67d2396 drm/i915: PPGTT vfuncs should take a ppgtt argument
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-12-18 15:27:56 +01:00
Ben Widawsky 2fa48d8d4a drm/i915: Split context enabling from init
We **need** to do this for exactly 1 reason, because we want to embed a
PPGTT into the context, but we don't want to special case the default
context.

To achieve that, we must be able to initialize contexts after the GTT is
setup (so we can allocate and pin the default context's BO), but before
the PPGTT and rings are initialized. This is because, currently, context
initialization requires ring usage. We don't have rings until after the
GTT is setup. If we split the enabling part of context initialization,
the part requiring the ringbuffer, we can untangle this, and then later
embed the PPGTT

Incidentally this allows us to also adhere to the original design of
context init/fini in future patches: they were only ever meant to be
called at driver load and unload.

v2: Move hw_contexts_disabled test in i915_gem_context_enable() (Chris)

v3: BUG_ON after checking for disabled contexts. Or else it blows up pre
gen6 (Ben)

v4: Forward port
Modified enable for each ring, since that patch is earlier in the series
Dropped ring arg from create_default_context so it can be used by others

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-12-18 15:27:55 +01:00
Ben Widawsky acce9ffa48 drm/i915: Better reset handling for contexts
This patch adds to changes for contexts on reset:
Sets last context to default - this will prevent the context switch
happening after a reset. That switch is not possible because the
rings are hung during reset and context switch requires reset. This
behavior will need to be reworked in the future, but this is what we
want for now.

In the future, we'll also want to reset the guilty context to
uninitialized. We should wait for ARB_Robustness related code to land
for that.

This is somewhat for paranoia.  Because we really don't know what the
GPU was doing when it hung, or the state it was in (mid context write,
for example), later restoring the context is a bad idea. By setting the
flag to not initialized, the next load of that context will not restore
the state, and thus on the subsequent switch away from the context will
overwrite the old data.

NOTE: This code needs a fixup when we actually have multiple VMs. The
issue that can occur is inactive objects in a VM will need to be
destroyed before the last context unref. This can now happen via the
fake switch introduced in this patch (and it other ways in the future)

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-12-18 15:27:54 +01:00
Ben Widawsky e422b888eb drm/i915: Add a context open function
We'll be doing a bit more stuff with each file, so having our own open
function should make things clean.

This also allows us to easily add conditionals for stuff we don't want
to do when we don't have HW contexts.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-12-18 15:27:51 +01:00
Ben Widawsky 6f65e29aca drm/i915: Create bind/unbind abstraction for VMAs
To sum up what goes on here, we abstract the vma binding, similarly to
the previous object binding. This helps for distinguishing legacy
binding, versus modern binding. To keep the code churn as minimal as
possible, I am leaving in insert_entries(). It serves as the per
platform pte writing basically. bind_vma and insert_entries do share a
lot of similarities, and I did have designs to combine the two, but as
mentioned already... too much churn in an already massive patchset.

What follows are the 3 commits which existed discretely in the original
submissions. Upon rebasing on Broadwell support, it became clear that
separation was not good, and only made for more error prone code. Below
are the 3 commit messages with all their history.

drm/i915: Add bind/unbind object functions to VMA
drm/i915: Use the new vm [un]bind functions
drm/i915: reduce vm->insert_entries() usage

drm/i915: Add bind/unbind object functions to VMA

As we plumb the code with more VM information, it has become more
obvious that the easiest way to deal with bind and unbind is to simply
put the function pointers in the vm, and let those choose the correct
way to handle the page table updates. This change allows many places in
the code to simply be vm->bind, and not have to worry about
distinguishing PPGTT vs GGTT.

Notice that this patch has no impact on functionality. I've decided to
save the actual change until the next patch because I think it's easier
to review that way. I'm happy to squash the two, or let Daniel do it on
merge.

v2:
Make ggtt handle the quirky aliasing ppgtt
Add flags to bind object to support above
Don't ever call bind/unbind directly for PPGTT until we have real, full
PPGTT (use NULLs to assert this)
Make sure we rebind the ggtt if there already is a ggtt binding.  This
happens on set cache levels.
Use VMA for bind/unbind (Daniel, Ben)

v3: Reorganize ggtt_vma_bind to be more concise and easier to read
(Ville). Change logic in unbind to only unbind ggtt when there is a
global mapping, and to remove a redundant check if the aliasing ppgtt
exists.

v4: Make the bind function a bit smarter about the cache levels to avoid
unnecessary multiple remaps. "I accept it is a wart, I think unifying
the pin_vma / bind_vma could be unified later" (Chris)
Removed the git notes, and put version info here. (Daniel)

v5: Update the comment to not suck (Chris)

v6:
Move bind/unbind to the VMA. It makes more sense in the VMA structure
(always has, but I was previously lazy). With this change, it will allow
us to keep a distinct insert_entries.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>

drm/i915: Use the new vm [un]bind functions

Building on the last patch which created the new function pointers in
the VM for bind/unbind, here we actually put those new function pointers
to use.

Split out as a separate patch to aid in review. I'm fine with squashing
into the previous patch if people request it.

v2: Updated to address the smart ggtt which can do aliasing as needed
Make sure we bind to global gtt when mappable and fenceable. I thought
we could get away without this initialy, but we cannot.

v3: Make the global GTT binding explicitly use the ggtt VM for
bind_vma(). While at it, use the new ggtt_vma helper (Chris)

At this point the original mailing list thread diverges. ie.

v4^:
use target_obj instead of obj for gen6 relocate_entry
vma->bind_vma() can be called safely during pin. So simply do that
instead of the complicated conditionals.
Don't restore PPGTT bound objects on resume path
Bug fix in resume path for globally bound Bos
Properly handle secure dispatch
Rebased on vma bind/unbind conversion

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>

drm/i915: reduce vm->insert_entries() usage

FKA: drm/i915: eliminate vm->insert_entries()

With bind/unbind function pointers in place, we no longer need
insert_entries. We could, and want, to remove clear_range, however it's
not totally easy at this point. Since it's used in a couple of place
still that don't only deal in objects: setup, ppgtt init, and restore
gtt mappings.

v2: Don't actually remove insert_entries, just limit its usage. It will
be useful when we introduce gen8. It will always be called from the vma
bind/unbind.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> (v1)
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-12-18 15:27:50 +01:00
Ben Widawsky d7f46fc4e7 drm/i915: Make pin count per VMA
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-12-18 15:27:49 +01:00
Ben Widawsky feb822cfc2 drm/i915: Handle inactivating objects for all VMAs
This came from a patch called, "drm/i915: Move active to vma"

When moving an object to the inactive list, we do it for all VMs for
which the object is bound.

The primary difference from that patch is this time around we don't not
track 'active' per vma, but rather by object. Therefore, we only need
one unref.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-12-18 15:27:46 +01:00
Ben Widawsky c39538a88d drm/i915: Takedown drm_mm on failed gtt setup
This was found by code inspection. If the GTT setup fails then we are
left without properly tearing down the drm_mm.

Hopefully this never happens.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-12-18 15:27:45 +01:00
Ben Widawsky 6e164c3382 drm/i915: Allow ggtt lookups to not WARN
To be able to effectively use the GGTT object lookup function, we don't
want to warn when there is no GGTT mapping. Let the caller deal with it
instead.

Originally, I had intended to have this behavior, and has not
introduced the WARN. It was introduced during review with the addition
of the follow commit

commit 5c2abbeab7
Author: Ben Widawsky <benjamin.widawsky@intel.com>
Date:   Tue Sep 24 09:57:57 2013 -0700

    drm/i915: Provide a cheap ggtt vma lookup

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-12-18 15:27:45 +01:00
Ben Widawsky 6f425321e0 drm/i915: Don't unconditionally try to deref aliasing ppgtt
Since the beginning, the functions which try to properly reference the
aliasing PPGTT have deferences a potentially null aliasing_ppgtt member.
Since the accessors are meant to be global, this will not do.

Introduced originally in:
commit a70a3148b0
Author: Ben Widawsky <ben@bwidawsk.net>
Date:   Wed Jul 31 16:59:56 2013 -0700

    drm/i915: Make proper functions for VMs

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-12-18 15:27:44 +01:00
Paulo Zanoni 48018a57a8 drm/i915: release the GTT mmaps when going into D3
So we'll get a fault when someone tries to access the mmap, then we'll
wake up from D3.

v2: - Rebase
v3: - Use gtt active/inactive

Testcase: igt/pm_pc8/gem-mmap-gtt
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@gmail.com>
[danvet: Add comment + WARN as discussed with Paulo on irc.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-12-14 15:35:52 +01:00
Mika Kuoppala 168c3f2151 drm/i915: dont call irq_put when irq test is on
If test is running, irq_get was not called so we should gain
balance by not doing irq_put

"So the rule is: if you access unlocked values, you use ACCESS_ONCE().
You don't say "but it can't matter". Because you simply don't know."
-- Linus

v2: use local variable so it can't change during test (Chris)

v3: update commit msg and use ACCESS_ONCE (Ville)

Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-12-12 22:58:44 +01:00
Mika Kuoppala 47e9766df0 drm/i915: Fix timeout with missed interrupts in __wait_seqno
Commit 094f9a54e3 ("drm/i915: Fix __wait_seqno to use true infinite
timeouts") added support for __wait_seqno to detect missing interrupts and
go around them by polling. As there is also timeout detection in
__wait_seqno, the polling and timeout detection were done with the same
timer.

When there has been missed interrupts and polling is needed, the timer is
set to trigger in (now + 1) jiffies in future, instead of the caller
specified timeout.

Now when io_schedule() returns, we calculate the jiffies left to timeout
using the timer expiration value. As the current jiffies is now bound to be
always equal or greater than the expiration value, the timeout_jiffies will
become zero or negative and we return -ETIME to caller even tho the
timeout was never reached.

Fix this by decoupling timeout calculation from timer expiration.

v2: Commit message with some sense in it (Chris Wilson)

v3: add parenthesis on timeout_expire calculation

v4: don't read jiffies without timeout (Chris Wilson)

Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-12-12 15:27:21 +01:00
Chris Wilson 4db080f9e9 drm/i915: Fix erroneous dereference of batch_obj inside reset_status
As the rings may be processed and their requests deallocated in a
different order to the natural retirement during a reset,

/* Whilst this request exists, batch_obj will be on the
 * active_list, and so will hold the active reference. Only when this
 * request is retired will the the batch_obj be moved onto the
 * inactive_list and lose its active reference. Hence we do not need
 * to explicitly hold another reference here.
 */

is violated, and the batch_obj may be dereferenced after it had been
freed on another ring. This can be simply avoided by processing the
status update prior to deallocating any requests.

Fixes regression (a possible OOPS following a GPU hang) from
commit aa60c664e6
Author: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Date:   Wed Jun 12 15:13:20 2013 +0300

    drm/i915: find guilty batch buffer on ring resets

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: stable@vger.kernel.org
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
[danvet: Add the code comment Chris supplied.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-12-12 10:49:05 +01:00
Paulo Zanoni f65c916898 drm/i915: add runtime put/get calls at the basic places
If I add code to enable runtime PM on my Haswell machine, start a
desktop environment, then enable runtime PM, these functions will
complain that they're trying to read/write registers while the
graphics card is suspended.

v2: - Simplify i915_gem_fault changes.

Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@gmail.com>
[danvet: Drop the hunk in i915_hangcheck_elapsed, it's the wrong thing
to do.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-12-10 22:47:33 +01:00
Chris Wilson 70903c3ba8 drm/i915: Fix ordering of unbind vs unpin pages
It is useful to assert that if the object is bound, then it must have
its pages pinned to prevent the shrinker from reaping its backing store.
This is even more useful with the introduction of real-ppgtt whereupon
we may have the object bound into several vma, with each instance
pinning the backing store. This assertion breaks down during unbind
where we unpinned the backing store before decoupling the vma binding.
This can be fixed with a trivial reording of the unbind sequence, which
reinforces the

   pin pages
   bind to vma
   ...
   unbind from vma
   unpin pages

concept.

v2: Bonus comment

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Tested-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-12-04 12:10:50 +01:00
Ville Syrjälä 0bf2134780 drm/i915: MI_PREDICATE_RESULT_2 is HSW only
The MI_PREDICATE_RESULT_2 register exits only on HSW. On other
platforms the same offset is either reserved, or contains some
other register. So write the register only on HSW.

This regression has been introduced in

commit 9435373ef8
Author: Rodrigo Vivi <rodrigo.vivi@gmail.com>
Date:   Wed Aug 28 16:45:46 2013 -0300

    drm/i915: Report enabled slices on Haswell GT3

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
[danvet: Add regression notice.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-11-29 15:00:03 +01:00
Daniel Vetter c09cd6e969 Merge branch 'backlight-rework' into drm-intel-next-queued
Pull in Jani's backlight rework branch. This was merged through a
separate branch to be able to sort out the Broadwell conflicts
properly before pulling it into the main development branch.

Conflicts:
	drivers/gpu/drm/i915/intel_display.c

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-11-15 10:02:39 +01:00
Ben Widawsky 31a5336e1c drm/i915/bdw: Swizzling support
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-11-08 18:09:37 +01:00
Ben Widawsky 5ab31333ac drm/i915/bdw: Fences on gen8 look just like gen7
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-11-08 18:09:36 +01:00
Ben Widawsky 8245be3139 drm/i915: Require HW contexts (when possible)
v2: Fixed the botched locking on init_hw failure in i915_reset (Ville)
Call cleanup_ringbuffer on failed context create in init_hw (Ville)

v3: Add dev argument ti clean_ringbuffer

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@gmail.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-11-07 09:35:44 +01:00
Paulo Zanoni de45eaf7b9 drm/i915: fix open-coded DIV_ROUND_UP
Use the nice Kernel macro, it makes the code much more readable.

Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-10-21 10:04:03 +02:00
Daniel Vetter aa5f802181 drm/i915: Use unsigned long for obj->user_pin_count
At least on linux sizeof(long) == sizeof(void*) and the thinking
is that you can grab about as many references as there's memory.

Doesn't really matter, just a bit of OCD since the fixed size data
type in a pure in-kernel datastructure look off.

v2: Ville asked for an overflow check since no one prevents userspace
from incrementing the pin count forever.

v3: s/INT/LONG/, noticed by Chris.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-10-16 22:06:39 +02:00
Chris Wilson 45c5f2022c drm/i915: Disable all GEM timers and work on unload
We have two once very similar functions, i915_gpu_idle() and
i915_gem_idle(). The former is used as the lower level operation to
flush work on the GPU, whereas the latter is the high level interface to
flush the GEM bookkeeping in addition to flushing the GPU. As such
i915_gem_idle() also clears out the request and activity lists and
cancels the delayed work. This is what we need for unloading the driver,
unfortunately we called i915_gpu_idle() instead.

In the process, make sure that when cancelling the delayed work and
timer, which is synchronous, that we do not hold any locks to prevent a
deadlock if the work item is already waiting upon the mutex. This
requires us to push the mutex down from the caller to i915_gem_idle().

v2: s/i915_gem_idle/i915_gem_suspend/

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=70334
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Tested-by: xunx.fang@intel.com
[danvet: Only set ums.suspended for !kms as discussed earlier. Chris
noticed that this slipped through.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-10-16 19:42:14 +02:00
Ben Widawsky 3d57e5bd12 drm/i915: Do a fuller init after reset
I had this lying around from he original PPGTT series, and thought we
might try to get it in by itself.

It's convenient to just call i915_gem_init_hw at reset because we'll be
adding new things to that function, and having just one function to call
instead of reimplementing it in two places is nice.

In order to accommodate we cleanup ringbuffers in order to bring them
back up cleanly. Optionally, we could also teardown/re initialize the
default context but this was causing some problems on reset which I
wasn't able to fully debug, and is unnecessary with the previous context
init/enable split.

This essentially reverts:
commit 8e88a2bd59
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Tue Jun 19 18:40:00 2012 +0200

    drm/i915: don't call modeset_init_hw in i915_reset

It seems to work for me on ILK now. Perhaps it's due to:
commit 8a5c2ae753
Author: Jesse Barnes <jbarnes@virtuousgeek.org>
Date:   Thu Mar 28 13:57:19 2013 -0700

    drm/i915: fix ILK GPU reset for render

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-10-16 11:08:08 +02:00
Daniel Vetter 3bbbe706e8 drm/i915: check that the i965g/gm 4G limit is really obeyed
In truly crazy circumstances shmem might give us the wrong type of
page. So be a bit paranoid and double check this.

Reviewer: Damien Lespiau <damien.lespiau@intel.com>
Cc: Rob Clark <robdclark@gmail.com>
References: http://lkml.org/lkml/2011/7/11/238
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@gmail.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-10-10 12:47:05 +02:00
Chris Wilson d9973b4356 drm/i915: Fix type mismatch and accounting in i915_gem_shrink
The interface uses an unsigned long, and we can use the unsigned counter
throughout our code, so do so. In the process, we notice one instance
where the shrink count is based on a heuristic rather than the result,
and another where we ask for too many pages to be purged.

v2: nr_to_scan needs to be promoted to a long as well, so just use
    sc->nr_to_scan directly.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-10-10 12:46:48 +02:00
Chris Wilson 5035c275af drm/i915: Call io_schedule() whilst whilsting for the GPU
Since we are waiting upon IO completion, inform the kernel through use
of the io_schedule() call rather than the regular schedule(). This
should allow the kernel to make better decisions regarding scheduling
and power management.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-10-10 12:46:47 +02:00
Daniel Vetter 967ad7f148 Merge remote-tracking branch 'airlied/drm-next' into drm-intel-next
The conflict in intel_drv.h tripped me up a bit since a patch in dinq
moves all the functions around, but another one in drm-next removes a
single function. So I'ev figured backing this into a backmerge would
be good.

i915_dma.c is just adjacent lines changed, nothing nefarious there.

Conflicts:
	drivers/gpu/drm/i915/i915_dma.c
	drivers/gpu/drm/i915/intel_drv.h

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-10-10 12:44:43 +02:00
David Herrmann 16eb5f4379 drm: kill ->gem_init_object() and friends
All drivers embed gem-objects into their own buffer objects. There is no
reason to keep drm_gem_object_alloc(), gem->driver_private and
->gem_init_object() anymore.

New drivers are highly encouraged to do the same. There is no benefit in
allocating gem-objects separately.

Cc: Dave Airlie <airlied@gmail.com>
Cc: Alex Deucher <alexdeucher@gmail.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Rob Clark <robdclark@gmail.com>
Cc: Inki Dae <inki.dae@samsung.com>
Cc: Ben Skeggs <skeggsb@gmail.com>
Cc: Patrik Jakobsson <patrik.r.jakobsson@gmail.com>
Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2013-10-09 14:38:02 +10:00
Chris Wilson b29c19b645 drm/i915: Boost RPS frequency for CPU stalls
If we encounter a situation where the CPU blocks waiting for results
from the GPU, give the GPU a kick to boost its the frequency.

This should work to reduce user interface stalls and to quickly promote
mesa to high frequencies - but the cost is that our requested frequency
stalls high (as we do not idle for long enough before rc6 to start
reducing frequencies, nor are we aggressive at down clocking an
underused GPU). However, this should be mitigated by rc6 itself powering
off the GPU when idle, and that energy use is dependent upon the workload
of the GPU in addition to its frequency (e.g. the math or sampler
functions only consume power when used). Still, this is likely to
adversely affect light workloads.

In particular, this nearly eliminates the highly noticeable wake-up lag
in animations from idle. For example, expose or workspace transitions.
(However, given the situation where we fail to downclock, our requested
frequency is almost always the maximum, except for Baytrail where we
manually downclock upon idling. This often masks the latency of
upclocking after being idle, so animations are typically smooth - at the
cost of increased power consumption.)

Stéphane raised the concern that this will punish good applications and
reward bad applications - but due to the nature of how mesa performs its
client throttling, I believe all mesa applications will be roughly
equally affected. To address this concern, and to prevent applications
like compositors from permanently boosting the RPS state, we ratelimit the
frequency of the wait-boosts each client recieves.

Unfortunately, this techinique is ineffective with Ironlake - which also
has dynamic render power states and suffers just as dramatically. For
Ironlake, the thermal/power headroom is shared with the CPU through
Intelligent Power Sharing and the intel-ips module. This leaves us with
no GPU boost frequencies available when coming out of idle, and due to
hardware limitations we cannot change the arbitration between the CPU and
GPU quickly enough to be effective.

v2: Limit each client to receiving a single boost for each active period.
    Tested by QA to only marginally increase power, and to demonstrably
    increase throughput in games. No latency measurements yet.

v3: Cater for front-buffer rendering with manual throttling.

v4: Tidy up.

v5: Sadly the compositor needs frequent boosts as it may never idle, but
due to its picking mechanism (using ReadPixels) may require frequent
waits. Those waits, along with the waits for the vrefresh swap, conspire
to keep the GPU at low frequencies despite the interactive latency. To
overcome this we ditch the one-boost-per-active-period and just ratelimit
the number of wait-boosts each client can receive.

Reported-and-tested-by: Paul Neumann <paul104x@yahoo.de>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=68716
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: Stéphane Marchesin <stephane.marchesin@gmail.com>
Cc: Owen Taylor <otaylor@redhat.com>
Cc: "Meng, Mengmeng" <mengmeng.meng@intel.com>
Cc: "Zhuang, Lena" <lena.zhuang@intel.com>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
[danvet: No extern for function prototypes in headers.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-10-03 20:01:31 +02:00
Chris Wilson 094f9a54e3 drm/i915: Fix __wait_seqno to use true infinite timeouts
When we switched to always using a timeout in conjunction with
wait_seqno, we lost the ability to detect missed interrupts. Since, we
have had issues with interrupts on a number of generations, and they are
required to be delivered in a timely fashion for a smooth UX, it is
important that we do log errors found in the wild and prevent the
display stalling for upwards of 1s every time the seqno interrupt is
missed.

Rather than continue to fix up the timeouts to work around the interface
impedence in wait_event_*(), open code the combination of
wait_event[_interruptible][_timeout], and use the exposed timer to
poll for seqno should we detect a lost interrupt.

v2: In order to satisfy the debug requirement of logging missed
interrupts with the real world requirments of making machines work even
if interrupts are hosed, we revert to polling after detecting a missed
interrupt.

v3: Throw in a debugfs interface to simulate broken hw not reporting
interrupts.

v4: s/EGAIN/EAGAIN/ (Imre)

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Imre Deak <imre.deak@intel.com>
[danvet: Don't use the struct typedef in new code.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-10-03 20:01:30 +02:00
Chris Wilson b52b89da09 drm/i915: Add a tracepoint for using a semaphore
So that we can find the callers who introduce a ring stall. A single
ring stall is not too unwelcome, the right issue becomes when they start
to interlock and prevent any concurrent work. That, however, is a little
tricker to detect with a mere tracepoint!

v2: Rebrand it as a ring event, rather than an object event.
v3: Include the seqno in the tracepoint for posterity or something.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-10-01 07:45:24 +02:00
Ben Widawsky e2d05a8b1e drm/i915: Convert active API to VMA
Even though we track object activity and not VMA, because we have the
active_list be based on the VM, it makes the most sense to use VMAs in
the APIs.

NOTE: Daniel intends to eventually rip out active/inactive LRUs, but for
now, leave them be.

v2: Remove leftover hunk from the previous patch which didn't keep
i915_gem_object_move_to_active. That patch had to rely on the ring to
get the dev instead of the obj. (Chris)

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-10-01 07:45:21 +02:00
Ben Widawsky 5c2abbeab7 drm/i915: Provide a cheap ggtt vma lookup
"We do fairly often lookup the ggtt vma for an obj." - Chris Wilson. As
such, provide a function to offer slightly cheaper access to the vma.
Not performance tested. By my quick estimation it saves at least 3
pointer dereferences from the existing mechanism.

This patch mostly matches code from Chris in
<20130911221430.GB7825@nuc-i3427.alporthouse.com>

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-10-01 07:45:21 +02:00
Chris Wilson f740334775 drm/i915: Do not unlock upon error in i915_gem_idle()
We never took the lock ourselves and all callers expect the struct_mutex
to be locked upon return (be it success or error), thereore dropping the
lock along the error paths looks to be a vestigial error from

commit db1b76ca6a
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Tue Jul 9 16:51:37 2013 +0200

    drm/i915: don't frob mm.suspended when not using ums

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-10-01 07:45:03 +02:00
Daniel Vetter b14c5679dd drm/i915: use pointer = k[cmz...]alloc(sizeof(*pointer), ...) pattern
Done while reviewing all our allocations for fubar. Also a few errant
cases of lacking () for the sizeof operator - just a bit of OCD.

I've left out all the conversions that also should use kcalloc from
this patch  (it's only 2).

Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-10-01 07:45:01 +02:00
Dave Airlie 4821ff14a3 Merge tag 'drm-intel-next-2013-09-21-merged' of git://people.freedesktop.org/~danvet/drm-intel into drm-next
drm-intel-next-2013-09-21:
- clock state handling rework from Ville
- l3 parity handling fixes for hsw from Ben
- some more watermark improvements from Ville
- ban badly behaved context from Mika
- a few vlv improvements from Jesse
- VGA power domain handling from Ville
drm-intel-next-2013-09-06:
- Basic mipi dsi support from Jani. Not yet converted over to drm_bridge
  since that was too fresh, but the porting is in progress already.
- More vma patches from Ben, this time the code to convert the execbuffer
  code. Now that the shrinker recursion bug is tracked down we can move
  ahead here again. Yay!
- Optimize hw context switching to not generate needless interrupts (Chris
  Wilson). Also some shuffling for the oustanding request allocation.
- Opregion support for SWSCI, although not yet fully wired up (we need a
  bit of runtime D3 support for that apparently, due to Windows design
  deficiencies), from Jani Nikula.
- A few smaller changes all over.

[airlied: merge conflict fix in i9xx_set_pipeconf]

* tag 'drm-intel-next-2013-09-21-merged' of git://people.freedesktop.org/~danvet/drm-intel: (119 commits)
  drm/i915: assume all GM45 Acer laptops use inverted backlight PWM
  drm/i915: cleanup a min_t() cast
  drm/i915: Pull intel_init_power_well() out of intel_modeset_init_hw()
  drm/i915: Add POWER_DOMAIN_VGA
  drm/i915: Refactor power well refcount inc/dec operations
  drm/i915: Add intel_display_power_{get, put} to request power for specific domains
  drm/i915: Change i915_request power well handling
  drm/i915: POSTING_READ IPS_CTL before waiting for the vblank
  drm/i915: don't disable ERR_INT on the IRQ handler
  drm/i915/vlv: disable rc6p and rc6pp residency reporting on BYT
  drm/i915/vlv: honor i915_enable_rc6 boot param on VLV
  drm/i915: s/HAS_L3_GPU_CACHE/HAS_L3_DPF
  drm/i915: Do remaps for all contexts
  drm/i915: Keep a list of all contexts
  drm/i915: Make l3 remapping use the ring
  drm/i915: Add second slice l3 remapping
  drm/i915: Fix HSW parity test
  drm/i915: dump crtc timings from the pipe config
  drm/i915: register backlight device also when backlight class is a module
  drm/i915: write D_COMP using the mailbox
  ...

Conflicts:
	drivers/gpu/drm/i915/intel_display.c
2013-10-01 10:00:50 +10:00
Daniel Vetter d32270460f drm/i915: Fix up usage of SHRINK_STOP
In

commit 81e49f8114
Author: Glauber Costa <glommer@openvz.org>
Date:   Wed Aug 28 10:18:13 2013 +1000

    i915: bail out earlier when shrinker cannot acquire mutex

SHRINK_STOP was added to tell the core shrinker code to bail out and
go to the next shrinker since the i915 shrinker couldn't acquire
required locks. But the SHRINK_STOP return code was added to the
->count_objects callback and not the ->scan_objects callback as it
should have been, resulting in tons of dmesg noise like

shrink_slab: i915_gem_inactive_scan+0x0/0x9c negative objects to delete nr=-xxxxxxxxx

Fix discusssed with Dave Chinner.

References: http://www.spinics.net/lists/intel-gfx/msg33597.html
Reported-by: Knut Petersen <Knut_Petersen@t-online.de>
Cc: Knut Petersen <Knut_Petersen@t-online.de>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Glauber Costa <glommer@openvz.org>
Cc: Glauber Costa <glommer@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Acked-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-09-26 00:31:51 +02:00
Daniel Vetter b599c89e8c Linux 3.12-rc2
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.14 (GNU/Linux)
 
 iQEcBAABAgAGBQJSQMORAAoJEHm+PkMAQRiGj14H/1bjhtfNjPdX7MVQAzA+WpwX
 s7h1IQu2Si9S5S1lBiM2sBTOssVcmfheO9x4yqm7JNOD1RnssWKOM3q+zVOLstwd
 GD3gluJPeraD5EyYSqEJ9ILPQ3gbxb4wOlT0Z291TW6E8XhLRr0RTOJPksRsgvLH
 Ckm9uJh6ArS6ZXfXiaDQfd+xHAQJkUfW6nMSA0g9ZO9C6KIDRvcbUmrY3m4HhfIk
 mK0TXCBs+AXGDIjTEB8JgIQL/5y1Qn0c4R+2uTU/4YWwyLvJTV1e44kGoleukMMT
 6Pw/TNlUEN161dbSaqCyF3sfXHDYQ5valycI2PDgitMtPSxbzsU1VDizS8+daRg=
 =lEmF
 -----END PGP SIGNATURE-----

Merge tag 'v3.12-rc2' into drm-intel-next

Backmerge Linux 3.12-rc2 to prep for a bunch of -next patches:
- Header cleanup in intel_drv.h, both changed in -fixes and my current
  -next pile.
- Cursor handling cleanup for -next which depends upon the cursor
  handling fix merged into -rc2.

All just trivial conflicts of the "changed adjacent lines" type:
	drivers/gpu/drm/i915/i915_gem.c
	drivers/gpu/drm/i915/intel_display.c
	drivers/gpu/drm/i915/intel_drv.h

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-09-24 09:32:53 +02:00
Linus Torvalds d8524ae9d6 Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux
Pull drm fixes from Dave Airlie:
 - some small fixes for msm and exynos
 - a regression revert affecting nouveau users with old userspace
 - intel pageflip deadlock and gpu hang fixes, hsw modesetting hangs

* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux: (22 commits)
  Revert "drm: mark context support as a legacy subsystem"
  drm/i915: Don't enable the cursor on a disable pipe
  drm/i915: do not update cursor in crtc mode set
  drm/exynos: fix return value check in lowlevel_buffer_allocate()
  drm/exynos: Fix address space warnings in exynos_drm_fbdev.c
  drm/exynos: Fix address space warning in exynos_drm_buf.c
  drm/exynos: Remove redundant OF dependency
  drm/msm: drop unnecessary set_need_resched()
  drm/i915: kill set_need_resched
  drm/msm: fix potential NULL pointer dereference
  drm/i915/dvo: set crtc timings again for panel fixed modes
  drm/i915/sdvo: Robustify the dtd<->drm_mode conversions
  drm/msm: workaround for missing irq
  drm/msm: return -EBUSY if bo still active
  drm/msm: fix return value check in ERR_PTR()
  drm/msm: fix cmdstream size check
  drm/msm: hangcheck harder
  drm/msm: handle read vs write fences
  drm/i915/sdvo: Fully translate sync flags in the dtd->mode conversion
  drm/i915: Use proper print format for debug prints
  ...
2013-09-22 19:51:49 -07:00
Ben Widawsky 040d2baa62 drm/i915: s/HAS_L3_GPU_CACHE/HAS_L3_DPF
We'd only ever used this define to denote whether or not we have the
dynamic parity feature (DPF) and never to determine whether or not L3
exists. Baytrail is a good example of where L3 exists, and not DPF.

This patch provides clarify in the code for future use cases which might
want to actually query whether or not L3 exists.

v2: Add /* DPF == dynamic parity feature */

Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-09-19 20:41:00 +02:00
Ben Widawsky a33afea5ff drm/i915: Keep a list of all contexts
I have implemented this patch before without creating a separate list
(I'm having trouble finding the links, but the messages ids are:
<1364942743-6041-2-git-send-email-ben@bwidawsk.net>
<1365118914-15753-9-git-send-email-ben@bwidawsk.net>)

However, the code is much simpler to just use a list and it makes the
code from the next patch a lot more pretty.

As you'll see in the next patch, the reason for this is to be able to
specify when a context needs to get L3 remapping. More details there.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-09-19 20:39:43 +02:00
Ben Widawsky c3787e2eac drm/i915: Make l3 remapping use the ring
Using LRI for setting the remapping registers allows us to stream l3
remapping information. This is necessary to handle per context remaps as
we'll see implemented in an upcoming patch.

Using the ring also means we don't need to frob the DOP clock gating
bits.

v2: Add comment about lack of worry for concurrent register access
(Daniel)

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
[danvet: Bikeshed the comment a bit by doing a s/XXX/Note - there's
nothing to fix.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-09-19 20:38:00 +02:00
Ben Widawsky 35a85ac606 drm/i915: Add second slice l3 remapping
Certain HSW SKUs have a second bank of L3. This L3 remapping has a
separate register set, and interrupt from the first "slice". A slice is
simply a term to define some subset of the GPU's l3 cache. This patch
implements both the interrupt handler, and ability to communicate with
userspace about this second slice.

v2:  Remove redundant check about non-existent slice.
Change warning about interrupts of unknown slices to WARN_ON_ONCE
Handle the case where we get 2 slice interrupts concurrently, and switch
the tracking of interrupts to be non-destructive (all Ville)
Don't enable/mask the second slice parity interrupt for ivb/vlv (even
though all docs I can find claim it's rsvd) (Ville + Bryan)
Keep BYT excluded from L3 parity

v3: Fix the slice = ffs to be decremented by one (found by Ville). When
I initially did my testing on the series, I was using 1-based slice
counting, so this code was correct. Not sure why my simpler tests that
I've been running since then didn't pick it up sooner.

Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-09-19 20:37:04 +02:00
Linus Torvalds 26935fb06e Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs pile 4 from Al Viro:
 "list_lru pile, mostly"

This came out of Andrew's pile, Al ended up doing the merge work so that
Andrew didn't have to.

Additionally, a few fixes.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (42 commits)
  super: fix for destroy lrus
  list_lru: dynamically adjust node arrays
  shrinker: Kill old ->shrink API.
  shrinker: convert remaining shrinkers to count/scan API
  staging/lustre/libcfs: cleanup linux-mem.h
  staging/lustre/ptlrpc: convert to new shrinker API
  staging/lustre/obdclass: convert lu_object shrinker to count/scan API
  staging/lustre/ldlm: convert to shrinkers to count/scan API
  hugepage: convert huge zero page shrinker to new shrinker API
  i915: bail out earlier when shrinker cannot acquire mutex
  drivers: convert shrinkers to new count/scan API
  fs: convert fs shrinkers to new scan/count API
  xfs: fix dquot isolation hang
  xfs-convert-dquot-cache-lru-to-list_lru-fix
  xfs: convert dquot cache lru to list_lru
  xfs: rework buffer dispose list tracking
  xfs-convert-buftarg-lru-to-generic-code-fix
  xfs: convert buftarg LRU to generic code
  fs: convert inode and dentry shrinking to be node aware
  vmscan: per-node deferred work
  ...
2013-09-12 15:01:38 -07:00
Daniel Vetter 571c608d06 drm/i915: kill set_need_resched
This is just a remnant from the old days when our reset handling was
horribly racy, suffered from terribly locking issues and often happily
live-locked. Those days are now gone so we can drop the hacks and just
rip the reschedule-point out.

Reported-by: Peter Zijlstra <peterz@infradead.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-09-12 22:40:36 +02:00
Ben Widawsky 23f5448398 drm/i915: Synchronize pread/pwrite with wait_rendering
lifted from Daniel:
pread/pwrite isn't about the object's domain at all, but purely about
synchronizing for outstanding rendering. Replacing the call to
set_to_gtt_domain with a wait_rendering would imo improve code
readability. Furthermore we could pimp pread to only block for
outstanding writes and not for reads.

Since you're not the first one to trip over this: Can I volunteer you
for a follow-up patch to fix this?

v2: Switch the pwrite patch to use \!read_only. This was a typo in the
original code. (Chris, Daniel)

Recommended-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
[danvet: Fix up the logic fumble - wait_rendering has a bool readonly
paramater, set_to_gtt_domain otoh has bool write. Breakage reported by
Jani Nikula, I've double-checked that igt/gem_concurrent_blt/prw-*
would have caught this.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-09-12 21:56:52 +02:00
Glauber Costa 81e49f8114 i915: bail out earlier when shrinker cannot acquire mutex
The main shrinker driver will keep trying for a while to free objects if
the returned value from the shrink scan procedure is 0.  That means "no
objects now", but a retry could very well succeed.

But what we should say here is a different thing: that it is impossible to
shrink, and we would better bail out soon.  We find this behavior more
appropriate for the case where the lock cannot be taken.  Specially given
the hammer behavior of the i915: if another thread is already shrinking,
we are likely not to be able to shrink anything anyway when we finally
acquire the mutex.

Signed-off-by: Glauber Costa <glommer@openvz.org>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dave Chinner <dchinner@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Kent Overstreet <koverstreet@google.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Cc: Arve Hjønnevåg <arve@android.com>
Cc: Carlos Maiolino <cmaiolino@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: David Rientjes <rientjes@google.com>
Cc: Gleb Natapov <gleb@redhat.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: J. Bruce Fields <bfields@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Kent Overstreet <koverstreet@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Thomas Hellstrom <thellstrom@vmware.com>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-09-10 18:56:32 -04:00
Dave Chinner 7dc19d5aff drivers: convert shrinkers to new count/scan API
Convert the driver shrinkers to the new API.  Most changes are compile
tested only because I either don't have the hardware or it's staging
stuff.

FWIW, the md and android code is pretty good, but the rest of it makes me
want to claw my eyes out.  The amount of broken code I just encountered is
mind boggling.  I've added comments explaining what is broken, but I fear
that some of the code would be best dealt with by being dragged behind the
bike shed, burying in mud up to it's neck and then run over repeatedly
with a blunt lawn mower.

Special mention goes to the zcache/zcache2 drivers.  They can't co-exist
in the build at the same time, they are under different menu options in
menuconfig, they only show up when you've got the right set of mm
subsystem options configured and so even compile testing is an exercise in
pulling teeth.  And that doesn't even take into account the horrible,
broken code...

[glommer@openvz.org: fixes for i915, android lowmem, zcache, bcache]
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Glauber Costa <glommer@openvz.org>
Acked-by: Mel Gorman <mgorman@suse.de>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Kent Overstreet <koverstreet@google.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Thomas Hellstrom <thellstrom@vmware.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Cc: Arve Hjønnevåg <arve@android.com>
Cc: Carlos Maiolino <cmaiolino@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: David Rientjes <rientjes@google.com>
Cc: Gleb Natapov <gleb@redhat.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: J. Bruce Fields <bfields@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Kent Overstreet <koverstreet@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Thomas Hellstrom <thellstrom@vmware.com>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-09-10 18:56:32 -04:00
Chris Wilson 5a1d5eb020 drm/i915: Remove the double-list iteration from bound_any()
The purpose of the function is to find out whether the object is still
bound in any address space. This can be easily checked by looking at the
vma currently associated with the object, rather than asking if any of
the global address spaces have an active vma on the object.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-09-10 16:14:06 +02:00
Mika Kuoppala be62acb4cc drm/i915: ban badly behaving contexts
Now when we have mechanism in place to track which context
was guilty of hanging the gpu, it is possible to punish
for bad behaviour.

If context has recently submitted a faulty batchbuffers guilty of
gpu hang and submits another batch which hangs gpu in quick
succession, ban it permanently. If ctx is banned, no more
batchbuffers will be queued for execution.

There is no need for global wedge machinery anymore and
it would be unwise to wedge the whole gpu if we have multiple
hanging batches queued for execution. Instead just ban
the guilty ones and carry on.

v2: Store guilty ban status bool in gpu_error instead of pointers
    that might become danling before hang is declared.

v3: Use return value for banned status instead of stashing state
    into gpu_error (Chris Wilson)

v4: - rebase on top of fixed hang stats api
    - add define for ban period
    - rename commit and improve commit msg

v5: - rely context banning instead of wedging the gpu
    - beautification and fix for ban calculation (Chris)

Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-09-06 17:55:50 +02:00
Chris Wilson 57094f8246 drm/i915: Hold an object reference whilst we shrink it
Whilst running the shrinker, we need to hold a reference as we unbind
the objects, or else we may end up waiting for and retiring requests,
which in turn may result in this object being freed.

This is very similar to the eviction code which also has to be very
careful to keep a reference to its objects as it retires and unbinds
them.

Another similarity, that Ben pointed out, is that as we may call
retire-requests, the unbound_list is outside of our control. We must
only process a single element of that list at a time, that is we can not
rely on the "safe" next pointer being valid after a call to
i915_vma_unbind().

  BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
  IP: [<ffffffffa0082892>] i915_gem_gtt_finish_object+0x68/0xbd [i915]
  PGD 758d3067 PUD ac0d6067 PMD 0
  Oops: 0000 [#1] SMP
  Modules linked in: dm_mod snd_hda_codec_realtek iTCO_wdt iTCO_vendor_support pcspkr snd_hda_intel i2c_i801 snd_hda_codec snd_hwdep snd_pcm snd_page_alloc snd_timer snd lpc_ich mfd_core soundcore battery ac option usb_wwan usbserial uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_core videodev i915 video button drm_kms_helper drm acpi_cpufreq mperf freq_table
  CPU: 1 PID: 16835 Comm: fbo-maxsize Not tainted 3.11.0-rc7_nightlytop_8fdad4_20130902_+ #7977
  task: ffff8800712106d0 ti: ffff880028e4a000 task.ti: ffff880028e4a000
  RIP: 0010:[<ffffffffa0082892>]  [<ffffffffa0082892>] i915_gem_gtt_finish_object+0x68/0xbd [i915]
  RSP: 0018:ffff880028e4b9e8  EFLAGS: 00010246
  RAX: 0000000000000000 RBX: ffff880145734000 RCX: ffff880145735328
  RDX: ffff8801457353fc RSI: 0000000000000000 RDI: ffff88007597cc00
  RBP: ffff88007597cc00 R08: 0000000000000001 R09: ffff88014f257f00
  R10: ffffea0001d65f00 R11: 0000000000bba60b R12: ffff880149e5b000
  R13: ffff880145734001 R14: ffff88007597ccc8 R15: ffff88007597cc00
  FS:  00007ff5bc919740(0000) GS:ffff88014f240000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000000000000008 CR3: 0000000028f4c000 CR4: 00000000001407e0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  Stack:
   0000000000000000 ffff88007597cc00 ffff8801440d6840 0000000000000000
   ffff880145734000 ffffffffa007c854 0000000000000010 ffff88007597c900
   0000000000018000 00000000004a1201 ffff88007597cc60 ffffffffa007d183
  Call Trace:
   [<ffffffffa007c854>] ? i915_vma_unbind+0xe2/0x1d1 [i915]
   [<ffffffffa007d183>] ? __i915_gem_shrink+0xf1/0x162 [i915]
   [<ffffffffa007d2ee>] ? i915_gem_object_get_pages_gtt+0xfa/0x303 [i915]
   [<ffffffffa00795f4>] ? i915_gem_object_get_pages+0x54/0x89 [i915]
   [<ffffffffa007cbda>] ? i915_gem_object_pin+0x238/0x5ce [i915]
   [<ffffffff812cba5f>] ? __sg_page_iter_next+0x2b/0x58
   [<ffffffffa0082056>] ? gen6_ppgtt_insert_entries+0xf2/0x114 [i915]
   [<ffffffffa007fe4b>] ? i915_gem_execbuffer_reserve_vma.isra.13+0x79/0x18d [i915]
   [<ffffffffa008017c>] ? i915_gem_execbuffer_reserve+0x21d/0x347 [i915]
   [<ffffffffa0080bfb>] ? i915_gem_do_execbuffer.isra.17+0x4f3/0xe61 [i915]
   [<ffffffffa00795f4>] ? i915_gem_object_get_pages+0x54/0x89 [i915]
   [<ffffffffa007e405>] ? i915_gem_pwrite_ioctl+0x743/0x7a5 [i915]
   [<ffffffffa0081a46>] ? i915_gem_execbuffer2+0x15e/0x1e4 [i915]
   [<ffffffffa000e20d>] ? drm_ioctl+0x2a5/0x3c4 [drm]
   [<ffffffffa00818e8>] ? i915_gem_execbuffer+0x37f/0x37f [i915]
   [<ffffffff816f64c0>] ? __do_page_fault+0x3ab/0x449
   [<ffffffff810be3da>] ? do_mmap_pgoff+0x2b2/0x341
   [<ffffffff810e49be>] ? vfs_ioctl+0x1e/0x31
   [<ffffffff810e5194>] ? do_vfs_ioctl+0x3ad/0x3ef
   [<ffffffff810e5224>] ? SyS_ioctl+0x4e/0x7e
   [<ffffffff816f88d2>] ? system_call_fastpath+0x16/0x1b
  Code: 52 0c a0 48 c7 c6 22 30 0d a0 31 c0 e8 ef 00 f9 ff bf c6 a7 00 00 e8 90 5d 24 e1 f6 85 13 01 00 00 10 75 44 48 8b 85 18 01 00 00 <8b> 50 08 48 8b 30 49 8b 84 24 88 02 00 00 48 89 c7 48 81 c7 98
  RIP  [<ffffffffa0082892>] i915_gem_gtt_finish_object+0x68/0xbd [i915]
  RSP <ffff880028e4b9e8>
  CR2: 0000000000000008

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=68171
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: stable@vger.kernel.org
[danvet: Bikeshed the comments a bit as discussed with Chris.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-09-05 14:47:59 +02:00
Chris Wilson 3c0e234c84 drm/i915; Preallocate the lazy request
It is possible for us to be forced to perform an allocation for the lazy
request whilst running the shrinker. This allocation may fail, leaving
us unable to reclaim any memory leading to premature OOM. A neat
solution to the problem is to preallocate the request at the same time
as acquiring the seqno for the ring transaction. This means that we can
report ENOMEM prior to touching the rings.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-09-05 12:03:53 +02:00
Chris Wilson 1823521d2b drm/i915: Rename ring->outstanding_lazy_request
Prior to preallocating an request for lazy emission, rename the existing
field to make way (and differentiate the seqno from the request struct).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-09-05 12:03:12 +02:00
Chris Wilson 9a7e0c2a1b drm/i915: Rearrange the comments in i915_add_request()
The comments were a little out-of-sequence with the code, forcing the
reader to jump around whilst reading. Whilst moving the comments around,
add one to explain the context reference.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-09-04 17:34:54 +02:00
Chris Wilson c0321e2c5a drm/i915: Do not add an interrupt for a context switch
We use the request to ensure we hold a reference to the context for the
duration that it remains in use by the ring. Each request only holds a
reference to the current context, hence we emit a request after
switching contexts with the final reference to the old context. However,
the extra interrupt caused by that request is not useful (no timing
critical function will wait for the context object), instead the overhead
of servicing the IRQ shows up in some (lightweight) benchmarks. In order
to keep the useful property of using the request to manage the context
lifetime, we want to add a dummy request that is associated with the
interrupt from the subsequent real request following the batch.

The extra interrupt was added as a side-effect of using
i915_add_request() in

commit 112522f678
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Thu May 2 16:48:07 2013 +0300

    drm/i915: put context upon switching

v2: Daniel convinced me that the request here was solely for context
lifetime tracking and that we have the active ref to keep the object
alive whilst the MI_SET_CONTEXT. So the only concern then is which
context should get the blame for MI_SET_CONTEXT failing. The old scheme
added a request for the old context so that any hang upto and including
the switch away would mark the old context as guilty. Now any hang here
implicates the new context. However since we have already gone through a
complete flush with the last context in its last request, and all that
lies in no-man's-land is an invalidate flush and the MI_SET_CONTEXT, we
should be safe in not unduly placing blame on the new context.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ben Widawsky <ben@bwidawsk.net>
Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-09-04 17:34:53 +02:00
Daniel Vetter 0ff501cbb5 drm/i915: Fix list corruption in vma_unbind
The saga around the breadcrumb vmas used by execbuf continues ...

This time around we've managed to unconditionally move the object to
the unbound list on the last vma unbind even though it might never
have been on either the bound or unbound list. Hilarity ensued.

Chris Wilson tracked this one down but compared to his patches I've
simply opted to completely separate the unbound case for not-yet bound
vmas. Otherwise we imo end up with semantically hard to parse checks
around the list_move_tail(global_list, ...).

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ben Widawsky <ben@bwidawsk.net>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=68462
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-09-04 17:34:52 +02:00
Rodrigo Vivi 9435373ef8 drm/i915: Report enabled slices on Haswell GT3
Batchbuffers constructed by userspace can conditionalise their URB
allocations through the use of the MI_SET_PREDICATE command. This
command can read the MI_PREDICATE_RESULT_2 register to see how many
slices are enabled on GT3, and by virtue of the result, scale their
memory allocations to fit enabled memory.

Of course, this only works if the kernel sets the appropriate bit in the
register first.

v2: Better commit subject and message by Chris Wilson.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Credits-to: Yejun Guo <yejun.guo@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@gmail.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-09-04 17:34:51 +02:00
Daniel Vetter b93dab6e9d drm/i915: More vma fixups around unbind/destroy
The important bugfix here is that we must not unlink the vma when
we keep it around as a placeholder for the execbuf code. Since then we
won't find it again when execbuf gets interrupt and restarted and
create a 2nd vma. And since the code as-is isn't fit yet to deal with
more than one vma, hilarity ensues.

Specifically the dma map/unmap of the sg table isn't adjusted for
multiple vmas yet and will blow up like this:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
IP: [<ffffffffa008fb37>] i915_gem_gtt_finish_object+0x73/0xc8 [i915]
PGD 56bb5067 PUD ad3dd067 PMD 0
Oops: 0000 [#1] SMP
Modules linked in: tcp_lp ppdev parport_pc lp parport ipv6 dm_mod dcdbas snd_hda_codec_hdmi pcspkr snd_hda_codec_realtek serio_raw i2c_i801 iTCO_wdt iTCO_vendor_support snd_hda_intel snd_hda_codec lpc_ich snd_hwdep mfd_core snd_pcm snd_page_alloc snd_timer snd soundcore acpi_cpufreq i915 video button drm_kms_helper drm mperf freq_table
CPU: 1 PID: 16650 Comm: fbo-maxsize Not tainted 3.11.0-rc4_nightlytop_d93f59_debug_20130814_+ #6957
Hardware name: Dell Inc. OptiPlex 9010/03JR84, BIOS A01 05/04/2012
task: ffff8800563b3f00 ti: ffff88004bdf4000 task.ti: ffff88004bdf4000
RIP: 0010:[<ffffffffa008fb37>]  [<ffffffffa008fb37>] i915_gem_gtt_finish_object+0x73/0xc8 [i915]
RSP: 0018:ffff88004bdf5958  EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff8801135e0000 RCX: ffff8800ad3bf8e0
RDX: ffff8800ad3bf8e0 RSI: 0000000000000000 RDI: ffff8801007ee780
RBP: ffff88004bdf5978 R08: ffff8800ad3bf8e0 R09: 0000000000000000
R10: ffffffff86ca1810 R11: ffff880036a17101 R12: ffff8801007ee780
R13: 0000000000018001 R14: ffff880118c4e000 R15: ffff8801007ee780
FS:  00007f401a0ce740(0000) GS:ffff88011e280000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000008 CR3: 000000005635c000 CR4: 00000000001407e0
Stack:
 ffff8801007ee780 ffff88005c253180 0000000000018000 ffff8801135e0000
 ffff88004bdf59a8 ffffffffa0088e55 0000000000000011 ffff8801007eec00
 0000000000018000 ffff880036a17101 ffff88004bdf5a08 ffffffffa0089026
Call Trace:
 [<ffffffffa0088e55>] i915_vma_unbind+0xdf/0x1ab [i915]
 [<ffffffffa0089026>] __i915_gem_shrink+0x105/0x177 [i915]
 [<ffffffffa0089452>] i915_gem_object_get_pages_gtt+0x108/0x309 [i915]
 [<ffffffffa0085ba9>] i915_gem_object_get_pages+0x61/0x90 [i915]
 [<ffffffffa008f22b>] ? gen6_ppgtt_insert_entries+0x103/0x125 [i915]
 [<ffffffffa008a113>] i915_gem_object_pin+0x1fa/0x5df [i915]
 [<ffffffffa008cdfe>] i915_gem_execbuffer_reserve_object.isra.6+0x8d/0x1bc [i915]
 [<ffffffffa008d156>] i915_gem_execbuffer_reserve+0x229/0x367 [i915]
 [<ffffffffa008dbf6>] i915_gem_do_execbuffer.isra.12+0x4dc/0xf3a [i915]
 [<ffffffff810fc823>] ? might_fault+0x40/0x90
 [<ffffffffa008eb89>] i915_gem_execbuffer2+0x187/0x222 [i915]
 [<ffffffffa000971c>] drm_ioctl+0x308/0x442 [drm]
 [<ffffffffa008ea02>] ? i915_gem_execbuffer+0x3ae/0x3ae [i915]
 [<ffffffff817db156>] ? __do_page_fault+0x3dd/0x481
 [<ffffffff8112fdba>] vfs_ioctl+0x26/0x39
 [<ffffffff811306a2>] do_vfs_ioctl+0x40e/0x451
 [<ffffffff817deda7>] ? sysret_check+0x1b/0x56
 [<ffffffff8113073c>] SyS_ioctl+0x57/0x87
 [<ffffffff8135bbfe>] ? trace_hardirqs_on_thunk+0x3a/0x3f
 [<ffffffff817ded82>] system_call_fastpath+0x16/0x1b
Code: 48 c7 c6 84 30 0e a0 31 c0 e8 d0 e9 f7 ff bf c6 a7 00 00 e8 07 af 2c e1 41 f6 84 24 03 01 00 00 10 75 44 49 8b 84 24 08 01 00 00 <8b> 50 08 48 8b 30 49 8b 86 b0 04 00 00 48 89 c7 48 81 c7 98 00
RIP  [<ffffffffa008fb37>] i915_gem_gtt_finish_object+0x73/0xc8 [i915]
 RSP <ffff88004bdf5958>
CR2: 0000000000000008

As a consequence we need to change the "only one vma for now" check in
vma_unbind - since vma_destroy isn't always called the obj->vma_list
might not be empty. Instead check that the vma list is singular at the
beginning of vma_unbind. This is also more symmetric with bind_to_vm.

This fixes the igt/gem_evict_everything|alignment testcases.

v2:
- Add a paranoid WARN to mark_free in the eviction code to make sure
  we never try to evict a vma used by the execbuf code right now.
- Move the check for a temporary execbuf vma into vma_destroy -
  otherwise the failure path cleanup in bind_to_vm will blow up.

Our first attempting at fixing this was

commit 1be81a2f2cfd8789a627401d470423358fba2d76
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Aug 20 12:56:40 2013 +0100

    drm/i915: Don't destroy the vma placeholder during execbuffer reservation

Squash with this when merging!

v3: Improvements suggested in Chris' review:
- Move the WARN_ON in vma_destroy that checks for vmas with an drm_mm
  allocation before the early return.
- Bail out if we hit the WARN in mark_free to hopefully make the
  kernel survive for long enough to capture it.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ben Widawsky <ben@bwidawsk.net>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=68298
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=68171
Tested-by: lu hua <huax.lu@intel.com> (v2)
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-09-04 17:34:42 +02:00
Chris Wilson aaa0566792 drm/i915: Don't destroy the vma placeholder during execbuffer reservation
The execbuffer handle and exec_link were moved from the object into the
vma. As the vma may be unbound and destroyed whilst attempting to
reserve the execbuffer objects (either through a forced unbind to fix up
a misalignment or through an evict-everything call) we need to prevent
the free of the i915_vma itself. Otherwise not only is the list of
objects to reserve corrupt, but we continue to reference stale vma
entries.

Fixes kernel crash with i-g-t/gem_evict_everything

This regression has been introduced in

commit 04038a515d6eda6dd0857c0ade0b3950d372f4c0
Author:     Ben Widawsky <ben@bwidawsk.net>
AuthorDate: Wed Aug 14 11:38:36 2013 +0200

    drm/i915: Convert execbuf code to use vmas

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
References: http://www.spinics.net/lists/intel-gfx/msg32038.html
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=68298
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-09-04 17:34:42 +02:00
Daniel Vetter e656a6cba0 drm/i915: inline vma_create into lookup_or_create_vma
In the execbuf code we don't clean up any vmas which ended up not
getting bound for code simplicity. To make sure that we don't end up
creating multiple vma for the same vm kill the somewhat dangerous
vma_create function and inline it into lookup_or_create.

This is just a safety measure to prevent surprises in the future.

Also update the somewhat confused comment in the execbuf code and
clarify what kind of magic is going on with a new one.

v2: Keep the function separate as requested by Chris. But give it a __
prefix for paranoia and move it tighter together with the other vma
stuff.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ben Widawsky <ben@bwidawsk.net>
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-09-04 17:34:41 +02:00
Ben Widawsky 27173f1f95 drm/i915: Convert execbuf code to use vmas
In order to transition more of our code over to using a VMA instead of
an <OBJ, VM> pair - we must have the vma accessible at execbuf time. Up
until now, we've only had a VMA when actually binding an object.

The previous patch helped handle the distinction on bound vs. unbound.
This patch will help us catch leaks, and other issues before we actually
shuffle a bunch of stuff around.

This attempts to convert all the execbuf code to speak in vmas. Since
the execbuf code is very self contained it was a nice isolated
conversion.

The meat of the code is about turning eb_objects into eb_vma, and then
wiring up the rest of the code to use vmas instead of obj, vm pairs.

Unfortunately, to do this, we must move the exec_list link from the obj
structure. This list is reused in the eviction code, so we must also
modify the eviction code to make this work.

WARNING: This patch makes an already hotly profiled path slower. The cost is
unavoidable. In reply to this mail, I will attach the extra data.

v2: Release table lock early, and two a 2 phase vma lookup to avoid
having to use a GFP_ATOMIC. (Chris)

v3: s/obj_exec_list/obj_exec_link/
Updates to address
commit 6d2b888569
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Wed Aug 7 18:30:54 2013 +0100

    drm/i915: List objects allocated from stolen memory in debugfs

v4: Use obj = vma->obj for neatness in some places (Chris)
need_reloc_mappable() should return false if ppgtt (Chris)

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
[danvet: Split out prep patches. Also remove a FIXME comment which is
now taken care of.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-09-04 17:34:41 +02:00
Damien Lespiau d2933a5b8f drm/i915: Don't call sg_free_table() if sg_alloc_table() fails
One needs to call __sg_free_table() if __sg_alloc_table() fails, but
sg_alloc_table() does that for us already.

Signed-off-by: Damien Lespiau <damien.lespiau@intel.com>
Reviewd-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-09-03 19:18:00 +02:00
Joe Perches fac15c1082 i915_gem: Convert kmem_cache_alloc(...GFP_ZERO) to kmem_cache_zalloc
The helper exists, might as well use it instead of __GFP_ZERO.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-09-03 19:17:56 +02:00
Dave Airlie efa27f9cec Merge tag 'drm-intel-next-2013-08-23' of git://people.freedesktop.org/~danvet/drm-intel into drm-next
Need to get my stuff out the door ;-) Highlights:
- pc8+ support from Paulo
- more vma patches from Ben.
- Kconfig option to enable preliminary support by default (Josh
  Triplett)
- Optimized cpu cache flush handling and support for write-through caching
  of display planes on Iris (Chris)
- rc6 tuning from Stéphane Marchesin for more stability
- VECS seqno wrap/semaphores fix (Ben)
- a pile of smaller cleanups and improvements all over

Note that I've ditched Ben's execbuf vma conversion for 3.12 since not yet
ready. But there's still other vma conversion stuff in here.

* tag 'drm-intel-next-2013-08-23' of git://people.freedesktop.org/~danvet/drm-intel: (62 commits)
  drm/i915: Print seqnos as unsigned in debugfs
  drm/i915: Fix context size calculation on SNB/IVB/VLV
  drm/i915: Use POSTING_READ in lcpll code
  drm/i915: enable Package C8+ by default
  drm/i915: add i915.pc8_timeout function
  drm/i915: add i915_pc8_status debugfs file
  drm/i915: allow package C8+ states on Haswell (disabled)
  drm/i915: fix SDEIMR assertion when disabling LCPLL
  drm/i915: grab force_wake when restoring LCPLL
  drm/i915: drop WaMbcDriverBootEnable workaround
  drm/i915: Cleaning up the relocate entry function
  drm/i915: merge HSW and SNB PM irq handlers
  drm/i915: fix how we mask PMIMR when adding work to the queue
  drm/i915: don't queue PM events we won't process
  drm/i915: don't disable/reenable IVB error interrupts when not needed
  drm/i915: add dev_priv->pm_irq_mask
  drm/i915: don't update GEN6_PMIMR when it's not needed
  drm/i915: wrap GEN6_PMIMR changes
  drm/i915: wrap GTIMR changes
  drm/i915: add the FCLK case to intel_ddi_get_cdclk_freq
  ...
2013-08-30 09:47:41 +10:00
Paulo Zanoni c67a470b1d drm/i915: allow package C8+ states on Haswell (disabled)
This patch allows PC8+ states on Haswell. These states can only be
reached when all the display outputs are disabled, and they allow some
more power savings.

The fact that the graphics device is allowing PC8+ doesn't mean that
the machine will actually enter PC8+: all the other devices also need
to allow PC8+.

For now this option is disabled by default. You need i915.allow_pc8=1
if you want it.

This patch adds a big comment inside i915_drv.h explaining how it
works and how it tracks things. Read it.

v2: (this is not really v2, many previous versions were already sent,
     but they had different names)
    - Use the new functions to enable/disable GTIMR and GEN6_PMIMR
    - Rename almost all variables and functions to names suggested by
      Chris
    - More WARNs on the IRQ handling code
    - Also disable PC8 when there's GPU work to do (thanks to Ben for
      the help on this), so apps can run caster
    - Enable PC8 on a delayed work function that is delayed for 5
      seconds. This makes sure we only enable PC8+ if we're really
      idle
    - Make sure we're not in PC8+ when suspending
v3: - WARN if IRQs are disabled on __wait_seqno
    - Replace some DRM_ERRORs with WARNs
    - Fix calls to restore GT and PM interrupts
    - Use intel_mark_busy instead of intel_ring_advance to disable PC8
v4: - Use the force_wake, Luke!
v5: - Remove the "IIR is not zero" WARNs
    - Move the force_wake chunk to its own patch
    - Only restore what's missing from RC6, not everything

Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-08-23 14:52:33 +02:00
Ben Widawsky accfef2e5a drm/i915: prepare bind_to_vm for preallocated vma
In the new execbuf code we want to track buffers using the vmas even
before they're all properly mapped. Which means that bind_to_vm needs
to deal with buffers which have preallocated vmas which aren't yet
bound.

This patch implements this prep work and adjusts our WARN/BUG checks.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
[danvet: Split out from Ben's big execbuf patch. Also move one BUG
back to its original place to deflate the diff a notch.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-08-22 13:31:53 +02:00
Ben Widawsky 82a55ad1a0 drm/i915: Switch eviction code to use vmas
The execbuf wants to do relocations usings vmas, so we need a
vma->exec_list. The eviction code also uses the old obj execbuf list
for it's own book-keeping, but would really prefer to deal in vmas
only. So switch it over to the new list.

Again this is just a prep patch for the big execbuf vma conversion.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
[danvet: Split out from Ben's big execbuf vma patch.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-08-22 13:31:52 +02:00
Ben Widawsky b25cb2f882 drm/i915: s/obj->exec_list/obj->obj_exec_link in debugfs
To convert the execbuf code over to use vmas natively we need to
shuffle the exec_list a bit. This patch here just prepares things with
the debugfs code, which also uses the old exec_list list_head, newly
called obj_exec_link.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
[danvet: Split out from Ben's big patch.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-08-22 13:31:51 +02:00
Chris Wilson 4b6d846e9a drm/i915: Drop the overzealous warning from i915_gem_set_cache_level
By our earlier reckoning, move from a snooped/llc setting to an uncached
setting, leaves the CPU cache in a consistent state irrespective of our
domain tracking - so we can forgo the warning about the lack of
invalidation. Similarly for any writes posted to the snooped CPU domain,
we know will be safely clflushed to the uncached PTEs after forcing the
domain change.

This WARN started to pop up with

commit d46f1c3f13
Author:     Chris Wilson <chris@chris-wilson.co.uk>
AuthorDate: Thu Aug 8 14:41:06 2013 +0100

    drm/i915: Allow the GPU to cache stolen memory

Ville brought up a scenario where the interaction of a set_caching
ioctl call from userspace on a scanout buffer (i.e. obj->pin_display
is set) resulted in the code getting confused and not properly
flushing stale cpu cachelines. Luckily we already prevent this by
rejecting caching changes when obj->pin_count is set.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=68040
Tested-by: cancan,feng <cancan.feng@intel.com>
[danvet: Add buglink, bisect result and explain why Ville's scenario
is already taken care of.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-08-22 13:31:46 +02:00
Daniel Vetter 49987099e2 drm/i915: use vma->node directly and rewrap map&fence in bind
Use () to make for neater alignment of the split lines, too. With this
we ditch another jump through the obj_gtt_size/offset indirection
maze.

Cc: Ben Widawsky <benjamin.widawsky@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-08-22 13:31:45 +02:00
Ben Widawsky 4bd561b3e8 drm/i915: cleanup map&fence in bind
Cleanup the map and fenceable setting during bind to make more sense,
and not check i915_is_ggtt() 2 unnecessary times

v2: Move the bools into the if block (Chris) - There are ways to tidy
this function (fence calculations for instance) even further, but they
are quite invasive, so I am punting on those unless specifically asked.

v3: Add newline between variable declaration and logic (Chris)

Recommended-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-08-22 13:31:45 +02:00
Ben Widawsky 433544bd25 drm/i915: Remove node only when allocated
VMAs can be created and not bound. One may think of it as lazy cleanup,
and safely gloss over the conditions which manufacture it. In either
case, when the object backing the i915 vma is destroyed, we must cleanup
the vma without stumbling into a bunch of pitfalls that assume the vma
is bound.

NOTE: I was pretty certain the above condition could only happen when we
introduced the use of VMAs being looked up at execbuf, and already
existing. Paulo has hit this though, so I must be missing something. As
I believe the patch is correct anyway, therefore I won't scratch my head
too hard.

v2: use goto destroy as a compromise (Chris)

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-08-22 13:31:44 +02:00
Chris Wilson 4257d3ba3b drm/i915: Allow the user to set bo into the DISPLAY cache domain
This is primarily for the benefit of the create2 ioctl so that the
caller can avoid the later step of rebinding the bo with new PTE bits.
After introducing WT (and possibly GFDT) cacheing for display targets,
not everything in the display is earmarked as UC, and more importantly
what is is controlled by the kernel.

Note that set_cache_level/get_cache_level for DISPLAY is not necessarily
idempotent; get_cache_level may return UC for architectures that have no
special cache domain for the display engine.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-08-22 13:31:39 +02:00
Chris Wilson 651d794fae drm/i915: Use Write-Through cacheing for the display plane on Iris
Haswell GT3e has the unique feature of supporting Write-Through cacheing
of objects within the eLLC/LLC. The purpose of this is to enable the display
plane to remain coherent whilst objects lie resident in the eLLC/LLC - so
that we, in theory, get the best of both worlds, perfect display and fast
access.

However, we still need to be careful as the CPU does not see the WT when
accessing the cache. In particular, this means that we need to flush the
cache lines after writing to an object through the CPU, and on
transitioning from a cached state to WT.

v2: Actually do the clflush on transition to WT, nagging by Ville.
v3: Flush the CPU cache after writes into WT objects.
v4: Rease onto LLC updates and report WT as "uncached" for
get_cache_level_ioctl to remain symmetric with set_cache_level_ioctl.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-08-22 13:31:38 +02:00
Jani Nikula f2f4d82faf drm/i915: give more distinctive names to ring hangcheck action enums
The short lowercase names are bound to collide. The default warnings
don't even warn about shadowing.

Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-08-22 13:31:37 +02:00
Ben Widawsky 7ace7ef2f5 drm/i915: WARN_ON failed map_and_fenceable
I just noticed in our code we don't really check the assertion, and
given some of the code I am changing in this area, I feel a WARN is very
nice to have.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
[danvet: s/&/&&/ to fix typo on the check.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-08-22 13:31:35 +02:00
Chris Wilson 000433b67e drm/i915: Only do a chipset flush after a clflush
Now that we skip clflushes more often, return a boolean indicating
whether the clflush was actually performed, and only if it was do the
chipset flush. (Though on most of the architectures where the clflush will
be skipped, the chipset flush is a no-op!)

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-08-22 13:31:34 +02:00
Dave Airlie 9712def2b3 Merge tag 'drm-intel-next-2013-08-09' of git://people.freedesktop.org/~danvet/drm-intel into drm-next
Daniel writes:
New pile of stuff for -next:
- Cleanup of the old crtc helper callbacks, all encoders are now converted
  to the i915 modeset infrastructure.
- Massive amount of wm patches from Ville for ilk, snb, ivb, hsw, this is
  prep work to eventually get things going for nuclear pageflips where we
  need to adjust watermarks on the fly.
- More vm/vma patches from Ben. This refactoring isn't yet fully rolled
  out, we miss the execbuf conversion and some of the low-level
  bind/unbind support code.
- Convert our hdmi infoframe code to use the new common helper functions
  (Damien). This contains some bugfixes for the common infoframe helpers.
- Some cruft removal from Damien.
- Various smaller bits&pieces all over, as usual.

* tag 'drm-intel-next-2013-08-09' of git://people.freedesktop.org/~danvet/drm-intel: (105 commits)
  drm/i915: Fix FB WM for HSW
  drm/i915: expose HDMI connectors on port C on BYT
  drm/i915: fix a limit check in hsw_compute_wm_results()
  drm/i915: unbreak i915_gem_object_ggtt_unbind()
  drm/i915: Make intel_set_mode() static
  drm/i915: Remove intel_modeset_disable()
  drm/i915: Make intel_encoder_dpms() static
  drm/i915: Make i915_hangcheck_elapsed() static
  drm/i915: Fix #endif comment
  drm/i915: Remove i915_gem_object_check_coherency()
  drm/i915: Remove stale prototypes
  drm/i915: List objects allocated from stolen memory in debugfs
  drm/i915: Always call intel_update_sprite_watermarks() when disabling a plane
  drm/i915: Pass plane and crtc to intel_update_sprite_watermarks
  drm/i915: Don't try to disable plane if it's already disabled
  drm/i915: Pass crtc to our update/disable_plane hooks
  drm/i915: Split plane watermark parameters into a separate struct
  drm/i915: Pull some watermarks state into a separate structure
  drm/i915: Calculate max watermark levels for ILK+
  drm/i915: Rename hsw_lp_wm_result to intel_wm_level
  ...
2013-08-21 12:48:59 +10:00
Chris Wilson 2c22569bba drm/i915: Update rules for writing through the LLC with the cpu
As mentioned in the previous commit, reads and writes from both the CPU
and GPU go through the LLC. This gives us coherency between the CPU and
GPU irrespective of the attribute settings either device sets. We can
use to avoid having to clflush even uncached memory.

Except for the scanout.

The scanout resides within another functional block that does not use
the LLC but reads directly from main memory. So in order to maintain
coherency with the scanout, writes to uncached memory must be flushed.
In order to optimize writes elsewhere, we start tracking whether an
framebuffer is attached to an object.

v2: Use pin_display tracking rather than fb_count (to ensure we flush
cursors as well etc) and only force the clflush along explicit writes to
the scanout paths (i.e. pin_to_display_plane and pwrite into scanout).

v3: Force the flush after hitting the slowpath in pwrite, as after
dropping the lock the object's cache domain may be invalidated. (Ville)

Based on a patch by Ville Syrjälä.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-08-10 11:20:49 +02:00
Chris Wilson cc98b413c1 drm/i915: Track when an object is pinned for use by the display engine
The display engine has unique coherency rules such that it requires
special handling to ensure that all writes to cursors, scanouts and
sprites are clflushed. This patch introduces the infrastructure to
simply track when an object is being accessed by the display engine.

v2: Explain the is_pin_display() magic as the sources for obj->pin_count
and their individual rules is not obvious. (Ville)

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-08-10 11:19:51 +02:00
Chris Wilson c76ce038e3 drm/i915: Update rules for reading cache lines through the LLC
The LLC is a fun device. The cache is a distinct functional block within
the SA that arbitrates access from both the CPU and GPU cores. As such
all writes to memory land first in the LLC before further action is
taken. For example, an uncached write from either the CPU or GPU will
then proceed to memory and evict the cacheline from the LLC. This means that
a read from the LLC always returns the correct information even if the PTE
bit in the GPU differs from the PAT bit in the CPU. For the older
snooping architecture on non-LLC, the fundamental principle still holds
except that some coordination is required between the CPU and GPU to
explicitly perform the snooping (which is handled by our request
tracking).

The upshot of this is that we know that we can issue a read from either
LLC devices or snoopable memory and trust the contents of the cache -
i.e. we can forgo a clflush before a read in these circumstances.
Writing to memory from the CPU is a little more tricky as we have to
consider that the scanout does not read from the CPU cache at all, but
from main memory. So we have to currently treat all requests to write to
uncached memory as having to be flushed to main memory for coherency
with all consumers.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-08-10 11:19:50 +02:00
Dan Carpenter 58e73e1570 drm/i915: unbreak i915_gem_object_ggtt_unbind()
There is an extra semi-colon here so we just leak and never unbind
anything.

This regression has been introduced in

commit 07fe0b1280
Author: Ben Widawsky <ben@bwidawsk.net>
Date:   Wed Jul 31 17:00:10 2013 -0700

    drm/i915: plumb VM into bind/unbind code

Cc: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-08-09 12:04:53 +02:00
Ben Widawsky 8b9c2b9411 drm/i915: Add vma to list at creation
With the current code there shouldn't be a distinction - however with an
upcoming change we intend to allocate a vma much earlier, before it's
actually bound anywhere.

To do this we have to check node allocation as well for the _bound()
check.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
[danvet: move list_del(&vma->vma_link) from vma_unbind to vma_destroy,
again fallout from the loss of "rm/i915: Cleanup more of VMA in
destroy".]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

fixup for drm/i915: Add vma to list at creation
2013-08-08 14:10:20 +02:00
Ben Widawsky ca191b1313 drm/i915: mm_list is per VMA
formerly: "drm/i915: Create VMAs (part 5) - move mm_list"

The mm_list is used for the active/inactive LRUs. Since those LRUs are
per address space, the link should be per VMx .

Because we'll only ever have 1 VMA before this point, it's not incorrect
to defer this change until this point in the patch series, and doing it
here makes the change much easier to understand.

Shamelessly manipulated out of Daniel:
"active/inactive stuff is used by eviction when we run out of address
space, so needs to be per-vma and per-address space. Bound/unbound otoh
is used by the shrinker which only cares about the amount of memory used
and not one bit about in which address space this memory is all used in.
Of course to actual kick out an object we need to unbind it from every
address space, but for that we have the per-object list of vmas."

v2: only bump GGTT LRU in i915_gem_object_set_to_gtt_domain (Chris)

v3: Moved earlier in the series

v4: Add dropped message from v3

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
[danvet: Frob patch to apply and use vma->node.size directly as
discused with Ben. Also drop a needles BUG_ON before move_to_inactive,
the function itself has the same check.]
[danvet 2nd: Rebase on top of the lost "drm/i915: Cleanup more of VMA
in destroy", specifically unlink the vma from the mm_list in
vma_unbind (to keep it symmetric with bind_to_vm) instead of
vma_destroy.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-08-08 14:06:58 +02:00
Ben Widawsky 5cacaac77c drm/i915: Fix up map and fenceable for VMA
formerly: "drm/i915: Create VMAs (part 3.5) - map and fenceable
tracking"

The map_and_fenceable tracking is per object. GTT mapping, and fences
only apply to global GTT. As such,  object operations which are not
performed on the global GTT should not effect mappable or fenceable
characteristics.

Functionally, this commit could very well be squashed in to a previous
patch which updated object operations to take a VM argument.  This
commit is split out because it's a bit tricky (or at least it was for
me).

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
[danvet: Drop the bogus hunk in i915_vma_unbind as discussed with
Ben.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-08-08 14:04:55 +02:00
Ben Widawsky 9843877d10 drm/i915: turn bound_ggtt checks to bound_any
In some places, we want to know if an object is bound in any address
space, and not just the global GTT. This often applies when there is a
single global resource (object, pages, etc.)

function                             |      reason
--------------------------------------------------
i915_gem_object_is_inactive          | global object
i915_gem_object_put_pages            | object's pages
915_gem_object_unpin                 | global object
i915_gem_execbuffer_unreserve_object | temporary until we plumb vma
pread/pwrite                         | see the note below

Note: set_to_gtt_domain in pwrite/pread is abused as a wait_rendering
call - but that once only worked if the object is bound. We really
should replace this with a plain wait_rendering call, which would have
the upside that in pread it would be clearer that we actually only
wait for oustanding gpu writes.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
[danvet: Explain the set_to_gtt_domain in pwrite/pread and volunteer
Ben to replace those with wait_rendering calls.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-08-08 14:04:43 +02:00
Ben Widawsky f6cd1f15d3 drm/i915: Use new bind/unbind in eviction code
Eviction code, like the rest of the converted code needs to be aware of
the address space for which it is evicting (or the everything case, all
addresses). With the updated bind/unbind interfaces of the last patch,
we can now safely move the eviction code over.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-08-08 14:04:43 +02:00
Ben Widawsky 07fe0b1280 drm/i915: plumb VM into bind/unbind code
As alluded to in several patches, and it will be reiterated later... A
VMA is an abstraction for a GEM BO bound into an address space.
Therefore it stands to reason, that the existing bind, and unbind are
the ones which will be the most impacted. This patch implements this,
and updates all callers which weren't already updated in the series
(because it was too messy).

This patch represents the bulk of an earlier, larger patch. I've pulled
out a bunch of things by the request of Daniel. The history is preserved
for posterity with the email convention of ">" One big change from the
original patch aside from a bunch of cropping is I've created an
i915_vma_unbind() function. That is because we always have the VMA
anyway, and doing an extra lookup is useful. There is a caveat, we
retain an i915_gem_object_ggtt_unbind, for the global cases which might
not talk in VMAs.

> drm/i915: plumb VM into object operations
>
> This patch was formerly known as:
> "drm/i915: Create VMAs (part 3) - plumbing"
>
> This patch adds a VM argument, bind/unbind, and the object
> offset/size/color getters/setters. It preserves the old ggtt helper
> functions because things still need, and will continue to need them.
>
> Some code will still need to be ported over after this.
>
> v2: Fix purge to pick an object and unbind all vmas
> This was doable because of the global bound list change.
>
> v3: With the commit to actually pin/unpin pages in place, there is no
> longer a need to check if unbind succeeded before calling put_pages().
> Make put_pages only BUG() after checking pin count.
>
> v4: Rebased on top of the new hangcheck work by Mika
> plumbed eb_destroy also
> Many checkpatch related fixes
>
> v5: Very large rebase
>
> v6:
> Change BUG_ON to WARN_ON (Daniel)
> Rename vm to ggtt in preallocate stolen, since it is always ggtt when
> dealing with stolen memory. (Daniel)
> list_for_each will short-circuit already (Daniel)
> remove superflous space (Daniel)
> Use per object list of vmas (Daniel)
> Make obj_bound_any() use obj_bound for each vm (Ben)
> s/bind_to_gtt/bind_to_vm/ (Ben)
>
> Fixed up the inactive shrinker. As Daniel noticed the code could
> potentially count the same object multiple times. While it's not
> possible in the current case, since 1 object can only ever be bound into
> 1 address space thus far - we may as well try to get something more
> future proof in place now. With a prep patch before this to switch over
> to using the bound list + inactive check, we're now able to carry that
> forward for every address space an object is bound into.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
[danvet: Rebase on top of the loss of "drm/i915: Cleanup more of VMA
in destroy".]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-08-08 14:04:20 +02:00
Ben Widawsky 80dcfdbd68 drm/i915: Rework __i915_gem_shrink
In order to do this for all VMs, it's convenient to rework the logic a
bit. This should have no functional impact.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-08-08 14:02:41 +02:00
Dave Airlie 32c913e436 Merge tag 'drm-intel-next-2013-07-26-fixed' of git://people.freedesktop.org/~danvet/drm-intel into drm-next
Neat that QA (and Ben) keeps on humming along while I'm on vacation, so
you already get the next feature pull request:
- proper eLLC support for HSW from Ben
- more interrupt refactoring
- add w/a tags where we implement them already (Damien)
- hangcheck fixes (Chris) + hangcheck stats (Mika)
- flesh out the new vm structs for ppgtt and ggtt (Ben)
- PSR for Haswell, still disabled by default (Rodrigo et al.)
- pc8+ refclock sequence code from Paulo
- more interrupt refactoring from Paulo, unifying ilk/snb with the ivb/hsw
  interrupt code
- full solution for the Haswell concurrent reg access issues (Chris)
- fix racy object accounting, used by some new leak tests
- fix sync polarity settings on ch7xxx dvo encoder
- random bits&pieces, little fixes and better debug output all over

[airlied: fix conflict with drm_mm cleanups]

* tag 'drm-intel-next-2013-07-26-fixed' of git://people.freedesktop.org/~danvet/drm-intel: (289 commits)
  drm/i915: Do not dereference NULL crtc or fb until after checking
  drm/i915: fix pnv display core clock readout out
  drm/i915: Replace open-coded offset_in_page()
  drm/i915: Retry DP aux_ch communications with a different clock after failure
  drm/i915: Add messages useful for HPD storm detection debugging (v2)
  drm/i915: dvo_ch7xxx: fix vsync polarity setting
  drm/i915: fix the racy object accounting
  drm/i915: Convert the register access tracepoint to be conditional
  drm/i915: Squash gen lookup through multiple indirections inside GT access
  drm/i915: Use the common register access functions for NOTRACE variants
  drm/i915: Use a private interface for register access within GT
  drm/i915: Colocate all GT access routines in the same file
  drm/i915: fix reference counting in i915_gem_create
  drm/i915: Use Graphics Base of Stolen Memory on all gen3+
  drm/i915: disable stolen mem for OVERLAY_NEEDS_PHYSICAL
  drm/i915: add functions to disable and restore LCPLL
  drm/i915: disable CLKOUT_DP when it's not needed
  drm/i915: extend lpt_enable_clkout_dp
  drm/i915: fix up error cleanup in i915_gem_object_bind_to_gtt
  drm/i915: Add some debug breadcrumbs to connector detection
  ...
2013-08-07 18:11:35 +10:00
David Herrmann 31e5d7c67b drm/mm: add "best_match" flag to drm_mm_insert_node()
Add a "best_match" flag similar to the drm_mm_search_*() helpers so we
can convert TTM to use them in follow up patches. We can also inline the
non-generic helpers and move them into the header to allow compile-time
optimizations.

To make calls to drm_mm_{search,insert}_node() more readable, this
converts the boolean argument to a flagset. There are pending patches that
add additional flags for top-down allocators and more.

v2:
 - use flag parameter instead of boolean "best_match"
 - convert *_search_free() helpers to also use flags argument

Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2013-08-07 10:08:58 +10:00
Daniel Vetter 43387b37fa drm/gem: create drm_gem_dumb_destroy
All the gem based kms drivers really want the same function to
destroy a dumb framebuffer backing storage object.

So give it to them and roll it out in all drivers.

This still leaves the option open for kms drivers which don't use GEM
for backing storage, but it does decently simplify matters for gem
drivers.

Acked-by: Inki Dae <inki.dae@samsung.com>
Acked-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Cc: Intel Graphics Development <intel-gfx@lists.freedesktop.org>
Cc: Ben Skeggs <skeggsb@gmail.com>
Reviwed-by: Rob Clark <robdclark@gmail.com>
Cc: Alex Deucher <alexdeucher@gmail.com>
Acked-by: Patrik Jakobsson <patrik.r.jakobsson@gmail.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2013-08-07 09:59:24 +10:00
Ben Widawsky 637efacf8f drm/i915: eliminate dead domain clearing on reset
The code itself is no longer accurate without updating once we have
multiple address space since clearing the domains of every object
requires scanning the inactive list for all VMs.

"This code is dead. Just remove it rather than port it to vma." - Chris
Wilson

Recommended-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-08-05 19:12:25 +02:00
Ben Widawsky d1ccbb5d71 drm/i915: make reset&hangcheck code VM aware
Hangcheck, and some of the recent reset code for guilty batches need to
know which address space the object was in at the time of a hangcheck.
This is because we use offsets in the (PP|G)GTT to determine this
information, and those offsets can differ depending on which VM they are
bound into.

Since we still only have 1 VM ever, this code shouldn't yet have any
impact.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-08-05 19:04:13 +02:00
Ben Widawsky 3e12302705 drm/i915: BUG_ON put_pages later
With multiple VMs, the eviction code benefits from being able to blindly
put pages without needing to know if there are any entities still
holding on to those pages. As such it's preferable to return the -EBUSY
before the BUG.

Eviction code is the only user for now, but overall it makes sense
anyway, IMO.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-08-05 19:04:13 +02:00
Ben Widawsky 3089c6f239 drm/i915: make caching operate on all address spaces
For now, objects will maintain the same cache levels amongst all address
spaces. This is to limit the risk of bugs, as playing with cacheability
in the different domains can be very error prone.

In the future, it may be optimal to allow setting domains per VMA (ie.
an object bound into an address space).

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-08-05 19:04:12 +02:00
Ben Widawsky c37e220461 drm/i915: Add VM to pin
To verbalize it, one can say, "pin an object into the given address
space." The semantics of pinning remain the same otherwise.

Certain objects will always have to be bound into the global GTT.
Therefore, global GTT is a special case, and keep a special interface
around for it (i915_gem_obj_ggtt_pin).

v2: s/i915_gem_ggtt_pin/i915_gem_obj_ggtt_pin

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-08-05 19:04:09 +02:00