Commit Graph

1156 Commits

Author SHA1 Message Date
Daniel Vetter 2232f0315c drm/i915: Fix EIO/wedged handling in gem fault handler
In

commit 1f83fee08d
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Thu Nov 15 17:17:22 2012 +0100

    drm/i915: clear up wedged transitions

I've accidentally inverted the EIO/wedged handling in the fault
handler: We want to return the EIO as a SIGBUS only if it's not
because of the gpu having died, to prevent userspace from unduly
dying.

In my defence the comment right above is completely misleading, so fix
both.

v2: Drop the WARN_ON, it's not actually a bug to e.g. receive an -EIO
when swap-in fails.

v3: Don't remove too much ... oops.

Reported-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: stable@vger.kernel.org
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2014-09-08 08:38:50 +03:00
Ville Syrjälä 81e7f2002b drm/i915: Idle unused rings on gen2/3 during init/resume
gen2/3 platforms have a boatload of rings we're not using. On my 830
the BIOS/hw can leave some of those "active" after resume which will
prevent c3 entry. The ring is apparently considered active whenever
head != tail even if the ring is disabled.

Disable and clear all such unused ringbuffers on init/resume.

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-09-03 11:05:06 +02:00
Thomas Daniel ecdb5fd861 drm/i915/bdw: Don't execute context reset and switch with Execlists
These two functions make no sense in an Logical Ring Context & Execlists
world.

v2: We got rid of lrc_enabled and centralized everything in the sanitized
i915.enable_execlists instead.

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>

v3: Rebased.  Corrected a typo in comment for i915_switch_context and
added a comment that it should not be called in execlist mode. Added
WARN_ON if i915_switch_context is called in execlist mode. Moved check
for execlist mode out of i915_switch_context and into callers. Added
comment in context_reset explaining why nothing is done in execlist
mode.

Signed-off-by: Thomas Daniel <thomas.daniel@intel.com>
[danvet: Simplify the patch subject so I can understand it.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-09-03 11:04:17 +02:00
McAulay, Alistair 6689c167ae drm/i915: Rework GPU reset sequence to match driver load & thaw
This patch is to address Daniels concerns over different code during reset:

http://lists.freedesktop.org/archives/intel-gfx/2014-June/047758.html

"The reason for aiming as hard as possible to use the exact same code for
driver load, gpu reset and runtime pm/system resume is that we've simply
seen too many bugs due to slight variations and unintended omissions."

Tested using igt drv_hangman.

V2: Cleaner way of preventing check_wedge returning -EAGAIN
V3: Clean the last_context during reset, to ensure do_switch() does the MI_SET_CONTEXT. As per review.
Signed-off-by: McAulay, Alistair <alistair.mcaulay@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
[danvet: Rebase over ctx->ppgtt rework and extend the comment in
check_wedge a bit.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-09-03 10:54:09 +02:00
Dave Airlie a18b29f0c6 Merge tag 'drm-intel-next-2014-09-01' of git://anongit.freedesktop.org/drm-intel into drm-next
drm-intel-next-2014-08-22:
- basic code for execlist, which is the fancy new cmd submission on gen8. Still
  disabled by default (Ben, Oscar Mateo, Thomas Daniel et al)
- remove the useless usage of console_lock for I915_FBDEV=n (Chris)
- clean up relations between ctx and ppgtt
- clean up ppgtt lifetime handling (Michel Thierry)
- various cursor code improvements from Ville
- execbuffer code cleanups and secure batch fixes (Chris)
- prep work for dev -> dev_priv transition (Chris)
- some of the prep patches for the seqno -> request object transition (Chris)
- various small improvements all over

* tag 'drm-intel-next-2014-09-01' of git://anongit.freedesktop.org/drm-intel: (86 commits)
  drm/i915: fix suspend/resume for GENs w/o runtime PM support
  drm/i915: Update DRIVER_DATE to 20140822
  drm: fix plane rotation when restoring fbdev configuration
  drm/i915/bdw: Disable execlists by default
  drm/i915/bdw: Enable Logical Ring Contexts (hence, Execlists)
  drm/i915/bdw: Document Logical Rings, LR contexts and Execlists
  drm/i915/bdw: Print context state in debugfs
  drm/i915/bdw: Display context backing obj & ringbuffer info in debugfs
  drm/i915/bdw: Display execlists info in debugfs
  drm/i915/bdw: Disable semaphores for Execlists
  drm/i915/bdw: Make sure gpu reset still works with Execlists
  drm/i915/bdw: Don't write PDP in the legacy way when using LRCs
  drm/i915: Track cursor changes as frontbuffer tracking flushes
  drm/i915/bdw: Help out the ctx switch interrupt handler
  drm/i915/bdw: Avoid non-lite-restore preemptions
  drm/i915/bdw: Handle context switch events
  drm/i915/bdw: Two-stage execlist submit process
  drm/i915/bdw: Write the tail pointer, LRC style
  drm/i915/bdw: Implement context switching (somewhat)
  drm/i915/bdw: Emission of requests with logical rings
  ...

Conflicts:
	drivers/gpu/drm/i915/i915_drv.c
2014-09-03 08:30:48 +10:00
Oscar Mateo cc9130be80 drm/i915/bdw: Make sure gpu reset still works with Execlists
If we reset a ring after a hang, we have to make sure that we clear
out all queued Execlists requests.

v2: The ring is, at this point, already being correctly re-programmed
for Execlists, and the hangcheck counters cleared.

v3: Daniel suggests to drop the "if (execlists)" because the Execlists
queue should be empty in legacy mode (which is true, if we do the
INIT_LIST_HEAD).

v4: Do the pending intel_runtime_pm_put

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-08-20 17:17:48 +02:00
Oscar Mateo 48e29f5535 drm/i915/bdw: Emission of requests with logical rings
On a previous iteration of this patch, I created an Execlists
version of __i915_add_request and asbtracted it away as a
vfunc. Daniel Vetter wondered then why that was needed:

"with the clean split in command submission I expect every
function to know wether it'll submit to an lrc (everything in
intel_lrc.c) or wether it'll submit to a legacy ring (existing
code), so I don't see a need for an add_request vfunc."

The honest, hairy truth is that this patch is the glue keeping
the whole logical ring puzzle together:

- i915_add_request is used by intel_ring_idle, which in turn is
  used by i915_gpu_idle, which in turn is used in several places
  inside the eviction and gtt codes.
- Also, it is used by i915_gem_check_olr, which is littered all
  over i915_gem.c
- ...

If I were to duplicate all the code that directly or indirectly
uses __i915_add_request, I'll end up creating a separate driver.

To show the differences between the existing legacy version and
the new Execlists one, this time I have special-cased
__i915_add_request instead of adding an add_request vfunc. I
hope this helps to untangle this Gordian knot.

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
[danvet: Adjust to ringbuf->FIXME_lrc_ctx per the discussion with
Thomas Daniel.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-08-14 22:02:55 +02:00
Daniel Vetter 82460d9724 drm/i915: Rework ppgtt init to no require an aliasing ppgtt
Currently we abuse the aliasing ppgtt to set up the ppgtt support in
general. Which is a bit backwards since with full ppgtt we don't ever
need the aliasing ppgtt.

So untangle this and separate the ppgtt init from the aliasing
ppgtt. While at it drag it out of the context enabling (which just
does a switch to the default context).

Note that we still have the differentiation between synchronous and
asynchronous ppgtt setup, but that will soon vanish. So also correctly
wire up the return value handling to be prepared for when ->switch_mm
drops the synchronous parameter and could start to fail.

Reviewed-by: Michel Thierry <michel.thierry@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-08-13 14:23:31 +02:00
Daniel Vetter 896ab1a5d5 drm/i915: Fix up checks for aliasing ppgtt
A subsequent patch will no longer initialize the aliasing ppgtt if we
have full ppgtt enabled, since we simply don't need that any more.

Unfortunately a few places check for the aliasing ppgtt instead of
checking for ppgtt in general. Fix them up.

One special case are the gtt offset and size macros, which have some
code to remap the aliasing ppgtt to the global gtt. The aliasing ppgtt
is _not_ a logical address space, so passing that in as the vm is
plain and simple a bug. So just WARN about it and carry on - we have a
gracefully fall-through anyway if we can't find the vma.

Reviewed-by: Michel Thierry <michel.thierry@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-08-13 14:23:31 +02:00
Daniel Vetter 6c5566a82c drm/i915: Allow i915_gem_setup_global_gtt to fail
We already needs this just as a safety check in case the preallocation
reservation dance fails. But we definitely need this to be able to
move tha aliasing ppgtt setup back out of the context code to this
place, where it belongs.

Reviewed-by: Michel Thierry <michel.thierry@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-08-13 14:23:30 +02:00
Daniel Vetter 5dc383b05a drm/i915: Add proper prefix to obj_to_ggtt
Stuff in headers really aught to have this.

Reviewed-by: Michel Thierry <michel.thierry@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-08-13 14:23:29 +02:00
Daniel Vetter 841cd77375 drm/i915: Only refcount ppgtt if it actually is one
This essentially unbreaks non-ppgtt operation where we'd scribble over
random memory.

While at it give the vm_to_ppgtt function a proper prefix and make it
a bit more paranoid.

Reviewed-by: Michel Thierry <michel.thierry@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-08-13 14:23:29 +02:00
Daniel Vetter ee960be7bb drm/i915: Some cleanups for the ppgtt lifetime handling
So when reviewing Michel's patch I've noticed a few things and cleaned
them up:
- The early checks in ppgtt_release are now redundant: The inactive
  list should always be empty now, so we can ditch these checks. Even
  for the aliasing ppgtt (though that's a different confusion) since
  we tear that down after all the objects are gone.
- The ppgtt handling functions are splattered all over. Consolidate
  them in i915_gem_gtt.c, give them OCD prefixes and add wrappers for
  get/put.
- There was a bit a confusion in ppgtt_release about whether it cares
  about the active or inactive list. It should care about them both,
  so augment the WARNINGs to check for both.

There's still create_vm_for_ctx left to do, put that is blocked on the
removal of ppgtt->ctx. Once that's done we can rename it to
i915_ppgtt_create and move it to its siblings for handling ppgtts.

v2: Move the ppgtt checks into the inline get/put functions as
suggested by Chris.

v3: Inline the now redundant ppgtt local variable.

Cc: Michel Thierry <michel.thierry@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Michel Thierry <michel.thierry@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-08-12 15:24:04 +02:00
Michel Thierry b9d06dd9d1 drm/i915: vma/ppgtt lifetime rules
VMAs should take a reference of the address space they use.

Now, when the fd is closed, it will release the ref that the context was
holding, but it will still be referenced by any vmas that are still
active.

ppgtt_release() should then only be called when the last thing referencing
it releases the ref, and it can just call the base cleanup and free the
ppgtt.

Note that with this we will extend the lifetime of ppgtts which
contain shared objects. But all the non-shared objects will get
removed as soon as they drop of the active list and for the shared
ones the shrinker can eventually reap them. Since we currently can't
evict ppgtt pagetables either I don't think that temporary leak is
important.

Signed-off-by: Michel Thierry <michel.thierry@intel.com>
[danvet: Add note about potential ppgtt leak with this approach.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-08-12 15:22:26 +02:00
Oscar Mateo 454afebde8 drm/i915/bdw: Skeleton for the new logical rings submission path
Execlists are indeed a brave new world with respect to workload
submission to the GPU.

In previous version of these series, I have tried to impact the
legacy ringbuffer submission path as little as possible (mostly,
passing the context around and using the correct ringbuffer when I
needed one) but Daniel is afraid (probably with a reason) that
these changes and, especially, future ones, will end up breaking
older gens.

This commit and some others coming next will try to limit the
damage by creating an alternative path for workload submission.
The first step is here: laying out a new ring init/fini.

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-08-11 16:40:57 +02:00
Oscar Mateo a83014d3f8 drm/i915: Abstract the legacy workload submission mechanism away
As suggested by Daniel Vetter. The idea, in subsequent patches, is to
provide an alternative to these vfuncs for the Execlists submission
mechanism.

v2: Splitted into two and reordered to illustrate our intentions, instead
of showing it off. Also, remove the add_request vfunc and added the
stop_ring one.

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
[danvet:
- Make checkpatch happy.
- Be grumpy about the excessive vtable.
- Ditch gt->is_ring_initialized.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-08-11 16:40:32 +02:00
Oscar Mateo 127f100369 drm/i915/bdw: Macro for LRCs and module option for Execlists
GEN8 brings an expansion of the HW contexts: "Logical Ring Contexts".
These expanded contexts enable a number of new abilities, especially
"Execlists".

The macro is defined to off until we have things in place to hope to
work.

v2: Rename "advanced contexts" to the more correct "logical ring
contexts".

v3: Add a module parameter to enable execlists. Execlist are relatively
new, and so it'd be wise to be able to switch back to ring submission
to debug subtle problems that will inevitably arise.

v4: Add an intel_enable_execlists function.

v5: Sanitize early, as suggested by Daniel. Remove lrc_enabled.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net> (v1)
Signed-off-by: Damien Lespiau <damien.lespiau@intel.com> (v3)
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> (v2, v4 & v5)
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-08-11 16:00:27 +02:00
Chris Wilson 82b6b6d786 drm/i915: Remove fenced_gpu_access and pending_fenced_gpu_access
This migrates the fence tracking onto the existing seqno
infrastructure so that the later conversion to tracking via requests is
simplified.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-08-11 12:20:25 +02:00
Chris Wilson e6a844687c drm/i915: Force CPU relocations if not GTT mapped
Move the decision on whether we need to have a mappable object during
execbuffer to the fore and then reuse that decision by propagating the
flag through to reservation. As a corollary, before doing the actual
relocation through the GTT, we can make sure that we do have a GTT
mapping through which to operate.

Note that the key to make this work is to ditch the
obj->map_and_fenceable unbind optimization - with full ppgtt it
doesn't make a lot of sense any more anyway.

v2: Revamp and resend to ease future patches.
v3: Refresh patch rationale

References: https://bugs.freedesktop.org/show_bug.cgi?id=81094
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ben Widawsky <benjamin.widawsky@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
[danvet: Explain why obj->map_and_fenceable is key and split out the
secure batch fix.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-08-11 12:01:29 +02:00
Chris Wilson dc8cd1e790 drm/i915: Only perform set-to-gtt domain for objects bound to the global gtt
If an object is not bound into the global GTT, then it cannot be
accessed via the GTT. This restores the original code that was muddled
by ppGTT. In the process, we remove a WARN that had long outlived its
usefulness and was simply being coded around instead.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-08-11 11:36:12 +02:00
Linus Torvalds 889fa782bf Merge tag 'drm-intel-fixes-2014-08-08' of git://anongit.freedesktop.org/drm-intel
Pull intel drm fixes from Daniel Vetter:
 "So I heard that proper pull requests have a revert on top ;-) So here
  we go with my usual mid-merge-window pile of fixes.

[ Ed. This revert thing had better not become the "in" thing ]

   Big fix is the duct-tape for ring init on g4x platforms, we seem to
  have found the magic again to make those machines as happy as before
  (not perfect though unfortunately, but that was never the case).

  Otherwise fixes all over:
   - tune down some overzealous debug output
   - VDD power sequencing fix after resume
   - bunch of dsi fixes for baytrail among them hw state checker
     de-noising
   - bunch of error state capture fixes for bdw
   - misc tiny fixes/workarounds for various platforms

  Last minute rebase was to kick out two patches that shouldn't have
  been in here - they're for the state checker, so 0 functional code
  affected.

  Jani's back from vacation, so he'll take over -fixes from here"

* tag 'drm-intel-fixes-2014-08-08' of git://anongit.freedesktop.org/drm-intel: (21 commits)
  Revert "drm/i915: Enable semaphores on BDW"
  drm/i915: read HEAD register back in init_ring_common() to enforce ordering
  drm/i915: Fix crash when failing to parse MIPI VBT
  drm/i915: Bring GPU Freq to min while suspending.
  drm/i915: Fix DEIER and GTIER collecting for BDW.
  drm/i915: Don't accumulate hangcheck score on forward progress
  drm/i915: Add the WaCsStallBeforeStateCacheInvalidate:bdw workaround.
  drm/i915: Refactor Broadwell PIPE_CONTROL emission into a helper.
  drm/i915: Fix threshold for choosing 32 vs. 64 precisions for VLV DDL values
  drm/i915: Fix drain latency precision multipler for VLV
  drm/i915: Collect gtier properly on HSW.
  drm/i915: Tune down MCH_SSKPD values warning
  drm/i915: Tune done rc6 enabling output
  drm/i915: Don't require dev->struct_mutex in psr_match_conditions
  drm/i915: Fix error state collecting
  drm/i915: fix VDD state tracking after system resume
  drm/i915: Add correct hw/sw config check for DSI encoder
  drm/i915: factor out intel_edp_panel_vdd_sanitize
  drm/i915: wait for all DSI FIFOs to be empty
  drm/i915: work around warning in i915_gem_gtt
  ...
2014-08-08 10:24:36 -07:00
Linus Torvalds a7d7a143d0 Merge branch 'drm-next' of git://people.freedesktop.org/~airlied/linux
Pull DRM updates from Dave Airlie:
 "Like all good pull reqs this ends with a revert, so it must mean we
  tested it,

[ Ed. That's _one_ way of looking at it ]

  This pull is missing nouveau, Ben has been stuck trying to track down
  a very longstanding bug that revealed itself due to some other
  changes.  I've asked him to send you a direct pull request for nouveau
  once he cleans things up.  I'm away until Monday so don't want to
  delay things, you can make a decision on that when he sends it, I have
  my phone so I can ack things just not really merge much.

  It has one trivial conflict with your tree in armada_drv.c, and also
  the pull request contains some component changes that are already in
  your tree, the base tree from Russell went via Greg's tree already,
  but some stuff still shows up in here that doesn't when I merge my
  tree into yours.

  Otherwise all pretty standard graphics fare, one new driver and
  changes all over the place.

  New drivers:
   - sti kms driver for STMicroelectronics chipsets stih416 and stih407.

  core:
   - lots of cleanups to the drm core
   - DP MST helper code merged
   - universal cursor planes.
   - render nodes enabled by default

  panel:
   - better panel interfaces
   - new panel support
   - non-continuous cock advertising ability

  ttm:
   - shrinker fixes

  i915:
   - hopefully ditched UMS support
   - runtime pm fixes
   - psr tracking and locking - now enabled by default
   - userptr fixes
   - backlight brightness fixes
   - MST support merged
   - runtime PM for dpms
   - primary planes locking fixes
   - gen8 hw semaphore support
   - fbc fixes
   - runtime PM on SOix sleep state hw.
   - mmio base page flipping
   - lots of vlv/chv fixes.
   - universal cursor planes

  radeon:
   - Hawaii fixes
   - display scalar support for non-fixed mode displays
   - new firmware format support
   - dpm on more asics by default
   - GPUVM improvements
   - uncached and wc GTT buffers
   - BOs > visible VRAM

  exynos:
   - i80 interface support
   - module auto-loading
   - ipp driver consolidated.

  armada:
   - irq handling in crtc layer only
   - crtc renumbering
   - add component support
   - DT interaction changes.

  tegra:
   - load as module fixes
   - eDP bpp and sync polarity fixed
   - DSI non-continuous clock mode support
   - better support for importing buffers from nouveau

  msm:
   - mdp5/adq8084 v1.3 hw enablement
   - devicetree clk changse
   - ifc6410 board working

  tda998x:
   - component support
   - DT documentation update

  vmwgfx:
   - fix compat shader namespace"

* 'drm-next' of git://people.freedesktop.org/~airlied/linux: (551 commits)
  Revert "drm: drop redundant drm_file->is_master"
  drm/panel: simple: Use devm_gpiod_get_optional()
  drm/dsi: Replace upcasting macro by function
  drm/panel: ld9040: Replace upcasting macro by function
  drm/exynos: dp: Modify driver to support drm_panel
  drm/exynos: Move DP setup into commit()
  drm/panel: simple: Add AUO B133HTN01 panel support
  drm/panel: simple: Support delays in panel functions
  drm/panel: simple: Add proper definition for prepare and unprepare
  drm/panel: s6e8aa0: Add proper definition for prepare and unprepare
  drm/panel: ld9040: Add proper definition for prepare and unprepare
  drm/tegra: Add support for panel prepare and unprepare routines
  drm/exynos: dsi: Add support for panel prepare and unprepare routines
  drm/exynos: dpi: Add support for panel prepare and unprepare routines
  drm/panel: simple: Add dummy prepare and unprepare routines
  drm/panel: s6e8aa0: Add dummy prepare and unprepare routines
  drm/panel: ld9040: Add dummy prepare and unprepare routines
  drm/panel: Provide convenience wrapper for .get_modes()
  drm/panel: add .prepare() and .unprepare() functions
  drm/panel: simple: Remove simple-panel compatible
  ...
2014-08-07 17:36:12 -07:00
Deepak S 274fa1c1ac drm/i915: Bring GPU Freq to min while suspending.
We might be leaving the PGU Frequency (and thus vnn) high during the suspend.
Flusing the delayed work queue should take care of this.

Signed-off-by: Deepak S <deepak.s@linux.intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-08-07 14:04:09 +02:00
Linus Torvalds e7fda6c4c3 Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer and time updates from Thomas Gleixner:
 "A rather large update of timers, timekeeping & co

   - Core timekeeping code is year-2038 safe now for 32bit machines.
     Now we just need to fix all in kernel users and the gazillion of
     user space interfaces which rely on timespec/timeval :)

   - Better cache layout for the timekeeping internal data structures.

   - Proper nanosecond based interfaces for in kernel users.

   - Tree wide cleanup of code which wants nanoseconds but does hoops
     and loops to convert back and forth from timespecs.  Some of it
     definitely belongs into the ugly code museum.

   - Consolidation of the timekeeping interface zoo.

   - A fast NMI safe accessor to clock monotonic for tracing.  This is a
     long standing request to support correlated user/kernel space
     traces.  With proper NTP frequency correction it's also suitable
     for correlation of traces accross separate machines.

   - Checkpoint/restart support for timerfd.

   - A few NOHZ[_FULL] improvements in the [hr]timer code.

   - Code move from kernel to kernel/time of all time* related code.

   - New clocksource/event drivers from the ARM universe.  I'm really
     impressed that despite an architected timer in the newer chips SoC
     manufacturers insist on inventing new and differently broken SoC
     specific timers.

[ Ed. "Impressed"? I don't think that word means what you think it means ]

   - Another round of code move from arch to drivers.  Looks like most
     of the legacy mess in ARM regarding timers is sorted out except for
     a few obnoxious strongholds.

   - The usual updates and fixlets all over the place"

* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (114 commits)
  timekeeping: Fixup typo in update_vsyscall_old definition
  clocksource: document some basic timekeeping concepts
  timekeeping: Use cached ntp_tick_length when accumulating error
  timekeeping: Rework frequency adjustments to work better w/ nohz
  timekeeping: Minor fixup for timespec64->timespec assignment
  ftrace: Provide trace clocks monotonic
  timekeeping: Provide fast and NMI safe access to CLOCK_MONOTONIC
  seqcount: Add raw_write_seqcount_latch()
  seqcount: Provide raw_read_seqcount()
  timekeeping: Use tk_read_base as argument for timekeeping_get_ns()
  timekeeping: Create struct tk_read_base and use it in struct timekeeper
  timekeeping: Restructure the timekeeper some more
  clocksource: Get rid of cycle_last
  clocksource: Move cycle_last validation to core code
  clocksource: Make delta calculation a function
  wireless: ath9k: Get rid of timespec conversions
  drm: vmwgfx: Use nsec based interfaces
  drm: i915: Use nsec based interfaces
  timekeeping: Provide ktime_get_raw()
  hangcheck-timer: Use ktime_get_ns()
  ...
2014-08-05 17:46:42 -07:00
Thomas Gleixner 5ed0bdf21a drm: i915: Use nsec based interfaces
Use ktime_get_raw_ns() and get rid of the back and forth timespec
conversions.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: John Stultz <john.stultz@linaro.org>
2014-07-23 15:01:50 -07:00
Chris Wilson eedd10f45b drm/i915: Simplify i915_gem_release_all_mmaps()
An object can only have an active gtt mapping if it is currently bound
into the global gtt. Therefore we can simply walk the list of all bound
objects and check the flag upon those for an active gtt mapping.

From commit 48018a57a8
Author: Paulo Zanoni <paulo.r.zanoni@intel.com>
Date:   Fri Dec 13 15:22:31 2013 -0200

    drm/i915: release the GTT mmaps when going into D3

Also note that the WARN is inappropriate for this function as GPU
activity is orthogonal to GTT mmap status. Rather it is the caller that
relies upon this condition and so it should assert that the GPU is idle
itself.

References: https://bugs.freedesktop.org/show_bug.cgi?id=80081
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@gmail.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Tested-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
[danvet: cherry-pick from -next to -fixes.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-07-23 16:09:51 +02:00
Armin Reese 9490edb588 drm/i915: Do not unmap object unless no other VMAs reference it
When using an IOMMU, GEM objects are mapped by their DMA address as the
physical address is unknown. This depends on the underlying IOMMU
driver to map and unmap the physical pages properly as defined in
intel_iommu.c.

The current code will tell the IOMMU to unmap the GEM BO's pages on the
destruction of the first VMA that "maps" that BO. This is clearly wrong
as there may be other VMAs "mapping" that BO (using flink). The scanout
is one such example.

The patch fixes this issue by only unmapping the DMA maps when there are
no more VMAs mapping that object. This is equivalent to when an object
is considered unbound as can be seen by the code. On the first VMA that
again because bound, we will remap.

An alternate solution would be to move the dma mapping to object
creation and destrubtion. I am not sure if this is considered an
unfriendly thing to do.

Some notes to backporters trying to backport full PPGTT:

The bug can never be hit without enabling the IOMMU. The existing code
will also do the right thing when the object is shared via dmabuf. The
failure should be demonstrable with flink. In cases when not using
intel_iommu_strict it is likely (likely, as defined by: off the top of
my head) on current workloads to *not* hit this bug since we often
teardown all VMAs for an object shared across multiple VMs.  We also
finish access to that object before the first dma_unmapping.
intel_iommu_strict with flinked buffers is likely to hit this issue.

Signed-off-by: Armin Reese <armin.c.reese@intel.com>
[danvet: Add the excellent commit message provided by Ben.]
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-07-23 07:05:40 +02:00
Jesse Barnes 9df7575f1c drm/i915: add helper for checking whether IRQs are enabled
Now that we use the runtime IRQ enable/disable functions in our suspend
path, we can simply check the pm._irqs_disabled flag everywhere.  So
rename it to catch the users, and add an inline for it to make the
checks clear everywhere.

Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-07-23 07:05:34 +02:00
Chris Wilson a1db2fa7c8 drm/i915: Abandon oom quickly if killed by a signal
Whilst waiting to obtain our locks for the last resort shrinking before
an oom, we check whether or not a fatal signal was pending. If there was,
we do not need to keep waiting as the oom will be aborted.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-07-23 07:05:28 +02:00
Oscar Mateo 1b5d063faf drm/i915: Generalize intel_ring_get_tail to take a ringbuf
Again, it's low-level enough to simply take a ringbuf and nothing
else.

Trivial change.

Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-07-08 12:31:02 +02:00
Chris Wilson ec5cc0f9b0 drm/i915: Restrict GPU boost to the RCS engine
Make the assumption that media workloads are not as latency sensitive
for __wait_seqno, and that upclocking the GPU does not affect the BLT
engine. Under that assumption, we only wait to forcibly upclock the GPU
when we are stalling for results from the render pipeline.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Deepak S<deepak.s@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-07-08 10:25:17 +02:00
Rodrigo Vivi ddd4dbc6c1 drm/i915: Updating comments.
ring index calculation table was out of date after other rings were added,
although the formula is flexible and scale when adding new rings.

So this patch just update the comments and add a brief explanation
why to use sync_seqno[ring index].

Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-07-07 22:02:49 +02:00
Daniel Vetter f99d70690e drm/i915: Track frontbuffer invalidation/flushing
So these are the guts of the new beast. This tracks when a frontbuffer
gets invalidated (due to frontbuffer rendering) and hence should be
constantly scaned out, and when it's flushed again and can be
compressed/one-shot-upload.

Rules for flushing are simple: The frontbuffer needs one more full
upload starting from the next vblank. Which means that the flushing
can _only_ be called once the frontbuffer update has been latched.

But this poses a problem for pageflips: We can't just delay the
flushing until the pageflip is latched, since that would pose the risk
that we override frontbuffer rendering that has been scheduled
in-between the pageflip ioctl and the actual latching.

To handle this track asynchronous invalidations (and also pageflip)
state per-ring and delay any in-between flushing until the rendering
has completed. And also cancel any delayed flushing if we get a new
invalidation request (whether delayed or not).

Also call intel_mark_fb_busy in both cases in all cases to make sure
that we keep the screen at the highest refresh rate both on flips,
synchronous plane updates and for frontbuffer rendering.

v2: Lots of improvements

Suggestions from Chris:
- Move invalidate/flush in flush_*_domain and set_to_*_domain.
- Drop the flush in busy_ioctl since it's redundant. Was a leftover
  from an earlier concept to track flips/delayed flushes.
- Don't forget about the initial modeset enable/final disable.
  Suggested by Chris.

Track flips accurately, too. Since flips complete independently of
rendering we need to track pending flips in a separate mask. Again if
an invalidate happens we need to cancel the evenutal flush to avoid
races.

v3:
Provide correct header declarations for flip functions. Currently not
needed outside of intel_display.c, but part of the proper interface.

v4: Add proper domain management to fbcon so that the fbcon buffer is
also tracked correctly.

v5: Fixup locking around the fbcon set_to_gtt_domain call.

v6: More comments from Chris:
- Split out fbcon changes.
- Drop superflous checks for potential scanout before calling intel_fb
  functions - we can micro-optimize this later.
- s/intel_fb_/intel_fb_obj_/ to make it clear that this deals in gem
  object. We already have precedence for fb_obj in the pin_and_fence
  functions.

v7: Clarify the semantics of the flip flush handling by renaming
things a bit:
- Don't go through a gem object but take the relevant frontbuffer bits
  directly. These functions center on the plane, the actual object is
  irrelevant - even a flip to the same object as already active should
  cause a flush.
- Add a new intel_frontbuffer_flip for synchronous plane updates. It
  currently just calls intel_frontbuffer_flush since the implemenation
  differs.

This way we achieve a clear split between one-shot update events on
one side and frontbuffer rendering with potentially a very long delay
between the invalidate and flush.

Chris and I also had some discussions about mark_busy and whether it
is appropriate to call from flush. But mark busy is a state which
should be derived from the 3 events (invalidate, flush, flip) we now
have by the users, like psr does by tracking relevant information in
psr.busy_frontbuffer_bits. DRRS (the only real use of mark_busy for
frontbuffer) needs to have similar logic. With that the overall
mark_busy in the core could be removed.

v8: Only when retiring gpu buffers only flush frontbuffer bits we
actually invalidated in a batch. Just for safety since before any
additional usage/invalidate we should always retire current rendering.
Suggested by Chris Wilson.

v9: Actually use intel_frontbuffer_flip in all appropriate places.
Spotted by Chris.

v10: Address more comments from Chris:
- Don't call _flip in set_base when the crtc is inactive, avoids redunancy
  in the modeset case with the initial enabling of all planes.
- Add comments explaining that the initial/final plane enable/disable
  still has work left to do before it's fully generic.

v11: Only invalidate for gtt/cpu access when writing. Spotted by Chris.

v12: s/_flush/_flip/ in intel_overlay.c per Chris' comment.

Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-06-19 18:14:47 +02:00
Daniel Vetter a071fa0064 drm/i915: Introduce accurate frontbuffer tracking
So from just a quick look we seem to have enough information to
accurately figure out whether a given gem bo is used as a frontbuffer
and where exactly: We have obj->pin_count as a first check with no
false negatives and only negligible false positives. And then we can
just walk the modeset objects and figure out where exactly a buffer is
used as scanout.

Except that we can't due to locking order: If we already hold
dev->struct_mutex we can't acquire any modeset locks, so could
potential chase freed pointers and other evil stuff.

So we need something else. For that introduce a new set of bits
obj->frontbuffer_bits to track where a buffer object is used. That we
can then chase without grabbing any modeset locks.

Of course the consumers of this (DRRS, PSR, FBC, ...) still need to be
able to do their magic both when called from modeset and from gem
code. But that can be easily achieved by adding locks for these
specific subsystems which always nest within either kms or gem
locking.

This patch just adds the relevant update code to all places.

Note that if we ever support multi-planar scanout targets then we need
one frontbuffer tracking bit per attachment point that we expose to
userspace.

v2:
- Fix more oopsen. Oops.
- WARN if we leak obj->frontbuffer_bits when freeing a gem buffer. Fix
  the bugs this brought to light.
- s/update_frontbuffer_bits/update_fb_bits/. More consistent with the
  fb tracking functions (fb for gem object, frontbuffer for raw bits).
  And the function name was way too long.

v3: Size obj->frontbuffer_bits correctly so that all pipes fit in.

v4: Don't update fb bits in set_base on failure. Noticed by Chris.

v5: s/i915_gem_update_fb_bits/i915_gem_track_fb/ Also remove a few
local enum pipe variables which are now no longer needed to make the
function arguments no drop over the 80 char limit.

Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-06-19 10:04:41 +02:00
Daniel Vetter 3108e99ea9 drm/i915: Drop schedule_back from psr_exit
It doesn't make sense to never again schedule the work, since by the
time we might want to re-enable psr the world might have changed and
we can do it again.

The only exception is when we shut down the pipe, but that's an
entirely different thing and needs to be handled in psr_disable.

Note that later patch will again split psr_exit into psr_invalidate
and psr_flush. But the split is different and this simplification
helps with the transition.

v2: Improve the commit message a bit.

Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-06-19 09:59:19 +02:00
Daniel Vetter f25748ea73 drm/i915: Don't BUG_ON in i915_gem_obj_offset
A WARN_ON is perfectly fine.

The BUG in here seems to be the cause behind hard-hangs when I cat the
i915_gem_pageflip debugfs file (which calls this from an irq
spinlock). But only while running a full igt run after a while. I
still need to root cause the underlying issue.

I'll also start reject patches which add new BUG_ON but don't come
with a really good justification for it. The general rule really
should be to just WARN and hope the driver survives for long enough.

v2: Make the WARN a bit more useful per Chris' suggestion.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-06-18 00:48:37 +02:00
Ville Syrjälä beff0d0f61 drm/i915: Don't prefault the entire obj if the vma is smaller
Take the minimum of the object size and the vma size and prefault
only that much. Avoids a SIGBUS when mmapping only a portion of the
object.

Prefaulting was introduced here:
 commit b90b91d870
 Author: Chris Wilson <chris@chris-wilson.co.uk>
 Date:   Tue Jun 10 12:14:40 2014 +0100

    drm/i915: Prefault the entire object on first page fault

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Testcase: igt/gem_mmap/short-mmap
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-06-18 00:48:35 +02:00
Sourab Gupta 84c33a64b4 drm/i915: Replaced Blitter ring based flips with MMIO flips
This patch enables the framework for using MMIO based flip calls,
in contrast with the CS based flip calls which are being used currently.

MMIO based flip calls can be enabled on architectures where
Render and Blitter engines reside in different power wells. The
decision to use MMIO flips can be made based on workloads to give
100% residency for Media power well.

v2: The MMIO flips now use the interrupt driven mechanism for issuing the
flips when target seqno is reached. (Incorporating Ville's idea)

v3: Rebasing on latest code. Code restructuring after incorporating
Damien's comments

v4: Addressing Ville's review comments
    -general cleanup
    -updating only base addr instead of calling update_primary_plane
    -extending patch for gen5+ platforms

v5: Addressed Ville's review comments
    -Making mmio flip vs cs flip selection based on module parameter
    -Adding check for DRIVER_MODESET feature in notify_ring before calling
     notify mmio flip.
    -Other changes mostly in function arguments

v6: -Having a seperate function to check condition for using mmio flips (Ville)
    -propogating error code from i915_gem_check_olr (Ville)

v7: -Adding __must_check with i915_gem_check_olr (Chris)
    -Renaming mmio_flip_data to mmio_flip (Chris)
    -Rebasing on latest nightly

v8: -Rebasing on latest code
    -squash 3rd patch in series(mmio setbase vs page flip race) with this patch
    -Added new tiling mode update in intel_do_mmio_flip (Chris)

v9: -check for obj->last_write_seqno being 0 instead of obj->ring being NULL in
intel_postpone_flip, as this is a more restrictive condition (Chris)

v10: -Applied Chris's suggestions for squashing patches 2,3 into this patch.
These patches make the selection of CS vs MMIO flip at the page flip time, and
make the module parameter for using mmio flips as tristate, the states being
'force CS flips', 'force mmio flips', 'driver discretion'.
Changed the logic for driver discretion (Chris)

v11: Minor code cleanup(better readability, fixing whitespace errors, using
lockdep to check mutex locked status in postpone_flip, removal of __must_check
in function definition) (Chris)

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Sourab Gupta <sourab.gupta@intel.com>
Signed-off-by: Akash Goel <akash.goel@intel.com>
Tested-by: Chris Wilson <chris@chris-wilson.co.uk> # snb, ivb
[danvet: Fix up parameter alignement checkpatch spotted.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-06-17 16:16:20 +02:00
Chris Wilson 6254b2042c drm/i915: Simplify i915_gem_release_all_mmaps()
An object can only have an active gtt mapping if it is currently bound
into the global gtt. Therefore we can simply walk the list of all bound
objects and check the flag upon those for an active gtt mapping.

From commit 48018a57a8
Author: Paulo Zanoni <paulo.r.zanoni@intel.com>
Date:   Fri Dec 13 15:22:31 2013 -0200

    drm/i915: release the GTT mmaps when going into D3

Also note that the WARN is inappropriate for this function as GPU
activity is orthogonal to GTT mmap status. Rather it is the caller that
relies upon this condition and so it should assert that the GPU is idle
itself.

References: https://bugs.freedesktop.org/show_bug.cgi?id=80081
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@gmail.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Tested-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-06-16 19:52:20 +02:00
Rodrigo Vivi 7c8f8a7007 drm/i915: Force PSR exit by inactivating it.
The perfect solution for psr_exit is the hardware tracking the changes and
doing the psr exit by itself. This scenario works for HSW and BDW with some
environments like Gnome and Wayland.

However there are many other scenarios that this isn't true. Mainly one right
now is KDE users on HSW and BDW with PSR on. User would miss many screen
updates. For instances any key typed could be seen only when mouse cursor is
moved. So this patch introduces the ability of trigger PSR exit on kernel side
on some common cases that.

Most of the cases are coverred by psr_exit at set_domain. The remaining cases
are coverred by triggering it at set_domain, busy_ioctl, sw_finish and
mark_busy.

The downside here might be reducing the residency time on the cases this
already work very wall like Gnome environment. But so far let's get focused
on fixinge issues sio PSR couild be used for everybody and we could even
get it enabled by default. Later we can add some alternatives to choose the
level of PSR efficiency over boot flag of even over crtc property.

v2: remove exit from connector_dpms. Daniel pointed this is the wrong way and
also this isn't needed for BDW and HSW anyway.

Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Vijay Purushothaman <vijay.a.purushothaman@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-06-13 21:21:36 +02:00
Chris Wilson b90b91d870 drm/i915: Prefault the entire object on first page fault
Inserting additional PTEs has no side-effect for us as the pfn are fixed
for the entire time the object is resident in the global GTT. The
downside is that we pay the entire cost of faulting the object upon the
first hit, for which we in return receive the benefit of removing the
per-page faulting overhead.

On an Ivybridge i7-3720qm with 1600MHz DDR3, with 32 fences,
Upload rate for 2 linear surfaces:	8127MiB/s -> 8134MiB/s
Upload rate for 2 tiled surfaces:	8607MiB/s -> 8625MiB/s
Upload rate for 4 linear surfaces:	8127MiB/s -> 8127MiB/s
Upload rate for 4 tiled surfaces:	8611MiB/s -> 8602MiB/s
Upload rate for 8 linear surfaces:	8114MiB/s -> 8124MiB/s
Upload rate for 8 tiled surfaces:	8601MiB/s -> 8603MiB/s
Upload rate for 16 linear surfaces:	8110MiB/s -> 8123MiB/s
Upload rate for 16 tiled surfaces:	8595MiB/s -> 8606MiB/s
Upload rate for 32 linear surfaces:	8104MiB/s -> 8121MiB/s
Upload rate for 32 tiled surfaces:	8589MiB/s -> 8605MiB/s
Upload rate for 64 linear surfaces:	8107MiB/s -> 8121MiB/s
Upload rate for 64 tiled surfaces:	2013MiB/s -> 3017MiB/s

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: "Goel, Akash" <akash.goel@intel.com>
Testcasee: igt/gem_fence_upload/performance
Reviewed-by: Brad Volkin <bradley.d.volkin@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-06-13 15:17:41 +02:00
Chris Wilson ef0cf27c4d drm/i915: Use the .release hook to drop the stolen drm_mm tracking
Now that we have a release hook into i915_gem_object_free, we can move
the explicit call to the internal stolen function and hook it up
throught the callback instead.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-06-13 15:17:36 +02:00
David Herrmann f461d1be22 drm/i915: use shmem helpers if possible
Instead of shuffling gfp-masks all the time, use the
shmem_read_mapping_page() helper. Note that __GFP_IO and __GFP_WAIT are
set in mapping_gfp_mask() for i915, so the behavior is still the same.

Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-06-11 16:57:38 +02:00
Dave Airlie ecb889e620 Merge tag 'drm-intel-fixes-2014-06-06' of git://anongit.freedesktop.org/drm-intel into drm-next
> Bunch of stuff for 3.16 still:
> - Mipi dsi panel support for byt. Finally! From Shobhit&others. I've
>   squeezed this in since it's a regression compared to vbios and we've
>   been ridiculed about it a bit too often ...
> - connection_mutex deadlock fix in get_connector (only affects i915).
> - Core patches from Matt's primary plane from Matt Roper, I've pushed the
>   i915 stuff to 3.17.
> - vlv power well sequencing fixes from Jesse.
> - Fix for cursor size changes from Chris.
> - agpbusy fixes from Ville.
> - A few smaller things.
>

* tag 'drm-intel-fixes-2014-06-06' of git://anongit.freedesktop.org/drm-intel: (32 commits)
  drm/i915: BDW: Adding missing cursor offsets.
  drm: Fix getconnector connection_mutex locking
  drm/i915/bdw: Only use 2g GGTT for 32b platforms
  drm/i915: Nuke pipe A quirk on i830M
  drm/i915: fix display power sw state reporting
  drm/i915: Always apply cursor width changes
  drm/i915: tell the user if both KMS and UMS are disabled
  drm/plane-helper: Add drm_plane_helper_check_update() (v3)
  drm: Check CRTC compatibility in setplane
  drm/i915: use VBT to determine whether to enumerate the VGA port
  drm/i915: Don't WARN about ring idle bit on gen2
  drm/i915: Silence the WARN if the user tries to GTT mmap an incoherent object
  drm/i915: Move the C3 LP write bit setup to gen3_init_clock_gating() for KMS
  drm/i915: Enable interrupt-based AGPBUSY# enable on 85x
  drm/i915: Flip the sense of AGPBUSY_DIS bit
  drm/i915: Set AGPBUSY# bit in init_clock_gating
  drm/i915/vlv: add pll assertion when disabling DPIO common well
  drm/i915/vlv: move DPIO common reset de-assert into __vlv_set_power_well
  drm/i915/vlv: re-order power wells so DPIO common comes after TX
  drm/i915/vlv: move CRI refclk enable into __vlv_set_power_well
  ...
2014-06-06 19:07:09 +10:00
Dave Airlie 8d4ad9d4bb Merge commit '9e9a928eed8796a0a1aaed7e0b676db86ba84594' into drm-next
Merge drm-fixes into drm-next.

Both i915 and radeon need this done for later patches.

Conflicts:
	drivers/gpu/drm/drm_crtc_helper.c
	drivers/gpu/drm/i915/i915_drv.h
	drivers/gpu/drm/i915/i915_gem.c
	drivers/gpu/drm/i915/i915_gem_execbuffer.c
	drivers/gpu/drm/i915/i915_gem_gtt.c
2014-06-05 20:28:59 +10:00
Chris Wilson ddeff6ee42 drm/i915: Silence the WARN if the user tries to GTT mmap an incoherent object
If the user tries to mmap through the GTT an object that is marked as
snooped, we report an error rather than allow the GPU to hang the
machine. The choice of EINVAL, however, was unfortunate as we turn that
into a WARN rather than a quiet SIGBUS.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-06-05 08:52:41 +02:00
Ville Syrjälä dbb42748ac drm/i915: Move the C3 LP write bit setup to gen3_init_clock_gating() for KMS
Move the MI_ARB_STATE MI_ARB_C3_LP_WRITE_ENABLE setup to
gen3_init_clock_gating() from i915_gem_load() when KMS is enabled. Leave
it in i915_gem_load() for the UMS case, but add an explcit check, just
to make it easier to spot it when we eventually rip out UMS support.

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-06-05 08:52:40 +02:00
Chris Wilson d23db88c3a drm/i915: Prevent negative relocation deltas from wrapping
This is pure evil. Userspace, I'm looking at you SNA, repacks batch
buffers on the fly after generation as they are being passed to the
kernel for execution. These batches also contain self-referenced
relocations as a single buffer encompasses the state commands, kernels,
vertices and sampler. During generation the buffers are placed at known
offsets within the full batch, and then the relocation deltas (as passed
to the kernel) are tweaked as the batch is repacked into a smaller buffer.
This means that userspace is passing negative relocations deltas, which
subsequently wrap to large values if the batch is at a low address. The
GPU hangs when it then tries to use the large value as a base for its
address offsets, rather than wrapping back to the real value (as one
would hope). As the GPU uses positive offsets from the base, we can
treat the relocation address as the minimum address read by the GPU.
For the upper bound, we trust that userspace will not read beyond the
end of the buffer.

So, how do we fix negative relocations from wrapping? We can either
check that every relocation looks valid when we write it, and then
position each object such that we prevent the offset wraparound, or we
just special-case the self-referential behaviour of SNA and force all
batches to be above 256k. Daniel prefers the latter approach.

This fixes a GPU hang when it tries to use an address (relocation +
offset) greater than the GTT size. The issue would occur quite easily
with full-ppgtt as each fd gets its own VM space, so low offsets would
often be handed out. However, with the rearrangement of the low GTT due
to capturing the BIOS framebuffer, it is already affecting kernels 3.15
onwards. I think only IVB+ is susceptible to this bug, but the workaround
should only kick in rarely, so it seems sensible to always apply it.

v3: Use a bias for batch buffers to prevent small negative delta relocations
from wrapping.

v4 from Daniel:
- s/BIAS/BATCH_OFFSET_BIAS/
- Extract eb_vma_misplaced/i915_vma_misplaced since the conditions
  were growing rather cumbersome.
- Add a comment to eb_get_batch explaining why we do this.
- Apply the batch offset bias everywhere but mention that we've only
  observed it on gen7 gpus.
- Drop PIN_OFFSET_FIX for now, that slipped in from a feature patch.

v5: Add static to eb_get_batch, spotted by 0-day tester.

Testcase: igt/gem_bad_reloc
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=78533
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> (v3)
Cc: stable@vger.kernel.org
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-05-27 11:18:40 +03:00
Chris Wilson 00731155a7 drm/i915: Fix dynamic allocation of physical handles
A single object may be referenced by multiple registers fundamentally
breaking the static allotment of ids in the current design. When the
object is used the second time, the physical address of the first
assignment is relinquished and a second one granted. However, the
hardware is still reading (and possibly writing) to the old physical
address now returned to the system. Eventually hilarity will ensue, but
in the short term, it just means that cursors are broken when using more
than one pipe.

v2: Fix up leak of pci handle when handling an error during attachment,
and avoid a double kmap/kunmap. (Ville)
Rebase against -fixes.

v3: And fix the error handling added in v2 (Ville)

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=77351
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: stable@vger.kernel.org
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-05-27 11:18:39 +03:00
Oscar Mateo 273497e5cd drm/i915: s/i915_hw_context/intel_context
Up until now, contexts had one (and only one) backing object that was
used by the hardware to save/restore render ring contexts (via the
MI_SET_CONTEXT command). Other rings did not have or need this, so
our i915_hw_context struct had a 1:1 relationship with a a real HW
context.

With Logical Ring Contexts and Execlists, this is not possible anymore:
all rings need a backing object, and it cannot be reused. To prepare
for that, rename our contexts to the more generic term intel_context.

No functional changes.

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-05-22 23:41:17 +02:00
Oscar Mateo ee1b1e5ef3 drm/i915: Split the ringbuffers from the rings (2/3)
This refactoring has been performed using the following Coccinelle
semantic script:

    @@
    struct intel_engine_cs r;
    @@
    (
    - (r).obj
    + r.buffer->obj
    |
    - (r).virtual_start
    + r.buffer->virtual_start
    |
    - (r).head
    + r.buffer->head
    |
    - (r).tail
    + r.buffer->tail
    |
    - (r).space
    + r.buffer->space
    |
    - (r).size
    + r.buffer->size
    |
    - (r).effective_size
    + r.buffer->effective_size
    |
    - (r).last_retired_head
    + r.buffer->last_retired_head
    )

    @@
    struct intel_engine_cs *r;
    @@
    (
    - (r)->obj
    + r->buffer->obj
    |
    - (r)->virtual_start
    + r->buffer->virtual_start
    |
    - (r)->head
    + r->buffer->head
    |
    - (r)->tail
    + r->buffer->tail
    |
    - (r)->space
    + r->buffer->space
    |
    - (r)->size
    + r->buffer->size
    |
    - (r)->effective_size
    + r->buffer->effective_size
    |
    - (r)->last_retired_head
    + r->buffer->last_retired_head
    )

    @@
    expression E;
    @@
    (
    - LP_RING(E)->obj
    + LP_RING(E)->buffer->obj
    |
    - LP_RING(E)->virtual_start
    + LP_RING(E)->buffer->virtual_start
    |
    - LP_RING(E)->head
    + LP_RING(E)->buffer->head
    |
    - LP_RING(E)->tail
    + LP_RING(E)->buffer->tail
    |
    - LP_RING(E)->space
    + LP_RING(E)->buffer->space
    |
    - LP_RING(E)->size
    + LP_RING(E)->buffer->size
    |
    - LP_RING(E)->effective_size
    + LP_RING(E)->buffer->effective_size
    |
    - LP_RING(E)->last_retired_head
    + LP_RING(E)->buffer->last_retired_head
    )

Note: On top of this this patch also removes the now unused ringbuffer
fields in intel_engine_cs.

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
[danvet: Add note about fixup patch included here.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-05-22 23:27:25 +02:00
Oscar Mateo a4872ba6d0 drm/i915: s/intel_ring_buffer/intel_engine_cs
In the upcoming patches we plan to break the correlation between
engine command streamers (a.k.a. rings) and ringbuffers, so it
makes sense to refactor the code and make the change obvious.

No functional changes.

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-05-22 23:01:05 +02:00
Chris Wilson 340fbd8ca1 drm/i915: Only discard backing storage on releasing the last ref
Before purging our pages (as opposed to copying back the contents from
the GPU), make sure that there is not an exposed CPU mmapping through
which the user can inspect the results.

Regression from

commit 5537252b6b
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Mar 25 13:23:06 2014 +0000

    drm/i915: Invalidate our pages under memory pressure

Testcase: igt/gem_mmap/new-object
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=79005
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Tested-by: Guo Jinxian <jinxianx.guo@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-05-22 15:06:34 +02:00
Chris Wilson 2cfcd32a92 drm/i915: Implement an oom-notifier for last resort shrinking
Before the process killer is invoked, oom-notifiers are executed for one
last try at recovering pages. We can hook into this callback to be sure
that everything that can be is purged from our page lists, and to give a
summary of how much memory is still pinned by the GPU in the case of an
oom. This should be really valuable for debugging OOM issues.

Note that the last-ditch effort call to shrink_all we've previously
called from our normal shrinker when we could free as much as the vm
demaned is moved into the oom notifier. Since the shrinker accounting
races against bind/unbind operations we might have called shrink_all
prematurely, which this approach with an oom notifier avoids.

References: https://bugs.freedesktop.org/show_bug.cgi?id=72742
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Tested-by: lu hua <huax.lu@intel.com>
[danvet: Bikeshed logical | into || and pimp commit message.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-05-20 10:57:13 +02:00
Chris Wilson 5537252b6b drm/i915: Invalidate our pages under memory pressure
Try to flush out dirty pages into the swapcache (and from there into the
swapfile) when under memory pressure and forced to drop GEM objects from
memory. In effect, this should just allow us to discard unused pages for
memory reclaim and to start writeback earlier.

v2: Hugh Dickins warned that explicitly starting writeback from
shrink_slab was prone to deadlocks within shmemfs.

Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Robert Beckett <robert.beckett@intel.com>
Reviewed-by: Rafael Barbalho <rafael.barbalho@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-05-20 09:51:18 +02:00
Chris Wilson b453c4dbc3 drm/i915: Refactor common lock handling between shrinker count/scan
We can share a few lines of tricky lock handling we need to use for both
shrinker routines and in the process fix the return value for count()
when reporting a deadlock.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Robert Beckett <robert.beckett@intel.com>
Reviewed-by: Rafael Barbalho <rafael.barbalho@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-05-20 09:46:52 +02:00
Chris Wilson ceabbba524 drm/i915: Include bound and active pages in the count of shrinkable objects
When the machine is under a lot of memory pressure and being stressed by
multiple GPU threads, we quite often report fewer than shrinker->batch
(i.e. SHRINK_BATCH) pages to be freed. This causes the shrink_control to
skip calling into i915.ko to release pages, despite the GPU holding onto
most of the physical pages in its active lists.

References: https://bugs.freedesktop.org/show_bug.cgi?id=72742
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Robert Beckett <robert.beckett@intel.com>
Reviewed-by: Rafael Barbalho <rafael.barbalho@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-05-20 09:46:06 +02:00
Chris Wilson 0820baf39b drm/i915: Translate ENOSPC from shmem_get_page() to ENOMEM
shmemfs first checks if there is enough memory to allocate the page
and reports ENOSPC should there be insufficient, along with
the usual ENOMEM for a genuine allocation failure.

We use ENOSPC in our driver to mean that we have run out of aperture
space and so want to translate the error from shmemfs back to
our usual understanding of ENOMEM. None of the the other GEM users
appear to distinguish between ENOMEM and ENOSPC in their error handling,
hence it is easiest to do the fixup in i915.ko

Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Robert Beckett <robert.beckett@intel.com>
Reviewed-by: Rafael Barbalho <rafael.barbalho@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-05-20 09:45:22 +02:00
Chris Wilson 5cc9ed4b9a drm/i915: Introduce mapping of user pages into video memory (userptr) ioctl
By exporting the ability to map user address and inserting PTEs
representing their backing pages into the GTT, we can exploit UMA in order
to utilize normal application data as a texture source or even as a
render target (depending upon the capabilities of the chipset). This has
a number of uses, with zero-copy downloads to the GPU and efficient
readback making the intermixed streaming of CPU and GPU operations
fairly efficient. This ability has many widespread implications from
faster rendering of client-side software rasterisers (chromium),
mitigation of stalls due to read back (firefox) and to faster pipelining
of texture data (such as pixel buffer objects in GL or data blobs in CL).

v2: Compile with CONFIG_MMU_NOTIFIER
v3: We can sleep while performing invalidate-range, which we can utilise
to drop our page references prior to the kernel manipulating the vma
(for either discard or cloning) and so protect normal users.
v4: Only run the invalidate notifier if the range intercepts the bo.
v5: Prevent userspace from attempting to GTT mmap non-page aligned buffers
v6: Recheck after reacquire mutex for lost mmu.
v7: Fix implicit padding of ioctl struct by rounding to next 64bit boundary.
v8: Fix rebasing error after forwarding porting the back port.
v9: Limit the userptr to page aligned entries. We now expect userspace
    to handle all the offset-in-page adjustments itself.
v10: Prevent vma from being copied across fork to avoid issues with cow.
v11: Drop vma behaviour changes -- locking is nigh on impossible.
     Use a worker to load user pages to avoid lock inversions.
v12: Use get_task_mm()/mmput() for correct refcounting of mm.
v13: Use a worker to release the mmu_notifier to avoid lock inversion
v14: Decouple mmu_notifier from struct_mutex using a custom mmu_notifer
     with its own locking and tree of objects for each mm/mmu_notifier.
v15: Prevent overlapping userptr objects, and invalidate all objects
     within the mmu_notifier range
v16: Fix a typo for iterating over multiple objects in the range and
     rearrange error path to destroy the mmu_notifier locklessly.
     Also close a race between invalidate_range and the get_pages_worker.
v17: Close a race between get_pages_worker/invalidate_range and fresh
     allocations of the same userptr range - and notice that
     struct_mutex was presumed to be held when during creation it wasn't.
v18: Sigh. Fix the refactor of st_set_pages() to allocate enough memory
     for the struct sg_table and to clear it before reporting an error.
v19: Always error out on read-only userptr requests as we don't have the
     hardware infrastructure to support them at the moment.
v20: Refuse to implement read-only support until we have the required
     infrastructure - but reserve the bit in flags for future use.
v21: use_mm() is not required for get_user_pages(). It is only meant to
     be used to fix up the kernel thread's current->mm for use with
     copy_user().
v22: Use sg_alloc_table_from_pages for that chunky feeling
v23: Export a function for sanity checking dma-buf rather than encode
     userptr details elsewhere, and clean up comments based on
     suggestions by Bradley.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: "Gong, Zhipeng" <zhipeng.gong@intel.com>
Cc: Akash Goel <akash.goel@intel.com>
Cc: "Volkin, Bradley D" <bradley.d.volkin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Reviewed-by: Brad Volkin <bradley.d.volkin@intel.com>
[danvet: Frob ioctl allocation to pick the next one - will cause a bit
of fuss with create2 apparently, but such are the rules.]
[danvet2: oops, forgot to git add after manual patch application]
[danvet3: Appease sparse.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-05-16 19:31:29 +02:00
Oscar Mateo 19656430a8 drm/i915: Gracefully handle obj not bound to GGTT in is_pin_display
Otherwise, we do a NULL pointer dereference.

I've seen this happen while handling an error in
i915_gem_object_pin_to_display_plane():

If i915_gem_object_set_cache_level() fails, we call is_pin_display()
to handle the error. At this point, the object is still not pinned
to GGTT and maybe not even bound, so we have to check before we
dereference its GGTT vma.

The IGT kms_flip/bo-too-big tests for this bug.

v2: Chris Wilson says restoring the old value is easier, but that
is_pin_display is useful as a theory of operation. Take the solomonic
decision: at least this way is_pin_display is a little more robust
(until Chris can kill it off).

v3: Chris suggests the WARN in i915_gem_obj_to_ggtt has outlived its
usefulness: add a reminder to remove it.

Issue: VIZ-3772
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Testcase: igt/kms_flip/bo-too-big
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-05-16 16:24:39 +02:00
Daniel Vetter 8b1bc9b4f1 drm/i915: Only do gtt cleanup in vma_unbind for the global vma
Otherwise we end up tearing down fences when e.g. the client quits
way too early. Might or might not fix a fence pin_count BUG Ville has
reported.

Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-05-14 18:39:54 +02:00
Daniel Vetter aff10b30a1 drm/i915: Don't drop pinned fences
Userspace can currently provoke this when e.g. trying to use a pinned
scanout as a cursor or overlay target. Later on that might lead to
some fun fence pin count mayhem.

Spurred by Ville's report that something goes wrong here and
originally I've thought that this might slip through the pwrite gtt
fastpath. But that one checks of obj tiling, so should be ok.

But one thing that _does_ blow up is the vma unbinding with more than
one address space. The next patch will fix this.

v2: Use a WARN_ON - Chris pointed out that we already catch all cases
so userspace can't provoke this like I've originally feared.

While reviewing relevant code I've noticed a pile of DRM_ERROR in the
overlay&cursor code which are all triggerable by userspace. Tune them
down while at it.

v3: Split out the DRM_ERROR->DRM_DEBUG_KMS change into a separate patch,
as requested by Chris.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Tested-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-05-14 18:39:31 +02:00
Daniel Vetter d8ffa60b52 drm/i915: WARN_ON fence pin leaks
The fence pin count should always be <= the bo pin count. If that's
not the case then we have a funny problem and are leaking references
somewhere.

Which means we can catch fence pin leaks by checking for the same
upper limit as we do for the bo pin count. Inspired by a discussion
with Ville about a fence leak igt testcase.

v2: Also check for fence->pin_count <= ggtt_vma->pin_count, since that
might catch a leak even quicker. Also de-inline them, they're getting
too big.

v3: Don't separately check for MAX_PIN_COUNT since the > vma->pin_count
check will catch that already (Chris).

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-05-13 17:16:12 +02:00
Chris Wilson 1cf0ba1474 drm/i915: Flush request queue when waiting for ring space
During the review of

commit 1f70999f90
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Mon Jan 27 22:43:07 2014 +0000

    drm/i915: Prevent recursion by retiring requests when the ring is full

Ville raised the point that our interaction with request->tail was
likely to foul up other uses elsewhere (such as hang check comparing
ACTHD against requests).

However, we also need to restore the implicit retire requests that certain
test cases depend upon (e.g. igt/gem_exec_lut_handle), this raises the
spectre that the ppgtt will randomly call i915_gpu_idle() and recurse
back into intel_ring_begin().

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=78023
Reviewed-by: Brad Volkin <bradley.d.volkin@intel.com>
[danvet: Remove now unused 'tail' variable as spotted by Brad.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-05-08 01:23:34 +02:00
Ben Widawsky 6e7186af3b drm/i915: Make aliasing a 2nd class VM
There is a good debate to be had about how best to fit the aliasing
PPGTT into the code. However, as it stands right now, getting aliasing
PPGTT bindings is a hack, and done through implicit arguments. To make
this absolutely clear, WARN and return an error if a driver writer tries
to do something they shouldn't.

I have no issue with an eventual revert of this patch. It makes sense
for what we have today.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-05-07 10:01:41 +02:00
Ben Widawsky ebc348b2ad drm/i915: Move semaphore specific ring members to struct
This will be helpful in abstracting some of the code in preparation for
gen8 semaphores.

v2: Move mbox stuff to a separate struct

v3: Rebased over VCS2 work

Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> (v1)
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-05-05 10:56:52 +02:00
Chris Wilson c8725f3dc0 drm/i915: Do not call retire_requests from wait_for_rendering
A common issue we have is that retiring requests causes recursion
through GTT manipulation or page table manipulation which we can only
handle at very specific points. However, to maintain internal
consistency (enforced through our sanity checks on write_domain at
various points in the GEM object lifecycle) we do need to retire the
object prior to marking it with a new write_domain, and also clear the
write_domain for the implicit flush following a batch.

Note that this then allows the unbound objects to still be on the active
lists, and so care must be taken when removing objects from unbound lists
(similar to the caveats we face processing the bound lists).

v2: Fix i915_gem_shrink_all() to handle updated object lifetime rules,
by refactoring it to call into __i915_gem_shrink().

v3: Missed an object-retire prior to changing cache domains in
i915_gem_object_set_cache_leve()

v4: Rebase

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Tested-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Brad Volkin <bradley.d.volkin@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-05-05 09:09:15 +02:00
Imre Deak 981a5aead1 drm/i915: vlv: clean up GTLC wake control/status register macros
These will be needed by the upcoming VLV RPM helpers.

Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@gmail.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-05-05 09:08:50 +02:00
Zhao Yakui 845f74a701 drm/i915:Initialize the second BSD ring on BDW GT3 machine
Based on the hardware spec, the BDW GT3 machine has two independent
BSD ring that can be used to dispatch the video commands.
So just initialize it.

V3->V4: Follow Imre's comment to do some minor updates. For example:
more comments are added to describe the semaphore between ring.

Reviewed-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Zhao Yakui <yakui.zhao@intel.com>
[danvet: Fix up checkpatch error.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-05-05 09:08:46 +02:00
Chris Wilson 6099032045 drm/i915: Allow the module to load even if we fail to setup rings
Even without enabling the ringbuffers to allow command execution, we can
still control the display engines to enable modesetting. So make the
ringbuffer initialization failure soft, and mark the GPU as wedged
instead.

v2: Only treat an EIO from ring initialisation as a soft failure, and
abort module load for any other failure, such as allocation failures.

v3: Add an *ERROR* prior to declaring the GPU wedged so that it stands
out like a sore thumb in the logs

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Reviewed-by: Oscar Mateo <oscar.mateo@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-05-05 09:08:38 +02:00
Chris Wilson e3efda49e7 drm/i915: Preserve ring buffers objects across resume
Tearing down the ring buffers across resume is overkill, risks
unnecessary failure and increases fragmentation.

After failure, since the device is still active we may end up trying to
write into the dangling iomapping and trigger an oops.

v2: stop_ringbuffers() was meant to call stop(ring) not
cleanup(ring) during resume!

Reported-by: Jae-hyeon Park <jhyeon@gmail.com>
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=72351
References: https://bugs.freedesktop.org/show_bug.cgi?id=76554
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Oscar Mateo <oscar.mateo@intel.com>
[danvet: s/ring->obj == NULL/!intel_ring_initialized(ring)/ as
suggested by Oscar.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-05-05 09:08:37 +02:00
Dave Airlie 444c9a08bf Merge branch 'drm-init-cleanup' of git://people.freedesktop.org/~danvet/drm into drm-next
Next pull request, this time more of the drm de-midlayering work. The big
thing is that his patch series here removes everything from drm_bus except
the set_busid callback. Thierry has a few more patches on top of this to
make that one optional to.

With that we can ditch all the non-pci drm_bus implementations, which
Thierry has already done for the fake tegra host1x drm_bus.

Reviewed by Thierry, Laurent and David and now also survived some testing
on my intel boxes to make sure the irq fumble is fixed correctly ;-) The
last minute rebase was just to add the r-b tags from Thierry for the 2
patches I've redone.

* 'drm-init-cleanup' of git://people.freedesktop.org/~danvet/drm:
  drm/<drivers>: don't set driver->dev_priv_size to 0
  drm: Remove dev->kdriver
  drm: remove drm_bus->get_name
  drm: rip out dev->devname
  drm: inline drm_pci_set_unique
  drm: remove bus->get_irq implementations
  drm: pass the irq explicitly to drm_irq_install
  drm/irq: Look up the pci irq directly in the drm_control ioctl
  drm/irq: track the irq installed in drm_irq_install in dev->irq
  drm: rename dev->count_lock to dev->buf_lock
  drm: Rip out totally bogus vga_switcheroo->can_switch locking
  drm: kill drm_bus->bus_type
  drm: remove drm_dev_to_irq from drivers
  drm/irq: remove cargo-culted locking from irq_install/uninstall
  drm/irq: drm_control is a legacy ioctl, so pci devices only
  drm/pci: fold in irq_by_busid support
  drm/irq: simplify irq checks in drm_wait_vblank
2014-05-01 09:32:21 +10:00
Dave Airlie 885ac04ab3 Merge tag 'drm-intel-next-2014-04-16' of git://anongit.freedesktop.org/drm-intel into drm-next
drm-intel-next-2014-04-16:
- vlv infoframe fixes from Jesse
- dsi/mipi fixes from Shobhit
- gen8 pageflip fixes for LRI/SRM from Damien
- cmd parser fixes from Brad Volkin
- some prep patches for CHV, DRRS, ...
- and tons of little things all over
drm-intel-next-2014-04-04:
- cmd parser for gen7 but only in enforcing and not yet granting mode - the
  batch copying stuff is still missing. Also performance is a bit ... rough
  (Brad Volkin + OACONTROL fix from Ken).
- deprecate UMS harder (i.e. CONFIG_BROKEN)
- interrupt rework from Paulo Zanoni
- runtime PM support for bdw and snb, again from Paulo
- a pile of refactorings from various people all over the place to prep for new
  stuff (irq reworks, power domain polish, ...)

drm-intel-next-2014-04-04:
- cmd parser for gen7 but only in enforcing and not yet granting mode - the
  batch copying stuff is still missing. Also performance is a bit ... rough
  (Brad Volkin + OACONTROL fix from Ken).
- deprecate UMS harder (i.e. CONFIG_BROKEN)
- interrupt rework from Paulo Zanoni
- runtime PM support for bdw and snb, again from Paulo
- a pile of refactorings from various people all over the place to prep for new
  stuff (irq reworks, power domain polish, ...)

Conflicts:
	drivers/gpu/drm/i915/i915_gem_context.c
2014-05-01 09:11:37 +10:00
Daniel Vetter bb0f1b5c16 drm: pass the irq explicitly to drm_irq_install
Unfortunately this requires a drm-wide change, and I didn't see a sane
way around that. Luckily it's fairly simple, we just need to inline
the respective get_irq implementation from either drm_pci.c or
drm_platform.c.

With that we can now also remove drm_dev_to_irq from drm_irq.c.

Reviewed-by: Thierry Reding <treding@nvidia.com>
Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-04-23 10:32:50 +02:00
Daniel Vetter e090c53b21 drm/irq: remove cargo-culted locking from irq_install/uninstall
The dev->struct_mutex locking in drm_irq.c only protects
dev->irq_enabled. Which isn't really much at all and only prevents
especially nasty ums userspace from concurrently installing the
interrupt handling a few times. Or at least trying.

There are tons of unlocked readers of dev->irqs_enabled in the vblank
wait code (and by extension also in the pageflip code since that uses
the same vblank timestamp engine).

Real modesetting drivers should ensure that nothing can go haywire
with a sane setup teardown sequence. So we only really need this for
the drm_control ioctl, everywhere else this will just paper over
nastiness.

Note that drm/i915 is a bit specially due to the gem+ums combination.
So there we also need to properly protect the entervt and leavevt
ioctls. But it's definitely saner to do everything in one go than to
drop the lock in-between.

Finally there's the gpu reset code in drm/i915. That one's just race
(concurrent userspace calls to for vblank waits of pageflips could
spuriously fail). So wrap it up in with a nice comment since fixing
this is more involved.

v2: Rebase and fix commit message (Thierry)

Reviewed-by: Thierry Reding <treding@nvidia.com>
Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-04-22 11:41:12 +02:00
Chris Wilson 691e6415c8 drm/i915: Always use kref tracking for all contexts.
If we always initialize kref for the context, even if we are using fake
contexts for hangstats when there is no hw support, we can forgo the
dance to dereference the ctx->obj and inspect whether we are permitted
to use kref inside i915_gem_context_reference() and _unreference().

My ulterior motive here is to improve the debugging of a use-after-free
of ctx->obj. This patch avoids the dereference here and instead forces
the assertion checks associated with kref.

v2: Refactor the fake contexts to being even more like the real
contexts, so that there is much less duplicated and special case code.

v3: Tweaks.
v4: Tweaks, minor.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=76671
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Tested-by: lu hua <huax.lu@intel.com>
Cc: Ben Widawsky <benjamin.widawsky@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
[Jani: tiny change to backport to drm-intel-fixes.]
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2014-04-11 13:29:51 +03:00
Mika Kuoppala 88b4aa8770 drm/i915: add flags to i915_ring_stop
Piglit runner and QA are both looking at the dmesg for
DRM_ERRORs with test cases. Add a flag to control those
when we they are expected from related test cases.

Also add flag to control if contexts should be banned
that introduced the hang. Hangcheck is timer based and
preventing bans by adding sleeps to testcases makes
testing slower.

v2: intel_ring_stopped(), readable comment (Chris)
v3: keep compatibility (Daniel)

References: https://bugs.freedesktop.org/show_bug.cgi?id=75876
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-04-09 15:07:42 +02:00
Dave Airlie 9f97ba806a Merge tag 'drm-intel-fixes-2014-04-04' of git://anongit.freedesktop.org/drm-intel into drm-next
Merge window -fixes pull request as usual. Well, I did sneak in Jani's
drm_i915_private_t typedef removal, need to have fun with a big sed job
too ;-)

Otherwise:
- hdmi interlaced fixes (Jesse&Ville)
- pipe error/underrun/crc tracking fixes, regression in late 3.14-rc (but
  not cc: stable since only really relevant for igt runs)
- large cursor wm fixes (Chris)
- fix gpu turbo boost/throttle again, was getting stuck due to vlv rps
  patches (Chris+Imre)
- fix runtime pm fallout (Paulo)
- bios framebuffer inherit fix (Chris)
- a few smaller things

* tag 'drm-intel-fixes-2014-04-04' of git://anongit.freedesktop.org/drm-intel: (196 commits)
  Skip intel_crt_init for Dell XPS 8700
  drm/i915: vlv: fix RPS interrupt mask setting
  Revert "drm/i915/vlv: fixup DDR freq detection per Punit spec"
  drm/i915: move power domain init earlier during system resume
  drm/i915: Fix the computation of required fb size for pipe
  drm/i915: don't get/put runtime PM at the debugfs forcewake file
  drm/i915: fix WARNs when reading DDI state while suspended
  drm/i915: don't read cursor registers on powered down pipes
  drm/i915: get runtime PM at i915_display_info
  drm/i915: don't read pp_ctrl_reg if we're suspended
  drm/i915: get runtime PM at i915_reg_read_ioctl
  drm/i915: don't schedule force_wake_timer at gen6_read
  drm/i915: vlv: reserve the GT power context only once during driver init
  drm/i915: prefer struct drm_i915_private to drm_i915_private_t
  drm/i915/overlay: prefer struct drm_i915_private to drm_i915_private_t
  drm/i915/ringbuffer: prefer struct drm_i915_private to drm_i915_private_t
  drm/i915/display: prefer struct drm_i915_private to drm_i915_private_t
  drm/i915/irq: prefer struct drm_i915_private to drm_i915_private_t
  drm/i915/gem: prefer struct drm_i915_private to drm_i915_private_t
  drm/i915/dma: prefer struct drm_i915_private to drm_i915_private_t
  ...
2014-04-05 16:14:21 +10:00
Lauri Kasanen 62347f9e0f drm: Add support for two-ended allocation, v3
Clients like i915 need to segregate cache domains within the GTT which
can lead to small amounts of fragmentation. By allocating the uncached
buffers from the bottom and the cacheable buffers from the top, we can
reduce the amount of wasted space and also optimize allocation of the
mappable portion of the GTT to only those buffers that require CPU
access through the GTT.

For other drivers, allocating small bos from one end and large ones
from the other helps improve the quality of fragmentation.

Based on drm_mm work by Chris Wilson.

v3: Changed to use a TTM placement flag
v2: Updated kerneldoc

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ben Widawsky <ben@bwidawsk.net>
Cc: Christian König <deathsimple@vodafone.de>
Signed-off-by: Lauri Kasanen <cand@gmx.com>
Signed-off-by: David Airlie <airlied@redhat.com>
2014-04-04 09:28:14 +10:00
Jani Nikula 3e31c6c017 drm/i915/gem: prefer struct drm_i915_private to drm_i915_private_t
No functional changes.

Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-03-31 15:32:38 +02:00
Chris Wilson df6f783a4e drm/i915: Fix unsafe loop iteration over vma whilst unbinding them
On non-LLC platforms, when changing the cache level of an object, we may
need to unbind it so that prefetching across page boundaries does not
cross into a different memory domain. This requires us to unbind
conflicting vma, but we did so iterating over the objects vma in an
unsafe manner (as the list was being modified as we iterated).

The regression was introduced in
commit 3089c6f239
Author: Ben Widawsky <ben@bwidawsk.net>
Date:   Wed Jul 31 17:00:03 2013 -0700

    drm/i915: make caching operate on all address spaces
apparently as far back as v3.12-rc1, but it has only just begun to
trigger real world bug reports.

Reported-and-tested-by: Nikolay Martynov <mar.kolya@gmail.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=76384
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ben Widawsky <ben@bwidawsk.net>
Cc: stable@vger.kernel.org
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-03-21 16:13:08 +01:00
Paulo Zanoni 5d584b2eca drm/i915: move pc8.irqs_disabled to pm.irqs_disabled
When other platforms add runtime PM support they will also need to
disable interrupts, so move the variable to the runtime PM struct.

Also notice that the longer-term goal is to completely kill the
regsave struct, and I even have patches for that.

v2: - Rebase.
v3: - Rebase.

Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-03-19 16:39:46 +01:00
Daniel Vetter b80d6c781e Merge branch 'topic/dp-aux-rework' into drm-intel-next-queued
Conflicts:
	drivers/gpu/drm/i915/intel_dp.c

A bit a mess with reverts which differe in details between -fixes and
-next and some other unrelated shuffling.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-03-19 15:54:37 +01:00
Dave Airlie d1583c9997 Merge branch 'drm-next' of git://people.freedesktop.org/~dvdhrm/linux into drm-next
This is the 3rd respin of the drm-anon patches. They allow module unloading, use
the pin_fs_* helpers recommended by Al and are rebased on top of drm-next. Note
that there are minor conflicts with the "drm-minor" branch.

* 'drm-next' of git://people.freedesktop.org/~dvdhrm/linux:
  drm: init TTM dev_mapping in ttm_bo_device_init()
  drm: use anon-inode instead of relying on cdevs
  drm: add pseudo filesystem for shared inodes
2014-03-18 19:17:02 +10:00
David Herrmann 6796cb16c0 drm: use anon-inode instead of relying on cdevs
DRM drivers share a common address_space across all character-devices of a
single DRM device. This allows simple buffer eviction and mapping-control.
However, DRM core currently waits for the first ->open() on any char-dev
to mark the underlying inode as backing inode of the device. This delayed
initialization causes ugly conditions all over the place:
  if (dev->dev_mapping)
    do_sth();

To avoid delayed initialization and to stop reusing the inode of the
char-dev, we allocate an anonymous inode for each DRM device and reset
filp->f_mapping to it on ->open().

Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
2014-03-16 12:23:33 +01:00
Ville Syrjälä 3ddffb7b8a drm/i915: Unbind all vmas whose new cache_level doesn't agree with the neighbours
When we change the cache_level for an object we need to make sure
we don't put differing types of snoopable memory too close to each
other on non-LLC machines.

Currently i915_gem_object_set_cache_level() will stop looking when
it finds just one vma that has such a conflict. Drop the bogus break
statement to make sure it will unbind all vmas which need to be moved
around to avoid the conflict.

I suppose this is a theoretical issue as currently we don't enable
ppgtt on non-LLC machines, so each object can only have one vma.

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-03-13 12:22:44 +01:00
Chris Wilson c2831a94b5 drm/i915: Do not force non-caching copies for pwrite along shmem path
We don't always want to write into main memory with pwrite. The shmem
fast path in particular is used for memory that is cacheable - under
such circumstances forcing the cache eviction is undesirable. As we will
always flush the cache when targeting incoherent buffers, we can rely on
that second pass to apply the cache coherency rules and so benefit from
in-cache copies otherwise.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Brad Volkin <bradley.d.volkin@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-03-08 00:03:26 +01:00
Chris Wilson 17793c9a46 drm/i915: Process page flags once rather than per pwrite/pread
We used to lock individual pages inside the buffer object and so needed
to update the page flags every time. However, we now pin the pages into
the object for the duration of the pwrite/pread (and hopefully much
longer) and so we can forgo the flag updates until we release all the
pages.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Brad Volkin <bradley.d.volkin@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-03-08 00:03:01 +01:00
Brad Volkin 4c914c0c7c drm/i915: Refactor shmem pread setup
The command parser is going to need the same synchronization and
setup logic, so factor it out for reuse.

v2: Add a check that the object is backed by shmem

Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-03-07 22:36:59 +01:00
Damien Lespiau cb216aa844 drm/i915: Make i915_gem_retire_requests_ring() static
Its last usage outside of i915_gem.c was removed in:

  commit 1f70999f90
  Author: Chris Wilson <chris@chris-wilson.co.uk>
  Date:   Mon Jan 27 22:43:07 2014 +0000

     drm/i915: Prevent recursion by retiring requests when the ring is full

Signed-off-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-03-05 21:30:39 +01:00
Chris Wilson ab0e7ff9f2 drm/i915: Record pid/comm of hanging task
After finding the guilty batch and request, we can use it to find the
process that submitted the batch and then add the culprit into the error
state.

This is a slightly different approach from Ben's in that instead of
adding the extra information into the struct i915_hw_context, we use the
information already captured in struct drm_file which is then referenced
from the request.

v2: Also capture the workaround buffer for gen2, so that we can compare
    its contents against the intended batch for the active request.

v3: Rebase (Mika)
v4: Check for null context (Chris)
    checkpatch warnings fixed

Link: http://lists.freedesktop.org/archives/intel-gfx/2013-August/032280.html
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> (v2)
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> (v4)
Acked-by: Ben Widawsky <ben@bwidawsk.net>
Cc: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-03-05 21:30:24 +01:00
Chris Wilson 8d9fc7fd2d drm/i915: Rely on accurate request tracking for finding hung batches
In the past, it was possible to have multiple batches per request due to
a stray signal or ENOMEM. As a result we had to scan each active object
(filtered by those having the COMMAND domain) for the one that contained
the ACTHD pointer. This was then made more complicated by the
introduction of ppgtt, whereby ACTHD then pointed into the address space
of the context and so also needed to be taken into account.

This is a fairly robust approach (though the implementation is a little
fragile and depends upon the per-generation setup, registers and
parameters). However, due to the requirements for hangstats, we needed a
robust method for associating batches with a particular request and
having that we can rely upon it for finding the associated batch object
for error capture.

If the batch buffer tracking is not robust enough, that should become
apparent quite quickly through an erroneous error capture. That should
also help to make sure that the runtime reporting to userspace is
robust. It also means that we then report the oldest incomplete batch on
each ring, which can be useful for determining the state of userspace at
the time of a hang.

v2: Use i915_gem_find_active_request (Mika)

v3: remove check for ring->get_seqno, split long lines (Ben)

v4: check that context is available (Chris)
    checkpatch warnings fixed

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> (v1)
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> (v3)
Cc: Ben Widawsky <benjamin.widawsky@intel.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net> (v3)
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-03-05 21:30:24 +01:00
Chris Wilson 64bf930379 drm/i915: Reset vma->mm_list after unbinding
In place of true activity counting, we walk the list of vma associated
with an object managing each on the vm's active/inactive list everytime
we call move-to-inactive. This depends upon the vma->mm_list being
cleared after unbinding, or else we run into difficulty when tracking
the object in multiple vm's - we see a use-after free and corruption of
the mm_list.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-03-05 21:30:23 +01:00
Ville Syrjälä ccc7bed05e drm/i915: Don't ban default context when stop_rings!=0
If we've explicitly stopped the rings for testing purposes, don't ban
the default context. Fixes kms_flip hang tests.

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Acked-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-03-05 21:30:14 +01:00
Chris Wilson f62a007603 drm/i915: Accurately track when we mark the hardware as idle/busy
We currently call intel_mark_idle() too often, as we do so as a
side-effect of processing the request queue. However, we the calls to
intel_mark_idle() are expected to be paired with a call to
intel_mark_busy() (or else we try to idle the hardware by accessing
registers that are already disabled). Make the idle/busy tracking
explicit to prevent the multiple calls.

v2: We can drop some of the complexity in __i915_add_request() as
queue_delayed_work() already behaves as we want (not requeuing the item
if it is already in the queue) and mark_busy/mark_idle imply that the
idle task is inactive.

v3: We do still need to cancel the pending idle task so that it is sent
again after the current busy load completes (not in the middle of it).

Reported-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Tested-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-03-05 21:30:10 +01:00
Daniel Vetter 8ea99c9287 drm/i915: Only bind each object rather than for every execbuffer
One side-effect of the introduction of ppgtt was that we needed to
rebind the object into the appropriate vm (and global gtt in some
peculiar cases). For simplicity this was done twice for every object on
every call to execbuffer. However, that adds a tremendous amount of CPU
overhead (rewriting all the PTE for all objects into WC memory) per
draw. The fix is to push all the decision about which vm to bind into
and when down into the low-level bind routines through hints rather than
performing the bind unconditionally in the execbuffer routine.

Note that this is a regression introduced in the full ppgtt feature
branch, before this we've only done re-bound objects when the relevant
has_(aliasing_ppgtt|global_gtt)_mapping flag was clear. But since
that's per-object and not per-vma that optimization broke.

v2: Split out prep work and unrelated changes.

v3: Bring back functional change around PIN_GLOBAL that I've
accidentally split out.

v4: Remove the temporary hack for the old binding logic to avoid
bisection issues.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=72906
Tested-by: jianx.zhou@intel.com
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> (v1)
Cc: Ben Widawsky <benjamin.widawsky@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Acked-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-02-14 14:18:38 +01:00
Daniel Vetter 262de14531 drm/i915: Directly return the vma from bind_to_vm
This is prep work for reworking the object_pin logic. Atm
it still does a (now redundant) lookup of the vma. The next
patch will fix this.

Split out from Chris vma-bind rework.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ben Widawsky <benjamin.widawsky@intel.com>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-02-14 14:18:30 +01:00
Daniel Vetter b287110e89 drm/i915: Simplify i915_gem_object_ggtt_unpin
Split out from Chris vma-bind rework.

Jani wondered why this is save, and the reason is that i915_vma_unbind
does all these checks, too. So they're redundant.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ben Widawsky <benjamin.widawsky@intel.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-02-14 14:18:21 +01:00
Daniel Vetter bf3d149b25 drm/i915: split PIN_GLOBAL out from PIN_MAPPABLE
With abitrary pin flags it makes sense to split out a "please bind
this into global gtt" from the "please allocate in the mappable
range".

Use this unconditionally in our global gtt pin helper since this is
what its callers want. Later patches will drop PIN_MAPPABLE where it's
not strictly needed.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-02-14 14:17:27 +01:00
Daniel Vetter 1ec9e26dda drm/i915: Consolidate binding parameters into flags
Anything more than just one bool parameter is just a pain to read,
symbolic constants are much better.

Split out from Chris' vma-binding rework patch.

v2: Undo the behaviour change in object_pin that Chris spotted.

v3: Split out misplaced hunk to handle set_cache_level errors,
spotted by Jani.

v4: Keep the current over-zealous binding logic in the execbuffer code
working with a quick hack while the overall binding code gets shuffled
around.

v5: Reorder the PIN_ flags for more natural patch splitup.

v6: Pull out the PIN_GLOBAL split-up again.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ben Widawsky <benjamin.widawsky@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-02-14 14:16:58 +01:00
Chris Wilson 6e4930f6ee drm/i915: Flush GPU rendering with a lockless wait during a pagefault
Arjan van de Ven reported that on his test machine that he was seeing
stalls of greater than 1 frame greatly impacting the user experience. He
tracked this down to being the locked flush during a pagefault as being
the culprit hogging the struct_mutex and so blocking any other user from
proceeding. Stalling on a pagefault is bad behaviour on userspace's
part, for one it means that they are ignoring the coherency rules on
pointer access through the GTT, but fortunately we can apply the same
trick as the set-to-domain ioctl to do a lightweight, nonblocking flush
of outstanding rendering first.

"Prior to the patch it looks like this
(this one testrun does not show the 20ms+ I've seen occasionally)

  4.99 ms     2.36 ms    31360  __wait_seqno i915_wait_seqno i915_gem_object_wait_rendering i915_gem_object_set_to_gtt_domain i915_gem_fault __do_fault handle_
+pte_fault handle_mm_fault __do_page_fault do_page_fault page_fault
   4.99 ms     2.75 ms   107751  __wait_seqno i915_gem_wait_ioctl drm_ioctl i915_compat_ioctl compat_sys_ioctl ia32_sysret
   4.99 ms     1.63 ms     1666  i915_mutex_lock_interruptible i915_gem_fault __do_fault handle_pte_fault handle_mm_fault __do_page_fault do_page_fault page_fa
+ult
   4.93 ms     2.45 ms      980  i915_mutex_lock_interruptible intel_crtc_page_flip drm_mode_page_flip_ioctl drm_ioctl i915_compat_ioctl compat_sys_ioctl ia32_
+sysret
   4.89 ms     2.20 ms     3283  i915_mutex_lock_interruptible i915_gem_wait_ioctl drm_ioctl i915_compat_ioctl compat_sys_ioctl ia32_sysret
   4.34 ms     1.66 ms     1715  i915_mutex_lock_interruptible i915_gem_pwrite_ioctl drm_ioctl i915_compat_ioctl compat_sys_ioctl ia32_sysret
   3.73 ms     3.73 ms       49  i915_mutex_lock_interruptible i915_gem_set_domain_ioctl drm_ioctl i915_compat_ioctl compat_sys_ioctl ia32_sysret
   3.17 ms     0.33 ms      931  i915_mutex_lock_interruptible i915_gem_madvise_ioctl drm_ioctl i915_compat_ioctl compat_sys_ioctl ia32_sysret
   2.97 ms     0.43 ms     1029  i915_mutex_lock_interruptible i915_gem_busy_ioctl drm_ioctl i915_compat_ioctl compat_sys_ioctl ia32_sysret
   2.55 ms     0.51 ms      735  i915_gem_get_tiling drm_ioctl i915_compat_ioctl compat_sys_ioctl ia32_sysret

After the patch it looks like this:

   4.99 ms     2.14 ms    22212  __wait_seqno i915_gem_wait_ioctl drm_ioctl i915_compat_ioctl compat_sys_ioctl ia32_sysret
   4.86 ms     0.99 ms    14170  __wait_seqno i915_gem_object_wait_rendering__nonblocking i915_gem_fault __do_fault handle_pte_fault handle_mm_fault __do_page_
+fault do_page_fault page_fault
   3.59 ms     1.31 ms      325  i915_gem_get_tiling drm_ioctl i915_compat_ioctl compat_sys_ioctl ia32_sysret
   3.37 ms     3.37 ms       65  i915_mutex_lock_interruptible i915_gem_wait_ioctl drm_ioctl i915_compat_ioctl compat_sys_ioctl ia32_sysret
   2.58 ms     2.58 ms       65  i915_mutex_lock_interruptible i915_gem_do_execbuffer.isra.23 i915_gem_execbuffer2 drm_ioctl i915_compat_ioctl compat_sys_ioctl
+ia32_sysret
   2.19 ms     2.19 ms       65  i915_mutex_lock_interruptible intel_crtc_page_flip drm_mode_page_flip_ioctl drm_ioctl i915_compat_ioctl compat_sys_ioctl ia32_
+sysret
   2.18 ms     2.18 ms       65  i915_mutex_lock_interruptible i915_gem_busy_ioctl drm_ioctl i915_compat_ioctl compat_sys_ioctl ia32_sysret
   1.66 ms     1.66 ms       65  i915_gem_set_tiling drm_ioctl i915_compat_ioctl compat_sys_ioctl ia32_sysret

It may not look like it, but this is quite a large difference, and I've
been unable to reproduce > 5 msec delays at all, while before they do
happen (just not in the trace above)."

gem_gtt_hog on an old Pineview (GMA3150),
before: 4969.119ms
after:  4122.749ms

Reported-by: Arjan van de Ven <arjan.van.de.ven@intel.com>
Testcase: igt/gem_gtt_hog
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@gmail.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-02-12 18:52:59 +01:00
Chris Wilson bd9b6a4ec5 drm/i915: Downgrade *ERROR* message for invalid user input
When we detect that the user passed along an invalid handle or object,
we emit a warning as an aide for debugging. Since these are indeed only
for debugging user triggerable errors (and the errors are reported back
to userspace by the errno), the messages should only be at the debug
level and not claiming that there is a catastrophic error in the
driver/hardware.

References: https://bugs.freedesktop.org/show_bug.cgi?id=74704
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-02-12 18:52:54 +01:00
Damien Lespiau 3d13ef2e2d drm/i915: Always use INTEL_INFO() to access the device_info structure
If we make sure that all the dev_priv->info usages are wrapped by
INTEL_INFO(), we can easily modify the ->info field to be structure and
not a pointer while keeping the const protection in the INTEL_INFO()
macro.

v2: Rebased onto latest drm-nightly

Suggested-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-02-12 18:52:50 +01:00
Chris Wilson 8c99e57d39 drm/i915: Treat using a purged buffer as a source of EFAULT
Since a purged buffer is one without any associated pages, attempting to
use it should generate EFAULT rather than EINVAL, as it is not strictly
an invalid parameter.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-02-04 17:03:32 +01:00
Chris Wilson 45d678173a drm/i915: Convert EFAULT into a silent SIGBUS
EFAULT will be a possible return code where backing storage is
transient, such after it is purged by madvise. As such it is to be
expected and so should not trigger a WARN inside i915_gem_fault() but be
converted silently to SIGBUS.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-02-04 17:03:27 +01:00
Mika Kuoppala e38486943e drm/i915: release mutex in i915_gem_init()'s error path
Found with smatch.

Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-02-04 12:10:45 +01:00
Mika Kuoppala 939fd76208 drm/i915: Get rid of acthd based guilty batch search
As we seek the guilty batch using request and hangcheck
score, this code is not needed anymore.

v2: Rebase. Passing dev_priv instead of getting it from last_ring

Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net> (v1)
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-02-04 11:57:29 +01:00
Mika Kuoppala b6b0fac04d drm/i915: Use hangcheck score to find guilty context
With full ppgtt using acthd is not enough to find guilty
batch buffer. We get multiple false positives as acthd is
per vm.

Instead of scanning which vm was running on a ring,
to find corressponding context, use a different, simpler,
strategy of finding batches that caused gpu hang:

If hangcheck has declared ring to be hung, find first non complete
request on that ring and claim it was guilty.

v2: Rebase

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=73652
Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net> (v1)
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-02-04 11:57:24 +01:00
Mika Kuoppala 3fac8978f5 drm/i915: Tune down debug output when context is banned
If we have stopped rings then we know that test is running
so no need for spam. In addition, only spam when default
context gets banned.

v2: - make sure default context ban gets shown (Chris)
    - use helper for checking for default context, everywhere (Chris)

v3: - dont be quiet when debug is set (Ben, Daniel)

Reference: https://bugs.freedesktop.org/show_bug.cgi?id=73652
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-01-30 17:25:38 +01:00
Mika Kuoppala 44e2c0705a drm/i915: Use i915_hw_context to set reset stats
With full ppgtt support drm_i915_file_private gained knowledge
about the default context. Also reset stats are now inside
i915_hw_context so we can use proper abstraction.

v2: Move BUG_ON and WARN_ON to more proper locations (Ben)

v3: Pass dev directly to i915_context_is_banned to avoid the need to
dereference ctx->last_ring. Spotted by Mika when checking my
s/BUG/WARN/ change, I've missed this ->last_ring dereference.

Suggested-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> (v2)
Reviewed-by: Ben Widawsky <ben@bwidawsk.net> (v2)
[danvet: s/BUG/WARN/]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-01-30 17:24:36 +01:00
Daniel Vetter 6ba844b090 drm/i915: GEN7_MSG_CONTROL is ivb-only
At least I couldn't find it in the Haswell Bspec any more and we've
tried to test-boot a Haswell machine with num_pipes forced to 0 (i.e.
hit the PCH_NOP path) and the unclaimed register logic complained.

So restrict this dance to just ivb platforms.

v2: Art pointed out that the bits simply moved on hsw+

v3: Buy code terseneness with a notch of sublety as suggested by
Chris.

v4: Frob the right bit, spotted by Art.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Arthur Ranyan <arthur.j.runyan@intel.com>
Cc: Dave Airlie <airlied@gmail.com>
Reviewed-by: Art Runyan <arthur.j.runyan@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-01-27 17:16:47 +01:00
Jani Nikula d330a9530c drm/i915: move module parameters into a struct, in a new file
With 20+ module parameters, I think referring to them via a struct
improves clarity over just having a bunch of globals. While at it, move
the parameter initialization and definitions into a new file
i915_params.c to reduce clutter in i915_drv.c.

Apart from the ill-named i915_enable_rc6, i915_enable_fbc and
i915_enable_ppgtt parameters, for which we lose the "i915_" prefix
internally, the module parameters now look the same both on the kernel
command line and in code. For example, "i915.modeset".

The downsides of the change are losing static on a couple of variables
and not having the initialization and module_param_named() right next to
each other. On the other hand, all module parameters are now defined in
one place at i915_params.c. Plus you can do this to find all module
parameter references:

$ git grep "i915\." -- drivers/gpu/drm/i915

v2:
- move the definitions into a new file
- s/i915_params/i915/
- make i915_try_reset i915.reset, for consistency

Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-01-27 17:16:45 +01:00
Daniel Vetter 0e5539b923 Merge branch 'topic/ppgtt' into drm-intel-next-queued
Because whatever.*

* This should contain a fairly long list of issues and still
unresolved resgressions, but I didn't really get a vote.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-01-25 21:14:57 +01:00
Chris Wilson f72d21eddf drm/i915: Place the Global GTT VM first in the list of VM
This is useful for debugging as we then know that the first entry is
always the global GTT, and all later entries the per-process GTT VM.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-01-25 20:07:15 +01:00
Chris Wilson 5dce5b9387 drm/i915: Wait for completion of pending flips when starved of fences
On older generations (gen2, gen3) the GPU requires fences for many
operations, such as blits. The display hardware also requires fences for
scanouts and this leads to a situation where an arbitrary number of
fences may be pinned by old scanouts following a pageflip but before we
have executed the unpin workqueue. This is unpredictable by userspace
and leads to random EDEADLK when submitting an otherwise benign
execbuffer. However, we can detect when we have an outstanding flip and
so cause userspace to wait upon their completion before finally
declaring that the system is starved of fences. This is really no worse
than forcing the GPU to stall waiting for older execbuffer to retire and
release their fences before we can reallocate them for the next
execbuffer.

v2: move the test for a pending fb unpin to a common routine for
later reuse during eviction

Reported-and-tested-by: dimon@gmx.net
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=73696
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Jon Bloomfield <jon.bloomfield@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-01-22 10:34:40 +01:00
Ben Widawsky 1d62beeaeb drm/i915/ppgtt: Defer request freeing on reset
We need to defer the free request until the object/vma is capable of
being freed - or else we have a problem  when we try to destroy the
context.

The exact same issue is described and fixed here:
commit e20780439b
Author: Ben Widawsky <ben@bwidawsk.net>
Date:   Fri Dec 6 14:11:22 2013 -0800

    drm/i915: Defer request freeing

I had this fix previously, but decided not to keep it for some reason I
can no longer remember.

gem_reset_stats is a really good test at hitting the problem.

For the inquisitive:
[  170.516392] ------------[ cut here ]------------
[  170.517227] WARNING: CPU: 1 PID: 105 at drivers/gpu/drm/drm_mm.c:578 drm_mm_takedown+0x2e/0x30 [drm]()
[  170.518064] Memory manager not clean during takedown.
[  170.518941] CPU: 1 PID: 105 Comm: kworker/1:1 Not tainted 3.13.0-rc4-BEN+ #28
[  170.519787] Hardware name: Hewlett-Packard HP EliteBook 8470p/179B, BIOS 68ICF Ver. F.02 04/27/2012
[  170.520662] Call Trace:
[  170.521517]  [<ffffffff814f0589>] dump_stack+0x4e/0x7a
[  170.522373]  [<ffffffff81049e6d>] warn_slowpath_common+0x7d/0xa0
[  170.523227]  [<ffffffff81049edc>] warn_slowpath_fmt+0x4c/0x50
[  170.524079]  [<ffffffffa06c414e>] drm_mm_takedown+0x2e/0x30 [drm]
[  170.524934]  [<ffffffffa07213f3>] gen6_ppgtt_cleanup+0x23/0x110
[i915]
[  170.525777]  [<ffffffffa07837ed>] ppgtt_release.part.5+0x24/0x29
[i915]
[  170.526603]  [<ffffffffa071aaa5>] i915_gem_context_free+0x195/0x1a0
[i915]
[  170.527423]  [<ffffffffa071189d>] i915_gem_free_request+0x9d/0xb0
[i915]
[  170.528247]  [<ffffffffa0718af9>] i915_gem_reset+0x1f9/0x3f0 [i915]
[  170.529065]  [<ffffffffa0700cce>] i915_reset+0x4e/0x180 [i915]
[  170.529870]  [<ffffffffa070829d>] i915_error_work_func+0xcd/0x120
[i915]
[  170.530666]  [<ffffffff8106c13a>] process_one_work+0x1fa/0x6d0
[  170.531453]  [<ffffffff8106c0d8>] ? process_one_work+0x198/0x6d0
[  170.532230]  [<ffffffff8106c72b>] worker_thread+0x11b/0x3a0
[  170.532996]  [<ffffffff8106c610>] ? process_one_work+0x6d0/0x6d0
[  170.533771]  [<ffffffff810743ef>] kthread+0xff/0x120
[  170.534548]  [<ffffffff810742f0>] ? insert_kthread_work+0x80/0x80
[  170.535322]  [<ffffffff814f97ac>] ret_from_fork+0x7c/0xb0
[  170.536089]  [<ffffffff810742f0>] ? insert_kthread_work+0x80/0x80
[  170.536847] ---[ end trace 3d4c12892e42d58f ]---

v2: Whitespace fix. (Chris)

Note: This is a bug that only hits the ppgtt topic branch but I've
figured that doing the request cleanup in this order is generally the
right thing to do.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
[danvet: Add a code comment to clarify what's actually going on since
the lifetime rules aroung ppgtt cleanup are ... fuzzy a best atm. Also
add a note about why we need this.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-01-22 10:34:37 +01:00
Daniel Vetter 8664850087 drm/i915: Tune down reset_stat output from ERROR to debug
This is user-triggerable and hence we should not allow it to spam
dmesg. Also, it upsets the nice dmesg tracking piglit does.

Note that this is just extra debugging information, mostly
unwanted, in case of a hang and that there is a separate message to the
user giving instructions on how to report a bug for a GPU hang.

v2: Add note as suggests in Chris' reply.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=72740
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-01-22 10:34:35 +01:00
Daniel Vetter 0d9d349d87 Merge commit origin/master into drm-intel-next
Conflicts are getting out of hand, and now we have to shuffle even
more in -next which was also shuffled in -fixes (the call for
drm_mode_config_reset needs to move yet again).

So do a proper backmerge. I wanted to wait with this for the 3.13
relaese, but alas let's just do this now.

Conflicts:
	drivers/gpu/drm/i915/i915_reg.h
	drivers/gpu/drm/i915/intel_ddi.c
	drivers/gpu/drm/i915/intel_display.c
	drivers/gpu/drm/i915/intel_pm.c

Besides the conflict around the forcewake get/put (where we chaged the
called function in -fixes and added a new parameter in -next) code all
the current conflicts are of the adjacent lines changed type.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-01-16 22:06:30 +01:00
Chris Wilson e910303802 drm/i915: Free requests after object release when retiring requests
Freeing a request triggers the destruction of the context. This needs to
occur after all objects are themselves unbound from the context, and so
the free request needs to occur after the object release during retire.

This tidies up

commit e20780439b
Author: Ben Widawsky <ben@bwidawsk.net>
Date:   Fri Dec 6 14:11:22 2013 -0800

    drm/i915: Defer request freeing

by simply swapping the order of operations rather than introducing
further complexity - as noted during review.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-01-10 08:21:52 +01:00
Daniel Vetter bfca05275a Revert "drm/i915: Do not allow buffers at offset 0"
This reverts commit 4fe9adbc36.

The patch completely lacks a detailed explanation of what exactly
blows up and how, so is insufficiently justified as a band-aid.

Otoh the justification as a safety measure against userspace botching
up relocations is also fairly weak: If we want real project we need to
at least make the gab big enough that the gpu doesn't scribble over
more important stuff. With 4k screens that would be 32MB.

Also I think this would be much better in conjunction with a (debug)
switch to disable our use of the scratch page.

Hence revert this.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-12-18 17:50:39 +01:00
Daniel Vetter 02f6bcccf7 drm/i915: Reject the pin ioctl on gen6+
Especially with ppgtt this kinda stopped making sense. And if we
indeed need this to hack around an issue, we need something that also
works for non-root.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-12-18 16:30:22 +01:00
Ben Widawsky 7e0d96bc03 drm/i915: Use multiple VMs -- the point of no return
As with processes which run on the CPU, the goal of multiple VMs is to
provide process isolation. Specific to GEN, there is also the ability to
map more objects per process (2GB each instead of 2Gb-2k total).

For the most part, all the pipes have been laid, and all we need to do
is remove asserts and actually start changing address spaces with the
context switch. Since prior to this we've converted the setting of the
page tables to a streamed version, this is quite easy.

One important thing to point out (since it'd been hotly contested) is
that with this patch, every context created will have it's own address
space (provided the HW can do it).

v2: Disable BDW on rebase

NOTE: I tried to make this commit as small as possible. I needed one
place where I could "turn everything on" and that is here. It could be
split into finer commits, but I didn't really see much point.

Cc: Eric Anholt <eric@anholt.net>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-12-18 16:24:52 +01:00
Daniel Vetter 3d7f0f9dcc Merge commit drm-intel-fixes into topic/ppgtt
I need the tricky do_switch fix before I can merge the final piece of
the ppgtt enabling puzzle. Otherwise the conflict will be a real pain
to resolve since the do_switch hunk from -fixes must be placed at the
exact right place within a hunk in the next patch.

Conflicts:
	drivers/gpu/drm/i915/i915_gem_context.c
	drivers/gpu/drm/i915/i915_gem_execbuffer.c
	drivers/gpu/drm/i915/intel_display.c

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-12-18 16:23:37 +01:00
Ben Widawsky 4fe9adbc36 drm/i915: Do not allow buffers at offset 0
This is primarily a band aid for an unexplainable error in
gem_reloc_vs_gpu/forked-faulting-reloc-thrashing. Essentially as soon as
a relocated buffer (which had a non-zero presumed offset) moved to
offset 0, something goes bad. Since I have been unable to solve this,
and potentially this is a good thing to do anyway, since many things can
accidentally write to offset 0, why not?

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-12-18 16:15:40 +01:00
Ben Widawsky e20780439b drm/i915: Defer request freeing
With context destruction, we always want to be able to tear down the
underlying address space. This is invoked on the last unreference to the
context which could happen before we've moved all objects to the
inactive list. To enable a clean tear down the address space, make sure
to process the request free lastly.

Without this change, we cannot guarantee to we don't still have active
objects in the VM.

As an example of a failing case:
CTX-A is created, count=1
CTX-A is used during execbuf
	does a context switch count = 2
	and add_request count = 3
CTX B runs, switches, CTX-A count = 2
CTX-A is destroyed, count = 1
retire requests is called
	free_request from CTX-A, count = 0 <--- free context with active object

As mentioned above, by doing the free request after processing the
active list, we can avoid this case.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-12-18 15:52:51 +01:00
Ben Widawsky 41bde5535a drm/i915: Get context early in execbuf
We need to have the address space when reserving space for the objects.
Since the address space and context are tied together, and reserve
occurs before context switch (for good reason), we must lookup our
context earlier in the process.

This leaves some room for optimizations where we no longer need to use
ctx_id in certain places. This will be addressed in a subsequent patch.

Important tricky bit:
Because slow relocations during execbuffer drop struct_mutex

Perhaps it would be best to acquire the reference when we get the
context, but I'll save that for another day (note I have written the
patch before, and I found the changes required to be uglier than this).

Note that since we currently access everything via context id, and not
the data structure this is fine, though not desirable. The next change
attempts to get the context only once via the context ID idr lookup, and
as such, the following can happen:

CTX-A is created, refcount = 1
CTX-A execbuf, mutex dropped
close IOCTL called on CTX-A, refcount = 0
CTX-A resumes in execbuf.

v2: Rebased on top of
commit b6359918b8
Author: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Date:   Wed Oct 30 15:44:16 2013 +0200

    drm/i915: add i915_get_reset_stats_ioctl

v3: Rebased on top of
commit 25b3dfc87b
Author: Mika Westerberg <mika.westerberg@linux.intel.com>
Date:   Tue Nov 12 11:57:30 2013 +0200

Author: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Date:   Tue Nov 26 16:14:33 2013 +0200

    drm/i915: check context reset stats before relocations

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-12-18 15:52:42 +01:00
Ben Widawsky c482972a08 drm/i915: Piggy back hangstats off of contexts
To simplify the codepaths somewhat, we can simply always create a
context. Contexts already keep hangstat information. This prevents us
from having to differentiate at other parts in the code.

There is allocation overhead, but it should not be measurable.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-12-18 15:51:58 +01:00
Ben Widawsky bdf4fd7ea0 drm/i915: Do aliasing PPGTT init with contexts
We have a default context which suits the aliasing PPGTT well. Tie them
together so it looks like any other context/PPGTT pair. This makes the
code cleaner as it won't have to special case aliasing as often.

The patch has one slightly tricky part in the default context creation
function. In the future (and on aliased setup) we create a new VM for a
context (potentially). However, if we have aliasing PPGTT, which occurs
at this point in time for all platforms GEN6+, we can simply manage the
refcounting to allow things to behave as normal. Now is a good time to
recall that the aliasing_ppgtt doesn't have a real VM, it uses the GGTT
drm_mm.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-12-18 15:32:14 +01:00
Ben Widawsky a3d67d2396 drm/i915: PPGTT vfuncs should take a ppgtt argument
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-12-18 15:27:56 +01:00
Ben Widawsky 2fa48d8d4a drm/i915: Split context enabling from init
We **need** to do this for exactly 1 reason, because we want to embed a
PPGTT into the context, but we don't want to special case the default
context.

To achieve that, we must be able to initialize contexts after the GTT is
setup (so we can allocate and pin the default context's BO), but before
the PPGTT and rings are initialized. This is because, currently, context
initialization requires ring usage. We don't have rings until after the
GTT is setup. If we split the enabling part of context initialization,
the part requiring the ringbuffer, we can untangle this, and then later
embed the PPGTT

Incidentally this allows us to also adhere to the original design of
context init/fini in future patches: they were only ever meant to be
called at driver load and unload.

v2: Move hw_contexts_disabled test in i915_gem_context_enable() (Chris)

v3: BUG_ON after checking for disabled contexts. Or else it blows up pre
gen6 (Ben)

v4: Forward port
Modified enable for each ring, since that patch is earlier in the series
Dropped ring arg from create_default_context so it can be used by others

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-12-18 15:27:55 +01:00
Ben Widawsky acce9ffa48 drm/i915: Better reset handling for contexts
This patch adds to changes for contexts on reset:
Sets last context to default - this will prevent the context switch
happening after a reset. That switch is not possible because the
rings are hung during reset and context switch requires reset. This
behavior will need to be reworked in the future, but this is what we
want for now.

In the future, we'll also want to reset the guilty context to
uninitialized. We should wait for ARB_Robustness related code to land
for that.

This is somewhat for paranoia.  Because we really don't know what the
GPU was doing when it hung, or the state it was in (mid context write,
for example), later restoring the context is a bad idea. By setting the
flag to not initialized, the next load of that context will not restore
the state, and thus on the subsequent switch away from the context will
overwrite the old data.

NOTE: This code needs a fixup when we actually have multiple VMs. The
issue that can occur is inactive objects in a VM will need to be
destroyed before the last context unref. This can now happen via the
fake switch introduced in this patch (and it other ways in the future)

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-12-18 15:27:54 +01:00
Ben Widawsky e422b888eb drm/i915: Add a context open function
We'll be doing a bit more stuff with each file, so having our own open
function should make things clean.

This also allows us to easily add conditionals for stuff we don't want
to do when we don't have HW contexts.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-12-18 15:27:51 +01:00
Ben Widawsky 6f65e29aca drm/i915: Create bind/unbind abstraction for VMAs
To sum up what goes on here, we abstract the vma binding, similarly to
the previous object binding. This helps for distinguishing legacy
binding, versus modern binding. To keep the code churn as minimal as
possible, I am leaving in insert_entries(). It serves as the per
platform pte writing basically. bind_vma and insert_entries do share a
lot of similarities, and I did have designs to combine the two, but as
mentioned already... too much churn in an already massive patchset.

What follows are the 3 commits which existed discretely in the original
submissions. Upon rebasing on Broadwell support, it became clear that
separation was not good, and only made for more error prone code. Below
are the 3 commit messages with all their history.

drm/i915: Add bind/unbind object functions to VMA
drm/i915: Use the new vm [un]bind functions
drm/i915: reduce vm->insert_entries() usage

drm/i915: Add bind/unbind object functions to VMA

As we plumb the code with more VM information, it has become more
obvious that the easiest way to deal with bind and unbind is to simply
put the function pointers in the vm, and let those choose the correct
way to handle the page table updates. This change allows many places in
the code to simply be vm->bind, and not have to worry about
distinguishing PPGTT vs GGTT.

Notice that this patch has no impact on functionality. I've decided to
save the actual change until the next patch because I think it's easier
to review that way. I'm happy to squash the two, or let Daniel do it on
merge.

v2:
Make ggtt handle the quirky aliasing ppgtt
Add flags to bind object to support above
Don't ever call bind/unbind directly for PPGTT until we have real, full
PPGTT (use NULLs to assert this)
Make sure we rebind the ggtt if there already is a ggtt binding.  This
happens on set cache levels.
Use VMA for bind/unbind (Daniel, Ben)

v3: Reorganize ggtt_vma_bind to be more concise and easier to read
(Ville). Change logic in unbind to only unbind ggtt when there is a
global mapping, and to remove a redundant check if the aliasing ppgtt
exists.

v4: Make the bind function a bit smarter about the cache levels to avoid
unnecessary multiple remaps. "I accept it is a wart, I think unifying
the pin_vma / bind_vma could be unified later" (Chris)
Removed the git notes, and put version info here. (Daniel)

v5: Update the comment to not suck (Chris)

v6:
Move bind/unbind to the VMA. It makes more sense in the VMA structure
(always has, but I was previously lazy). With this change, it will allow
us to keep a distinct insert_entries.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>

drm/i915: Use the new vm [un]bind functions

Building on the last patch which created the new function pointers in
the VM for bind/unbind, here we actually put those new function pointers
to use.

Split out as a separate patch to aid in review. I'm fine with squashing
into the previous patch if people request it.

v2: Updated to address the smart ggtt which can do aliasing as needed
Make sure we bind to global gtt when mappable and fenceable. I thought
we could get away without this initialy, but we cannot.

v3: Make the global GTT binding explicitly use the ggtt VM for
bind_vma(). While at it, use the new ggtt_vma helper (Chris)

At this point the original mailing list thread diverges. ie.

v4^:
use target_obj instead of obj for gen6 relocate_entry
vma->bind_vma() can be called safely during pin. So simply do that
instead of the complicated conditionals.
Don't restore PPGTT bound objects on resume path
Bug fix in resume path for globally bound Bos
Properly handle secure dispatch
Rebased on vma bind/unbind conversion

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>

drm/i915: reduce vm->insert_entries() usage

FKA: drm/i915: eliminate vm->insert_entries()

With bind/unbind function pointers in place, we no longer need
insert_entries. We could, and want, to remove clear_range, however it's
not totally easy at this point. Since it's used in a couple of place
still that don't only deal in objects: setup, ppgtt init, and restore
gtt mappings.

v2: Don't actually remove insert_entries, just limit its usage. It will
be useful when we introduce gen8. It will always be called from the vma
bind/unbind.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> (v1)
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-12-18 15:27:50 +01:00
Ben Widawsky d7f46fc4e7 drm/i915: Make pin count per VMA
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-12-18 15:27:49 +01:00
Ben Widawsky feb822cfc2 drm/i915: Handle inactivating objects for all VMAs
This came from a patch called, "drm/i915: Move active to vma"

When moving an object to the inactive list, we do it for all VMs for
which the object is bound.

The primary difference from that patch is this time around we don't not
track 'active' per vma, but rather by object. Therefore, we only need
one unref.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-12-18 15:27:46 +01:00
Ben Widawsky c39538a88d drm/i915: Takedown drm_mm on failed gtt setup
This was found by code inspection. If the GTT setup fails then we are
left without properly tearing down the drm_mm.

Hopefully this never happens.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-12-18 15:27:45 +01:00
Ben Widawsky 6e164c3382 drm/i915: Allow ggtt lookups to not WARN
To be able to effectively use the GGTT object lookup function, we don't
want to warn when there is no GGTT mapping. Let the caller deal with it
instead.

Originally, I had intended to have this behavior, and has not
introduced the WARN. It was introduced during review with the addition
of the follow commit

commit 5c2abbeab7
Author: Ben Widawsky <benjamin.widawsky@intel.com>
Date:   Tue Sep 24 09:57:57 2013 -0700

    drm/i915: Provide a cheap ggtt vma lookup

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-12-18 15:27:45 +01:00
Ben Widawsky 6f425321e0 drm/i915: Don't unconditionally try to deref aliasing ppgtt
Since the beginning, the functions which try to properly reference the
aliasing PPGTT have deferences a potentially null aliasing_ppgtt member.
Since the accessors are meant to be global, this will not do.

Introduced originally in:
commit a70a3148b0
Author: Ben Widawsky <ben@bwidawsk.net>
Date:   Wed Jul 31 16:59:56 2013 -0700

    drm/i915: Make proper functions for VMs

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-12-18 15:27:44 +01:00
Paulo Zanoni 48018a57a8 drm/i915: release the GTT mmaps when going into D3
So we'll get a fault when someone tries to access the mmap, then we'll
wake up from D3.

v2: - Rebase
v3: - Use gtt active/inactive

Testcase: igt/pm_pc8/gem-mmap-gtt
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@gmail.com>
[danvet: Add comment + WARN as discussed with Paulo on irc.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-12-14 15:35:52 +01:00
Mika Kuoppala 168c3f2151 drm/i915: dont call irq_put when irq test is on
If test is running, irq_get was not called so we should gain
balance by not doing irq_put

"So the rule is: if you access unlocked values, you use ACCESS_ONCE().
You don't say "but it can't matter". Because you simply don't know."
-- Linus

v2: use local variable so it can't change during test (Chris)

v3: update commit msg and use ACCESS_ONCE (Ville)

Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-12-12 22:58:44 +01:00
Mika Kuoppala 47e9766df0 drm/i915: Fix timeout with missed interrupts in __wait_seqno
Commit 094f9a54e3 ("drm/i915: Fix __wait_seqno to use true infinite
timeouts") added support for __wait_seqno to detect missing interrupts and
go around them by polling. As there is also timeout detection in
__wait_seqno, the polling and timeout detection were done with the same
timer.

When there has been missed interrupts and polling is needed, the timer is
set to trigger in (now + 1) jiffies in future, instead of the caller
specified timeout.

Now when io_schedule() returns, we calculate the jiffies left to timeout
using the timer expiration value. As the current jiffies is now bound to be
always equal or greater than the expiration value, the timeout_jiffies will
become zero or negative and we return -ETIME to caller even tho the
timeout was never reached.

Fix this by decoupling timeout calculation from timer expiration.

v2: Commit message with some sense in it (Chris Wilson)

v3: add parenthesis on timeout_expire calculation

v4: don't read jiffies without timeout (Chris Wilson)

Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-12-12 15:27:21 +01:00
Chris Wilson 4db080f9e9 drm/i915: Fix erroneous dereference of batch_obj inside reset_status
As the rings may be processed and their requests deallocated in a
different order to the natural retirement during a reset,

/* Whilst this request exists, batch_obj will be on the
 * active_list, and so will hold the active reference. Only when this
 * request is retired will the the batch_obj be moved onto the
 * inactive_list and lose its active reference. Hence we do not need
 * to explicitly hold another reference here.
 */

is violated, and the batch_obj may be dereferenced after it had been
freed on another ring. This can be simply avoided by processing the
status update prior to deallocating any requests.

Fixes regression (a possible OOPS following a GPU hang) from
commit aa60c664e6
Author: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Date:   Wed Jun 12 15:13:20 2013 +0300

    drm/i915: find guilty batch buffer on ring resets

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: stable@vger.kernel.org
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
[danvet: Add the code comment Chris supplied.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-12-12 10:49:05 +01:00
Paulo Zanoni f65c916898 drm/i915: add runtime put/get calls at the basic places
If I add code to enable runtime PM on my Haswell machine, start a
desktop environment, then enable runtime PM, these functions will
complain that they're trying to read/write registers while the
graphics card is suspended.

v2: - Simplify i915_gem_fault changes.

Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@gmail.com>
[danvet: Drop the hunk in i915_hangcheck_elapsed, it's the wrong thing
to do.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-12-10 22:47:33 +01:00
Chris Wilson 70903c3ba8 drm/i915: Fix ordering of unbind vs unpin pages
It is useful to assert that if the object is bound, then it must have
its pages pinned to prevent the shrinker from reaping its backing store.
This is even more useful with the introduction of real-ppgtt whereupon
we may have the object bound into several vma, with each instance
pinning the backing store. This assertion breaks down during unbind
where we unpinned the backing store before decoupling the vma binding.
This can be fixed with a trivial reording of the unbind sequence, which
reinforces the

   pin pages
   bind to vma
   ...
   unbind from vma
   unpin pages

concept.

v2: Bonus comment

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Tested-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-12-04 12:10:50 +01:00
Ville Syrjälä 0bf2134780 drm/i915: MI_PREDICATE_RESULT_2 is HSW only
The MI_PREDICATE_RESULT_2 register exits only on HSW. On other
platforms the same offset is either reserved, or contains some
other register. So write the register only on HSW.

This regression has been introduced in

commit 9435373ef8
Author: Rodrigo Vivi <rodrigo.vivi@gmail.com>
Date:   Wed Aug 28 16:45:46 2013 -0300

    drm/i915: Report enabled slices on Haswell GT3

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
[danvet: Add regression notice.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-11-29 15:00:03 +01:00
Daniel Vetter c09cd6e969 Merge branch 'backlight-rework' into drm-intel-next-queued
Pull in Jani's backlight rework branch. This was merged through a
separate branch to be able to sort out the Broadwell conflicts
properly before pulling it into the main development branch.

Conflicts:
	drivers/gpu/drm/i915/intel_display.c

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-11-15 10:02:39 +01:00
Ben Widawsky 31a5336e1c drm/i915/bdw: Swizzling support
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-11-08 18:09:37 +01:00
Ben Widawsky 5ab31333ac drm/i915/bdw: Fences on gen8 look just like gen7
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-11-08 18:09:36 +01:00
Ben Widawsky 8245be3139 drm/i915: Require HW contexts (when possible)
v2: Fixed the botched locking on init_hw failure in i915_reset (Ville)
Call cleanup_ringbuffer on failed context create in init_hw (Ville)

v3: Add dev argument ti clean_ringbuffer

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@gmail.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-11-07 09:35:44 +01:00
Paulo Zanoni de45eaf7b9 drm/i915: fix open-coded DIV_ROUND_UP
Use the nice Kernel macro, it makes the code much more readable.

Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-10-21 10:04:03 +02:00
Daniel Vetter aa5f802181 drm/i915: Use unsigned long for obj->user_pin_count
At least on linux sizeof(long) == sizeof(void*) and the thinking
is that you can grab about as many references as there's memory.

Doesn't really matter, just a bit of OCD since the fixed size data
type in a pure in-kernel datastructure look off.

v2: Ville asked for an overflow check since no one prevents userspace
from incrementing the pin count forever.

v3: s/INT/LONG/, noticed by Chris.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-10-16 22:06:39 +02:00
Chris Wilson 45c5f2022c drm/i915: Disable all GEM timers and work on unload
We have two once very similar functions, i915_gpu_idle() and
i915_gem_idle(). The former is used as the lower level operation to
flush work on the GPU, whereas the latter is the high level interface to
flush the GEM bookkeeping in addition to flushing the GPU. As such
i915_gem_idle() also clears out the request and activity lists and
cancels the delayed work. This is what we need for unloading the driver,
unfortunately we called i915_gpu_idle() instead.

In the process, make sure that when cancelling the delayed work and
timer, which is synchronous, that we do not hold any locks to prevent a
deadlock if the work item is already waiting upon the mutex. This
requires us to push the mutex down from the caller to i915_gem_idle().

v2: s/i915_gem_idle/i915_gem_suspend/

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=70334
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Tested-by: xunx.fang@intel.com
[danvet: Only set ums.suspended for !kms as discussed earlier. Chris
noticed that this slipped through.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-10-16 19:42:14 +02:00
Ben Widawsky 3d57e5bd12 drm/i915: Do a fuller init after reset
I had this lying around from he original PPGTT series, and thought we
might try to get it in by itself.

It's convenient to just call i915_gem_init_hw at reset because we'll be
adding new things to that function, and having just one function to call
instead of reimplementing it in two places is nice.

In order to accommodate we cleanup ringbuffers in order to bring them
back up cleanly. Optionally, we could also teardown/re initialize the
default context but this was causing some problems on reset which I
wasn't able to fully debug, and is unnecessary with the previous context
init/enable split.

This essentially reverts:
commit 8e88a2bd59
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Tue Jun 19 18:40:00 2012 +0200

    drm/i915: don't call modeset_init_hw in i915_reset

It seems to work for me on ILK now. Perhaps it's due to:
commit 8a5c2ae753
Author: Jesse Barnes <jbarnes@virtuousgeek.org>
Date:   Thu Mar 28 13:57:19 2013 -0700

    drm/i915: fix ILK GPU reset for render

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-10-16 11:08:08 +02:00
Daniel Vetter 3bbbe706e8 drm/i915: check that the i965g/gm 4G limit is really obeyed
In truly crazy circumstances shmem might give us the wrong type of
page. So be a bit paranoid and double check this.

Reviewer: Damien Lespiau <damien.lespiau@intel.com>
Cc: Rob Clark <robdclark@gmail.com>
References: http://lkml.org/lkml/2011/7/11/238
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@gmail.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-10-10 12:47:05 +02:00
Chris Wilson d9973b4356 drm/i915: Fix type mismatch and accounting in i915_gem_shrink
The interface uses an unsigned long, and we can use the unsigned counter
throughout our code, so do so. In the process, we notice one instance
where the shrink count is based on a heuristic rather than the result,
and another where we ask for too many pages to be purged.

v2: nr_to_scan needs to be promoted to a long as well, so just use
    sc->nr_to_scan directly.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-10-10 12:46:48 +02:00
Chris Wilson 5035c275af drm/i915: Call io_schedule() whilst whilsting for the GPU
Since we are waiting upon IO completion, inform the kernel through use
of the io_schedule() call rather than the regular schedule(). This
should allow the kernel to make better decisions regarding scheduling
and power management.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-10-10 12:46:47 +02:00
Daniel Vetter 967ad7f148 Merge remote-tracking branch 'airlied/drm-next' into drm-intel-next
The conflict in intel_drv.h tripped me up a bit since a patch in dinq
moves all the functions around, but another one in drm-next removes a
single function. So I'ev figured backing this into a backmerge would
be good.

i915_dma.c is just adjacent lines changed, nothing nefarious there.

Conflicts:
	drivers/gpu/drm/i915/i915_dma.c
	drivers/gpu/drm/i915/intel_drv.h

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-10-10 12:44:43 +02:00
David Herrmann 16eb5f4379 drm: kill ->gem_init_object() and friends
All drivers embed gem-objects into their own buffer objects. There is no
reason to keep drm_gem_object_alloc(), gem->driver_private and
->gem_init_object() anymore.

New drivers are highly encouraged to do the same. There is no benefit in
allocating gem-objects separately.

Cc: Dave Airlie <airlied@gmail.com>
Cc: Alex Deucher <alexdeucher@gmail.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Rob Clark <robdclark@gmail.com>
Cc: Inki Dae <inki.dae@samsung.com>
Cc: Ben Skeggs <skeggsb@gmail.com>
Cc: Patrik Jakobsson <patrik.r.jakobsson@gmail.com>
Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2013-10-09 14:38:02 +10:00
Chris Wilson b29c19b645 drm/i915: Boost RPS frequency for CPU stalls
If we encounter a situation where the CPU blocks waiting for results
from the GPU, give the GPU a kick to boost its the frequency.

This should work to reduce user interface stalls and to quickly promote
mesa to high frequencies - but the cost is that our requested frequency
stalls high (as we do not idle for long enough before rc6 to start
reducing frequencies, nor are we aggressive at down clocking an
underused GPU). However, this should be mitigated by rc6 itself powering
off the GPU when idle, and that energy use is dependent upon the workload
of the GPU in addition to its frequency (e.g. the math or sampler
functions only consume power when used). Still, this is likely to
adversely affect light workloads.

In particular, this nearly eliminates the highly noticeable wake-up lag
in animations from idle. For example, expose or workspace transitions.
(However, given the situation where we fail to downclock, our requested
frequency is almost always the maximum, except for Baytrail where we
manually downclock upon idling. This often masks the latency of
upclocking after being idle, so animations are typically smooth - at the
cost of increased power consumption.)

Stéphane raised the concern that this will punish good applications and
reward bad applications - but due to the nature of how mesa performs its
client throttling, I believe all mesa applications will be roughly
equally affected. To address this concern, and to prevent applications
like compositors from permanently boosting the RPS state, we ratelimit the
frequency of the wait-boosts each client recieves.

Unfortunately, this techinique is ineffective with Ironlake - which also
has dynamic render power states and suffers just as dramatically. For
Ironlake, the thermal/power headroom is shared with the CPU through
Intelligent Power Sharing and the intel-ips module. This leaves us with
no GPU boost frequencies available when coming out of idle, and due to
hardware limitations we cannot change the arbitration between the CPU and
GPU quickly enough to be effective.

v2: Limit each client to receiving a single boost for each active period.
    Tested by QA to only marginally increase power, and to demonstrably
    increase throughput in games. No latency measurements yet.

v3: Cater for front-buffer rendering with manual throttling.

v4: Tidy up.

v5: Sadly the compositor needs frequent boosts as it may never idle, but
due to its picking mechanism (using ReadPixels) may require frequent
waits. Those waits, along with the waits for the vrefresh swap, conspire
to keep the GPU at low frequencies despite the interactive latency. To
overcome this we ditch the one-boost-per-active-period and just ratelimit
the number of wait-boosts each client can receive.

Reported-and-tested-by: Paul Neumann <paul104x@yahoo.de>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=68716
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: Stéphane Marchesin <stephane.marchesin@gmail.com>
Cc: Owen Taylor <otaylor@redhat.com>
Cc: "Meng, Mengmeng" <mengmeng.meng@intel.com>
Cc: "Zhuang, Lena" <lena.zhuang@intel.com>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
[danvet: No extern for function prototypes in headers.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-10-03 20:01:31 +02:00
Chris Wilson 094f9a54e3 drm/i915: Fix __wait_seqno to use true infinite timeouts
When we switched to always using a timeout in conjunction with
wait_seqno, we lost the ability to detect missed interrupts. Since, we
have had issues with interrupts on a number of generations, and they are
required to be delivered in a timely fashion for a smooth UX, it is
important that we do log errors found in the wild and prevent the
display stalling for upwards of 1s every time the seqno interrupt is
missed.

Rather than continue to fix up the timeouts to work around the interface
impedence in wait_event_*(), open code the combination of
wait_event[_interruptible][_timeout], and use the exposed timer to
poll for seqno should we detect a lost interrupt.

v2: In order to satisfy the debug requirement of logging missed
interrupts with the real world requirments of making machines work even
if interrupts are hosed, we revert to polling after detecting a missed
interrupt.

v3: Throw in a debugfs interface to simulate broken hw not reporting
interrupts.

v4: s/EGAIN/EAGAIN/ (Imre)

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Imre Deak <imre.deak@intel.com>
[danvet: Don't use the struct typedef in new code.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-10-03 20:01:30 +02:00
Chris Wilson b52b89da09 drm/i915: Add a tracepoint for using a semaphore
So that we can find the callers who introduce a ring stall. A single
ring stall is not too unwelcome, the right issue becomes when they start
to interlock and prevent any concurrent work. That, however, is a little
tricker to detect with a mere tracepoint!

v2: Rebrand it as a ring event, rather than an object event.
v3: Include the seqno in the tracepoint for posterity or something.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-10-01 07:45:24 +02:00
Ben Widawsky e2d05a8b1e drm/i915: Convert active API to VMA
Even though we track object activity and not VMA, because we have the
active_list be based on the VM, it makes the most sense to use VMAs in
the APIs.

NOTE: Daniel intends to eventually rip out active/inactive LRUs, but for
now, leave them be.

v2: Remove leftover hunk from the previous patch which didn't keep
i915_gem_object_move_to_active. That patch had to rely on the ring to
get the dev instead of the obj. (Chris)

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-10-01 07:45:21 +02:00
Ben Widawsky 5c2abbeab7 drm/i915: Provide a cheap ggtt vma lookup
"We do fairly often lookup the ggtt vma for an obj." - Chris Wilson. As
such, provide a function to offer slightly cheaper access to the vma.
Not performance tested. By my quick estimation it saves at least 3
pointer dereferences from the existing mechanism.

This patch mostly matches code from Chris in
<20130911221430.GB7825@nuc-i3427.alporthouse.com>

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-10-01 07:45:21 +02:00
Chris Wilson f740334775 drm/i915: Do not unlock upon error in i915_gem_idle()
We never took the lock ourselves and all callers expect the struct_mutex
to be locked upon return (be it success or error), thereore dropping the
lock along the error paths looks to be a vestigial error from

commit db1b76ca6a
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Tue Jul 9 16:51:37 2013 +0200

    drm/i915: don't frob mm.suspended when not using ums

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-10-01 07:45:03 +02:00
Daniel Vetter b14c5679dd drm/i915: use pointer = k[cmz...]alloc(sizeof(*pointer), ...) pattern
Done while reviewing all our allocations for fubar. Also a few errant
cases of lacking () for the sizeof operator - just a bit of OCD.

I've left out all the conversions that also should use kcalloc from
this patch  (it's only 2).

Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-10-01 07:45:01 +02:00
Dave Airlie 4821ff14a3 Merge tag 'drm-intel-next-2013-09-21-merged' of git://people.freedesktop.org/~danvet/drm-intel into drm-next
drm-intel-next-2013-09-21:
- clock state handling rework from Ville
- l3 parity handling fixes for hsw from Ben
- some more watermark improvements from Ville
- ban badly behaved context from Mika
- a few vlv improvements from Jesse
- VGA power domain handling from Ville
drm-intel-next-2013-09-06:
- Basic mipi dsi support from Jani. Not yet converted over to drm_bridge
  since that was too fresh, but the porting is in progress already.
- More vma patches from Ben, this time the code to convert the execbuffer
  code. Now that the shrinker recursion bug is tracked down we can move
  ahead here again. Yay!
- Optimize hw context switching to not generate needless interrupts (Chris
  Wilson). Also some shuffling for the oustanding request allocation.
- Opregion support for SWSCI, although not yet fully wired up (we need a
  bit of runtime D3 support for that apparently, due to Windows design
  deficiencies), from Jani Nikula.
- A few smaller changes all over.

[airlied: merge conflict fix in i9xx_set_pipeconf]

* tag 'drm-intel-next-2013-09-21-merged' of git://people.freedesktop.org/~danvet/drm-intel: (119 commits)
  drm/i915: assume all GM45 Acer laptops use inverted backlight PWM
  drm/i915: cleanup a min_t() cast
  drm/i915: Pull intel_init_power_well() out of intel_modeset_init_hw()
  drm/i915: Add POWER_DOMAIN_VGA
  drm/i915: Refactor power well refcount inc/dec operations
  drm/i915: Add intel_display_power_{get, put} to request power for specific domains
  drm/i915: Change i915_request power well handling
  drm/i915: POSTING_READ IPS_CTL before waiting for the vblank
  drm/i915: don't disable ERR_INT on the IRQ handler
  drm/i915/vlv: disable rc6p and rc6pp residency reporting on BYT
  drm/i915/vlv: honor i915_enable_rc6 boot param on VLV
  drm/i915: s/HAS_L3_GPU_CACHE/HAS_L3_DPF
  drm/i915: Do remaps for all contexts
  drm/i915: Keep a list of all contexts
  drm/i915: Make l3 remapping use the ring
  drm/i915: Add second slice l3 remapping
  drm/i915: Fix HSW parity test
  drm/i915: dump crtc timings from the pipe config
  drm/i915: register backlight device also when backlight class is a module
  drm/i915: write D_COMP using the mailbox
  ...

Conflicts:
	drivers/gpu/drm/i915/intel_display.c
2013-10-01 10:00:50 +10:00
Daniel Vetter d32270460f drm/i915: Fix up usage of SHRINK_STOP
In

commit 81e49f8114
Author: Glauber Costa <glommer@openvz.org>
Date:   Wed Aug 28 10:18:13 2013 +1000

    i915: bail out earlier when shrinker cannot acquire mutex

SHRINK_STOP was added to tell the core shrinker code to bail out and
go to the next shrinker since the i915 shrinker couldn't acquire
required locks. But the SHRINK_STOP return code was added to the
->count_objects callback and not the ->scan_objects callback as it
should have been, resulting in tons of dmesg noise like

shrink_slab: i915_gem_inactive_scan+0x0/0x9c negative objects to delete nr=-xxxxxxxxx

Fix discusssed with Dave Chinner.

References: http://www.spinics.net/lists/intel-gfx/msg33597.html
Reported-by: Knut Petersen <Knut_Petersen@t-online.de>
Cc: Knut Petersen <Knut_Petersen@t-online.de>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Glauber Costa <glommer@openvz.org>
Cc: Glauber Costa <glommer@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Acked-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-09-26 00:31:51 +02:00
Daniel Vetter b599c89e8c Linux 3.12-rc2
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.14 (GNU/Linux)
 
 iQEcBAABAgAGBQJSQMORAAoJEHm+PkMAQRiGj14H/1bjhtfNjPdX7MVQAzA+WpwX
 s7h1IQu2Si9S5S1lBiM2sBTOssVcmfheO9x4yqm7JNOD1RnssWKOM3q+zVOLstwd
 GD3gluJPeraD5EyYSqEJ9ILPQ3gbxb4wOlT0Z291TW6E8XhLRr0RTOJPksRsgvLH
 Ckm9uJh6ArS6ZXfXiaDQfd+xHAQJkUfW6nMSA0g9ZO9C6KIDRvcbUmrY3m4HhfIk
 mK0TXCBs+AXGDIjTEB8JgIQL/5y1Qn0c4R+2uTU/4YWwyLvJTV1e44kGoleukMMT
 6Pw/TNlUEN161dbSaqCyF3sfXHDYQ5valycI2PDgitMtPSxbzsU1VDizS8+daRg=
 =lEmF
 -----END PGP SIGNATURE-----

Merge tag 'v3.12-rc2' into drm-intel-next

Backmerge Linux 3.12-rc2 to prep for a bunch of -next patches:
- Header cleanup in intel_drv.h, both changed in -fixes and my current
  -next pile.
- Cursor handling cleanup for -next which depends upon the cursor
  handling fix merged into -rc2.

All just trivial conflicts of the "changed adjacent lines" type:
	drivers/gpu/drm/i915/i915_gem.c
	drivers/gpu/drm/i915/intel_display.c
	drivers/gpu/drm/i915/intel_drv.h

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-09-24 09:32:53 +02:00
Linus Torvalds d8524ae9d6 Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux
Pull drm fixes from Dave Airlie:
 - some small fixes for msm and exynos
 - a regression revert affecting nouveau users with old userspace
 - intel pageflip deadlock and gpu hang fixes, hsw modesetting hangs

* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux: (22 commits)
  Revert "drm: mark context support as a legacy subsystem"
  drm/i915: Don't enable the cursor on a disable pipe
  drm/i915: do not update cursor in crtc mode set
  drm/exynos: fix return value check in lowlevel_buffer_allocate()
  drm/exynos: Fix address space warnings in exynos_drm_fbdev.c
  drm/exynos: Fix address space warning in exynos_drm_buf.c
  drm/exynos: Remove redundant OF dependency
  drm/msm: drop unnecessary set_need_resched()
  drm/i915: kill set_need_resched
  drm/msm: fix potential NULL pointer dereference
  drm/i915/dvo: set crtc timings again for panel fixed modes
  drm/i915/sdvo: Robustify the dtd<->drm_mode conversions
  drm/msm: workaround for missing irq
  drm/msm: return -EBUSY if bo still active
  drm/msm: fix return value check in ERR_PTR()
  drm/msm: fix cmdstream size check
  drm/msm: hangcheck harder
  drm/msm: handle read vs write fences
  drm/i915/sdvo: Fully translate sync flags in the dtd->mode conversion
  drm/i915: Use proper print format for debug prints
  ...
2013-09-22 19:51:49 -07:00
Ben Widawsky 040d2baa62 drm/i915: s/HAS_L3_GPU_CACHE/HAS_L3_DPF
We'd only ever used this define to denote whether or not we have the
dynamic parity feature (DPF) and never to determine whether or not L3
exists. Baytrail is a good example of where L3 exists, and not DPF.

This patch provides clarify in the code for future use cases which might
want to actually query whether or not L3 exists.

v2: Add /* DPF == dynamic parity feature */

Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-09-19 20:41:00 +02:00
Ben Widawsky a33afea5ff drm/i915: Keep a list of all contexts
I have implemented this patch before without creating a separate list
(I'm having trouble finding the links, but the messages ids are:
<1364942743-6041-2-git-send-email-ben@bwidawsk.net>
<1365118914-15753-9-git-send-email-ben@bwidawsk.net>)

However, the code is much simpler to just use a list and it makes the
code from the next patch a lot more pretty.

As you'll see in the next patch, the reason for this is to be able to
specify when a context needs to get L3 remapping. More details there.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-09-19 20:39:43 +02:00
Ben Widawsky c3787e2eac drm/i915: Make l3 remapping use the ring
Using LRI for setting the remapping registers allows us to stream l3
remapping information. This is necessary to handle per context remaps as
we'll see implemented in an upcoming patch.

Using the ring also means we don't need to frob the DOP clock gating
bits.

v2: Add comment about lack of worry for concurrent register access
(Daniel)

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
[danvet: Bikeshed the comment a bit by doing a s/XXX/Note - there's
nothing to fix.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-09-19 20:38:00 +02:00
Ben Widawsky 35a85ac606 drm/i915: Add second slice l3 remapping
Certain HSW SKUs have a second bank of L3. This L3 remapping has a
separate register set, and interrupt from the first "slice". A slice is
simply a term to define some subset of the GPU's l3 cache. This patch
implements both the interrupt handler, and ability to communicate with
userspace about this second slice.

v2:  Remove redundant check about non-existent slice.
Change warning about interrupts of unknown slices to WARN_ON_ONCE
Handle the case where we get 2 slice interrupts concurrently, and switch
the tracking of interrupts to be non-destructive (all Ville)
Don't enable/mask the second slice parity interrupt for ivb/vlv (even
though all docs I can find claim it's rsvd) (Ville + Bryan)
Keep BYT excluded from L3 parity

v3: Fix the slice = ffs to be decremented by one (found by Ville). When
I initially did my testing on the series, I was using 1-based slice
counting, so this code was correct. Not sure why my simpler tests that
I've been running since then didn't pick it up sooner.

Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-09-19 20:37:04 +02:00
Linus Torvalds 26935fb06e Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs pile 4 from Al Viro:
 "list_lru pile, mostly"

This came out of Andrew's pile, Al ended up doing the merge work so that
Andrew didn't have to.

Additionally, a few fixes.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (42 commits)
  super: fix for destroy lrus
  list_lru: dynamically adjust node arrays
  shrinker: Kill old ->shrink API.
  shrinker: convert remaining shrinkers to count/scan API
  staging/lustre/libcfs: cleanup linux-mem.h
  staging/lustre/ptlrpc: convert to new shrinker API
  staging/lustre/obdclass: convert lu_object shrinker to count/scan API
  staging/lustre/ldlm: convert to shrinkers to count/scan API
  hugepage: convert huge zero page shrinker to new shrinker API
  i915: bail out earlier when shrinker cannot acquire mutex
  drivers: convert shrinkers to new count/scan API
  fs: convert fs shrinkers to new scan/count API
  xfs: fix dquot isolation hang
  xfs-convert-dquot-cache-lru-to-list_lru-fix
  xfs: convert dquot cache lru to list_lru
  xfs: rework buffer dispose list tracking
  xfs-convert-buftarg-lru-to-generic-code-fix
  xfs: convert buftarg LRU to generic code
  fs: convert inode and dentry shrinking to be node aware
  vmscan: per-node deferred work
  ...
2013-09-12 15:01:38 -07:00
Daniel Vetter 571c608d06 drm/i915: kill set_need_resched
This is just a remnant from the old days when our reset handling was
horribly racy, suffered from terribly locking issues and often happily
live-locked. Those days are now gone so we can drop the hacks and just
rip the reschedule-point out.

Reported-by: Peter Zijlstra <peterz@infradead.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-09-12 22:40:36 +02:00
Ben Widawsky 23f5448398 drm/i915: Synchronize pread/pwrite with wait_rendering
lifted from Daniel:
pread/pwrite isn't about the object's domain at all, but purely about
synchronizing for outstanding rendering. Replacing the call to
set_to_gtt_domain with a wait_rendering would imo improve code
readability. Furthermore we could pimp pread to only block for
outstanding writes and not for reads.

Since you're not the first one to trip over this: Can I volunteer you
for a follow-up patch to fix this?

v2: Switch the pwrite patch to use \!read_only. This was a typo in the
original code. (Chris, Daniel)

Recommended-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
[danvet: Fix up the logic fumble - wait_rendering has a bool readonly
paramater, set_to_gtt_domain otoh has bool write. Breakage reported by
Jani Nikula, I've double-checked that igt/gem_concurrent_blt/prw-*
would have caught this.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-09-12 21:56:52 +02:00
Glauber Costa 81e49f8114 i915: bail out earlier when shrinker cannot acquire mutex
The main shrinker driver will keep trying for a while to free objects if
the returned value from the shrink scan procedure is 0.  That means "no
objects now", but a retry could very well succeed.

But what we should say here is a different thing: that it is impossible to
shrink, and we would better bail out soon.  We find this behavior more
appropriate for the case where the lock cannot be taken.  Specially given
the hammer behavior of the i915: if another thread is already shrinking,
we are likely not to be able to shrink anything anyway when we finally
acquire the mutex.

Signed-off-by: Glauber Costa <glommer@openvz.org>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dave Chinner <dchinner@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Kent Overstreet <koverstreet@google.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Cc: Arve Hjønnevåg <arve@android.com>
Cc: Carlos Maiolino <cmaiolino@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: David Rientjes <rientjes@google.com>
Cc: Gleb Natapov <gleb@redhat.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: J. Bruce Fields <bfields@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Kent Overstreet <koverstreet@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Thomas Hellstrom <thellstrom@vmware.com>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-09-10 18:56:32 -04:00
Dave Chinner 7dc19d5aff drivers: convert shrinkers to new count/scan API
Convert the driver shrinkers to the new API.  Most changes are compile
tested only because I either don't have the hardware or it's staging
stuff.

FWIW, the md and android code is pretty good, but the rest of it makes me
want to claw my eyes out.  The amount of broken code I just encountered is
mind boggling.  I've added comments explaining what is broken, but I fear
that some of the code would be best dealt with by being dragged behind the
bike shed, burying in mud up to it's neck and then run over repeatedly
with a blunt lawn mower.

Special mention goes to the zcache/zcache2 drivers.  They can't co-exist
in the build at the same time, they are under different menu options in
menuconfig, they only show up when you've got the right set of mm
subsystem options configured and so even compile testing is an exercise in
pulling teeth.  And that doesn't even take into account the horrible,
broken code...

[glommer@openvz.org: fixes for i915, android lowmem, zcache, bcache]
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Glauber Costa <glommer@openvz.org>
Acked-by: Mel Gorman <mgorman@suse.de>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Kent Overstreet <koverstreet@google.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Thomas Hellstrom <thellstrom@vmware.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Cc: Arve Hjønnevåg <arve@android.com>
Cc: Carlos Maiolino <cmaiolino@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: David Rientjes <rientjes@google.com>
Cc: Gleb Natapov <gleb@redhat.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: J. Bruce Fields <bfields@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Kent Overstreet <koverstreet@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Thomas Hellstrom <thellstrom@vmware.com>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-09-10 18:56:32 -04:00
Chris Wilson 5a1d5eb020 drm/i915: Remove the double-list iteration from bound_any()
The purpose of the function is to find out whether the object is still
bound in any address space. This can be easily checked by looking at the
vma currently associated with the object, rather than asking if any of
the global address spaces have an active vma on the object.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-09-10 16:14:06 +02:00
Mika Kuoppala be62acb4cc drm/i915: ban badly behaving contexts
Now when we have mechanism in place to track which context
was guilty of hanging the gpu, it is possible to punish
for bad behaviour.

If context has recently submitted a faulty batchbuffers guilty of
gpu hang and submits another batch which hangs gpu in quick
succession, ban it permanently. If ctx is banned, no more
batchbuffers will be queued for execution.

There is no need for global wedge machinery anymore and
it would be unwise to wedge the whole gpu if we have multiple
hanging batches queued for execution. Instead just ban
the guilty ones and carry on.

v2: Store guilty ban status bool in gpu_error instead of pointers
    that might become danling before hang is declared.

v3: Use return value for banned status instead of stashing state
    into gpu_error (Chris Wilson)

v4: - rebase on top of fixed hang stats api
    - add define for ban period
    - rename commit and improve commit msg

v5: - rely context banning instead of wedging the gpu
    - beautification and fix for ban calculation (Chris)

Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-09-06 17:55:50 +02:00
Chris Wilson 57094f8246 drm/i915: Hold an object reference whilst we shrink it
Whilst running the shrinker, we need to hold a reference as we unbind
the objects, or else we may end up waiting for and retiring requests,
which in turn may result in this object being freed.

This is very similar to the eviction code which also has to be very
careful to keep a reference to its objects as it retires and unbinds
them.

Another similarity, that Ben pointed out, is that as we may call
retire-requests, the unbound_list is outside of our control. We must
only process a single element of that list at a time, that is we can not
rely on the "safe" next pointer being valid after a call to
i915_vma_unbind().

  BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
  IP: [<ffffffffa0082892>] i915_gem_gtt_finish_object+0x68/0xbd [i915]
  PGD 758d3067 PUD ac0d6067 PMD 0
  Oops: 0000 [#1] SMP
  Modules linked in: dm_mod snd_hda_codec_realtek iTCO_wdt iTCO_vendor_support pcspkr snd_hda_intel i2c_i801 snd_hda_codec snd_hwdep snd_pcm snd_page_alloc snd_timer snd lpc_ich mfd_core soundcore battery ac option usb_wwan usbserial uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_core videodev i915 video button drm_kms_helper drm acpi_cpufreq mperf freq_table
  CPU: 1 PID: 16835 Comm: fbo-maxsize Not tainted 3.11.0-rc7_nightlytop_8fdad4_20130902_+ #7977
  task: ffff8800712106d0 ti: ffff880028e4a000 task.ti: ffff880028e4a000
  RIP: 0010:[<ffffffffa0082892>]  [<ffffffffa0082892>] i915_gem_gtt_finish_object+0x68/0xbd [i915]
  RSP: 0018:ffff880028e4b9e8  EFLAGS: 00010246
  RAX: 0000000000000000 RBX: ffff880145734000 RCX: ffff880145735328
  RDX: ffff8801457353fc RSI: 0000000000000000 RDI: ffff88007597cc00
  RBP: ffff88007597cc00 R08: 0000000000000001 R09: ffff88014f257f00
  R10: ffffea0001d65f00 R11: 0000000000bba60b R12: ffff880149e5b000
  R13: ffff880145734001 R14: ffff88007597ccc8 R15: ffff88007597cc00
  FS:  00007ff5bc919740(0000) GS:ffff88014f240000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000000000000008 CR3: 0000000028f4c000 CR4: 00000000001407e0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  Stack:
   0000000000000000 ffff88007597cc00 ffff8801440d6840 0000000000000000
   ffff880145734000 ffffffffa007c854 0000000000000010 ffff88007597c900
   0000000000018000 00000000004a1201 ffff88007597cc60 ffffffffa007d183
  Call Trace:
   [<ffffffffa007c854>] ? i915_vma_unbind+0xe2/0x1d1 [i915]
   [<ffffffffa007d183>] ? __i915_gem_shrink+0xf1/0x162 [i915]
   [<ffffffffa007d2ee>] ? i915_gem_object_get_pages_gtt+0xfa/0x303 [i915]
   [<ffffffffa00795f4>] ? i915_gem_object_get_pages+0x54/0x89 [i915]
   [<ffffffffa007cbda>] ? i915_gem_object_pin+0x238/0x5ce [i915]
   [<ffffffff812cba5f>] ? __sg_page_iter_next+0x2b/0x58
   [<ffffffffa0082056>] ? gen6_ppgtt_insert_entries+0xf2/0x114 [i915]
   [<ffffffffa007fe4b>] ? i915_gem_execbuffer_reserve_vma.isra.13+0x79/0x18d [i915]
   [<ffffffffa008017c>] ? i915_gem_execbuffer_reserve+0x21d/0x347 [i915]
   [<ffffffffa0080bfb>] ? i915_gem_do_execbuffer.isra.17+0x4f3/0xe61 [i915]
   [<ffffffffa00795f4>] ? i915_gem_object_get_pages+0x54/0x89 [i915]
   [<ffffffffa007e405>] ? i915_gem_pwrite_ioctl+0x743/0x7a5 [i915]
   [<ffffffffa0081a46>] ? i915_gem_execbuffer2+0x15e/0x1e4 [i915]
   [<ffffffffa000e20d>] ? drm_ioctl+0x2a5/0x3c4 [drm]
   [<ffffffffa00818e8>] ? i915_gem_execbuffer+0x37f/0x37f [i915]
   [<ffffffff816f64c0>] ? __do_page_fault+0x3ab/0x449
   [<ffffffff810be3da>] ? do_mmap_pgoff+0x2b2/0x341
   [<ffffffff810e49be>] ? vfs_ioctl+0x1e/0x31
   [<ffffffff810e5194>] ? do_vfs_ioctl+0x3ad/0x3ef
   [<ffffffff810e5224>] ? SyS_ioctl+0x4e/0x7e
   [<ffffffff816f88d2>] ? system_call_fastpath+0x16/0x1b
  Code: 52 0c a0 48 c7 c6 22 30 0d a0 31 c0 e8 ef 00 f9 ff bf c6 a7 00 00 e8 90 5d 24 e1 f6 85 13 01 00 00 10 75 44 48 8b 85 18 01 00 00 <8b> 50 08 48 8b 30 49 8b 84 24 88 02 00 00 48 89 c7 48 81 c7 98
  RIP  [<ffffffffa0082892>] i915_gem_gtt_finish_object+0x68/0xbd [i915]
  RSP <ffff880028e4b9e8>
  CR2: 0000000000000008

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=68171
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: stable@vger.kernel.org
[danvet: Bikeshed the comments a bit as discussed with Chris.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-09-05 14:47:59 +02:00
Chris Wilson 3c0e234c84 drm/i915; Preallocate the lazy request
It is possible for us to be forced to perform an allocation for the lazy
request whilst running the shrinker. This allocation may fail, leaving
us unable to reclaim any memory leading to premature OOM. A neat
solution to the problem is to preallocate the request at the same time
as acquiring the seqno for the ring transaction. This means that we can
report ENOMEM prior to touching the rings.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-09-05 12:03:53 +02:00
Chris Wilson 1823521d2b drm/i915: Rename ring->outstanding_lazy_request
Prior to preallocating an request for lazy emission, rename the existing
field to make way (and differentiate the seqno from the request struct).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-09-05 12:03:12 +02:00
Chris Wilson 9a7e0c2a1b drm/i915: Rearrange the comments in i915_add_request()
The comments were a little out-of-sequence with the code, forcing the
reader to jump around whilst reading. Whilst moving the comments around,
add one to explain the context reference.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-09-04 17:34:54 +02:00
Chris Wilson c0321e2c5a drm/i915: Do not add an interrupt for a context switch
We use the request to ensure we hold a reference to the context for the
duration that it remains in use by the ring. Each request only holds a
reference to the current context, hence we emit a request after
switching contexts with the final reference to the old context. However,
the extra interrupt caused by that request is not useful (no timing
critical function will wait for the context object), instead the overhead
of servicing the IRQ shows up in some (lightweight) benchmarks. In order
to keep the useful property of using the request to manage the context
lifetime, we want to add a dummy request that is associated with the
interrupt from the subsequent real request following the batch.

The extra interrupt was added as a side-effect of using
i915_add_request() in

commit 112522f678
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Thu May 2 16:48:07 2013 +0300

    drm/i915: put context upon switching

v2: Daniel convinced me that the request here was solely for context
lifetime tracking and that we have the active ref to keep the object
alive whilst the MI_SET_CONTEXT. So the only concern then is which
context should get the blame for MI_SET_CONTEXT failing. The old scheme
added a request for the old context so that any hang upto and including
the switch away would mark the old context as guilty. Now any hang here
implicates the new context. However since we have already gone through a
complete flush with the last context in its last request, and all that
lies in no-man's-land is an invalidate flush and the MI_SET_CONTEXT, we
should be safe in not unduly placing blame on the new context.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ben Widawsky <ben@bwidawsk.net>
Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-09-04 17:34:53 +02:00
Daniel Vetter 0ff501cbb5 drm/i915: Fix list corruption in vma_unbind
The saga around the breadcrumb vmas used by execbuf continues ...

This time around we've managed to unconditionally move the object to
the unbound list on the last vma unbind even though it might never
have been on either the bound or unbound list. Hilarity ensued.

Chris Wilson tracked this one down but compared to his patches I've
simply opted to completely separate the unbound case for not-yet bound
vmas. Otherwise we imo end up with semantically hard to parse checks
around the list_move_tail(global_list, ...).

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ben Widawsky <ben@bwidawsk.net>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=68462
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-09-04 17:34:52 +02:00
Rodrigo Vivi 9435373ef8 drm/i915: Report enabled slices on Haswell GT3
Batchbuffers constructed by userspace can conditionalise their URB
allocations through the use of the MI_SET_PREDICATE command. This
command can read the MI_PREDICATE_RESULT_2 register to see how many
slices are enabled on GT3, and by virtue of the result, scale their
memory allocations to fit enabled memory.

Of course, this only works if the kernel sets the appropriate bit in the
register first.

v2: Better commit subject and message by Chris Wilson.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Credits-to: Yejun Guo <yejun.guo@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@gmail.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-09-04 17:34:51 +02:00
Daniel Vetter b93dab6e9d drm/i915: More vma fixups around unbind/destroy
The important bugfix here is that we must not unlink the vma when
we keep it around as a placeholder for the execbuf code. Since then we
won't find it again when execbuf gets interrupt and restarted and
create a 2nd vma. And since the code as-is isn't fit yet to deal with
more than one vma, hilarity ensues.

Specifically the dma map/unmap of the sg table isn't adjusted for
multiple vmas yet and will blow up like this:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
IP: [<ffffffffa008fb37>] i915_gem_gtt_finish_object+0x73/0xc8 [i915]
PGD 56bb5067 PUD ad3dd067 PMD 0
Oops: 0000 [#1] SMP
Modules linked in: tcp_lp ppdev parport_pc lp parport ipv6 dm_mod dcdbas snd_hda_codec_hdmi pcspkr snd_hda_codec_realtek serio_raw i2c_i801 iTCO_wdt iTCO_vendor_support snd_hda_intel snd_hda_codec lpc_ich snd_hwdep mfd_core snd_pcm snd_page_alloc snd_timer snd soundcore acpi_cpufreq i915 video button drm_kms_helper drm mperf freq_table
CPU: 1 PID: 16650 Comm: fbo-maxsize Not tainted 3.11.0-rc4_nightlytop_d93f59_debug_20130814_+ #6957
Hardware name: Dell Inc. OptiPlex 9010/03JR84, BIOS A01 05/04/2012
task: ffff8800563b3f00 ti: ffff88004bdf4000 task.ti: ffff88004bdf4000
RIP: 0010:[<ffffffffa008fb37>]  [<ffffffffa008fb37>] i915_gem_gtt_finish_object+0x73/0xc8 [i915]
RSP: 0018:ffff88004bdf5958  EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff8801135e0000 RCX: ffff8800ad3bf8e0
RDX: ffff8800ad3bf8e0 RSI: 0000000000000000 RDI: ffff8801007ee780
RBP: ffff88004bdf5978 R08: ffff8800ad3bf8e0 R09: 0000000000000000
R10: ffffffff86ca1810 R11: ffff880036a17101 R12: ffff8801007ee780
R13: 0000000000018001 R14: ffff880118c4e000 R15: ffff8801007ee780
FS:  00007f401a0ce740(0000) GS:ffff88011e280000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000008 CR3: 000000005635c000 CR4: 00000000001407e0
Stack:
 ffff8801007ee780 ffff88005c253180 0000000000018000 ffff8801135e0000
 ffff88004bdf59a8 ffffffffa0088e55 0000000000000011 ffff8801007eec00
 0000000000018000 ffff880036a17101 ffff88004bdf5a08 ffffffffa0089026
Call Trace:
 [<ffffffffa0088e55>] i915_vma_unbind+0xdf/0x1ab [i915]
 [<ffffffffa0089026>] __i915_gem_shrink+0x105/0x177 [i915]
 [<ffffffffa0089452>] i915_gem_object_get_pages_gtt+0x108/0x309 [i915]
 [<ffffffffa0085ba9>] i915_gem_object_get_pages+0x61/0x90 [i915]
 [<ffffffffa008f22b>] ? gen6_ppgtt_insert_entries+0x103/0x125 [i915]
 [<ffffffffa008a113>] i915_gem_object_pin+0x1fa/0x5df [i915]
 [<ffffffffa008cdfe>] i915_gem_execbuffer_reserve_object.isra.6+0x8d/0x1bc [i915]
 [<ffffffffa008d156>] i915_gem_execbuffer_reserve+0x229/0x367 [i915]
 [<ffffffffa008dbf6>] i915_gem_do_execbuffer.isra.12+0x4dc/0xf3a [i915]
 [<ffffffff810fc823>] ? might_fault+0x40/0x90
 [<ffffffffa008eb89>] i915_gem_execbuffer2+0x187/0x222 [i915]
 [<ffffffffa000971c>] drm_ioctl+0x308/0x442 [drm]
 [<ffffffffa008ea02>] ? i915_gem_execbuffer+0x3ae/0x3ae [i915]
 [<ffffffff817db156>] ? __do_page_fault+0x3dd/0x481
 [<ffffffff8112fdba>] vfs_ioctl+0x26/0x39
 [<ffffffff811306a2>] do_vfs_ioctl+0x40e/0x451
 [<ffffffff817deda7>] ? sysret_check+0x1b/0x56
 [<ffffffff8113073c>] SyS_ioctl+0x57/0x87
 [<ffffffff8135bbfe>] ? trace_hardirqs_on_thunk+0x3a/0x3f
 [<ffffffff817ded82>] system_call_fastpath+0x16/0x1b
Code: 48 c7 c6 84 30 0e a0 31 c0 e8 d0 e9 f7 ff bf c6 a7 00 00 e8 07 af 2c e1 41 f6 84 24 03 01 00 00 10 75 44 49 8b 84 24 08 01 00 00 <8b> 50 08 48 8b 30 49 8b 86 b0 04 00 00 48 89 c7 48 81 c7 98 00
RIP  [<ffffffffa008fb37>] i915_gem_gtt_finish_object+0x73/0xc8 [i915]
 RSP <ffff88004bdf5958>
CR2: 0000000000000008

As a consequence we need to change the "only one vma for now" check in
vma_unbind - since vma_destroy isn't always called the obj->vma_list
might not be empty. Instead check that the vma list is singular at the
beginning of vma_unbind. This is also more symmetric with bind_to_vm.

This fixes the igt/gem_evict_everything|alignment testcases.

v2:
- Add a paranoid WARN to mark_free in the eviction code to make sure
  we never try to evict a vma used by the execbuf code right now.
- Move the check for a temporary execbuf vma into vma_destroy -
  otherwise the failure path cleanup in bind_to_vm will blow up.

Our first attempting at fixing this was

commit 1be81a2f2cfd8789a627401d470423358fba2d76
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Aug 20 12:56:40 2013 +0100

    drm/i915: Don't destroy the vma placeholder during execbuffer reservation

Squash with this when merging!

v3: Improvements suggested in Chris' review:
- Move the WARN_ON in vma_destroy that checks for vmas with an drm_mm
  allocation before the early return.
- Bail out if we hit the WARN in mark_free to hopefully make the
  kernel survive for long enough to capture it.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ben Widawsky <ben@bwidawsk.net>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=68298
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=68171
Tested-by: lu hua <huax.lu@intel.com> (v2)
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-09-04 17:34:42 +02:00
Chris Wilson aaa0566792 drm/i915: Don't destroy the vma placeholder during execbuffer reservation
The execbuffer handle and exec_link were moved from the object into the
vma. As the vma may be unbound and destroyed whilst attempting to
reserve the execbuffer objects (either through a forced unbind to fix up
a misalignment or through an evict-everything call) we need to prevent
the free of the i915_vma itself. Otherwise not only is the list of
objects to reserve corrupt, but we continue to reference stale vma
entries.

Fixes kernel crash with i-g-t/gem_evict_everything

This regression has been introduced in

commit 04038a515d6eda6dd0857c0ade0b3950d372f4c0
Author:     Ben Widawsky <ben@bwidawsk.net>
AuthorDate: Wed Aug 14 11:38:36 2013 +0200

    drm/i915: Convert execbuf code to use vmas

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
References: http://www.spinics.net/lists/intel-gfx/msg32038.html
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=68298
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-09-04 17:34:42 +02:00
Daniel Vetter e656a6cba0 drm/i915: inline vma_create into lookup_or_create_vma
In the execbuf code we don't clean up any vmas which ended up not
getting bound for code simplicity. To make sure that we don't end up
creating multiple vma for the same vm kill the somewhat dangerous
vma_create function and inline it into lookup_or_create.

This is just a safety measure to prevent surprises in the future.

Also update the somewhat confused comment in the execbuf code and
clarify what kind of magic is going on with a new one.

v2: Keep the function separate as requested by Chris. But give it a __
prefix for paranoia and move it tighter together with the other vma
stuff.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ben Widawsky <ben@bwidawsk.net>
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-09-04 17:34:41 +02:00
Ben Widawsky 27173f1f95 drm/i915: Convert execbuf code to use vmas
In order to transition more of our code over to using a VMA instead of
an <OBJ, VM> pair - we must have the vma accessible at execbuf time. Up
until now, we've only had a VMA when actually binding an object.

The previous patch helped handle the distinction on bound vs. unbound.
This patch will help us catch leaks, and other issues before we actually
shuffle a bunch of stuff around.

This attempts to convert all the execbuf code to speak in vmas. Since
the execbuf code is very self contained it was a nice isolated
conversion.

The meat of the code is about turning eb_objects into eb_vma, and then
wiring up the rest of the code to use vmas instead of obj, vm pairs.

Unfortunately, to do this, we must move the exec_list link from the obj
structure. This list is reused in the eviction code, so we must also
modify the eviction code to make this work.

WARNING: This patch makes an already hotly profiled path slower. The cost is
unavoidable. In reply to this mail, I will attach the extra data.

v2: Release table lock early, and two a 2 phase vma lookup to avoid
having to use a GFP_ATOMIC. (Chris)

v3: s/obj_exec_list/obj_exec_link/
Updates to address
commit 6d2b888569
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Wed Aug 7 18:30:54 2013 +0100

    drm/i915: List objects allocated from stolen memory in debugfs

v4: Use obj = vma->obj for neatness in some places (Chris)
need_reloc_mappable() should return false if ppgtt (Chris)

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
[danvet: Split out prep patches. Also remove a FIXME comment which is
now taken care of.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-09-04 17:34:41 +02:00
Damien Lespiau d2933a5b8f drm/i915: Don't call sg_free_table() if sg_alloc_table() fails
One needs to call __sg_free_table() if __sg_alloc_table() fails, but
sg_alloc_table() does that for us already.

Signed-off-by: Damien Lespiau <damien.lespiau@intel.com>
Reviewd-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-09-03 19:18:00 +02:00
Joe Perches fac15c1082 i915_gem: Convert kmem_cache_alloc(...GFP_ZERO) to kmem_cache_zalloc
The helper exists, might as well use it instead of __GFP_ZERO.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-09-03 19:17:56 +02:00
Dave Airlie efa27f9cec Merge tag 'drm-intel-next-2013-08-23' of git://people.freedesktop.org/~danvet/drm-intel into drm-next
Need to get my stuff out the door ;-) Highlights:
- pc8+ support from Paulo
- more vma patches from Ben.
- Kconfig option to enable preliminary support by default (Josh
  Triplett)
- Optimized cpu cache flush handling and support for write-through caching
  of display planes on Iris (Chris)
- rc6 tuning from Stéphane Marchesin for more stability
- VECS seqno wrap/semaphores fix (Ben)
- a pile of smaller cleanups and improvements all over

Note that I've ditched Ben's execbuf vma conversion for 3.12 since not yet
ready. But there's still other vma conversion stuff in here.

* tag 'drm-intel-next-2013-08-23' of git://people.freedesktop.org/~danvet/drm-intel: (62 commits)
  drm/i915: Print seqnos as unsigned in debugfs
  drm/i915: Fix context size calculation on SNB/IVB/VLV
  drm/i915: Use POSTING_READ in lcpll code
  drm/i915: enable Package C8+ by default
  drm/i915: add i915.pc8_timeout function
  drm/i915: add i915_pc8_status debugfs file
  drm/i915: allow package C8+ states on Haswell (disabled)
  drm/i915: fix SDEIMR assertion when disabling LCPLL
  drm/i915: grab force_wake when restoring LCPLL
  drm/i915: drop WaMbcDriverBootEnable workaround
  drm/i915: Cleaning up the relocate entry function
  drm/i915: merge HSW and SNB PM irq handlers
  drm/i915: fix how we mask PMIMR when adding work to the queue
  drm/i915: don't queue PM events we won't process
  drm/i915: don't disable/reenable IVB error interrupts when not needed
  drm/i915: add dev_priv->pm_irq_mask
  drm/i915: don't update GEN6_PMIMR when it's not needed
  drm/i915: wrap GEN6_PMIMR changes
  drm/i915: wrap GTIMR changes
  drm/i915: add the FCLK case to intel_ddi_get_cdclk_freq
  ...
2013-08-30 09:47:41 +10:00
Paulo Zanoni c67a470b1d drm/i915: allow package C8+ states on Haswell (disabled)
This patch allows PC8+ states on Haswell. These states can only be
reached when all the display outputs are disabled, and they allow some
more power savings.

The fact that the graphics device is allowing PC8+ doesn't mean that
the machine will actually enter PC8+: all the other devices also need
to allow PC8+.

For now this option is disabled by default. You need i915.allow_pc8=1
if you want it.

This patch adds a big comment inside i915_drv.h explaining how it
works and how it tracks things. Read it.

v2: (this is not really v2, many previous versions were already sent,
     but they had different names)
    - Use the new functions to enable/disable GTIMR and GEN6_PMIMR
    - Rename almost all variables and functions to names suggested by
      Chris
    - More WARNs on the IRQ handling code
    - Also disable PC8 when there's GPU work to do (thanks to Ben for
      the help on this), so apps can run caster
    - Enable PC8 on a delayed work function that is delayed for 5
      seconds. This makes sure we only enable PC8+ if we're really
      idle
    - Make sure we're not in PC8+ when suspending
v3: - WARN if IRQs are disabled on __wait_seqno
    - Replace some DRM_ERRORs with WARNs
    - Fix calls to restore GT and PM interrupts
    - Use intel_mark_busy instead of intel_ring_advance to disable PC8
v4: - Use the force_wake, Luke!
v5: - Remove the "IIR is not zero" WARNs
    - Move the force_wake chunk to its own patch
    - Only restore what's missing from RC6, not everything

Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-08-23 14:52:33 +02:00
Ben Widawsky accfef2e5a drm/i915: prepare bind_to_vm for preallocated vma
In the new execbuf code we want to track buffers using the vmas even
before they're all properly mapped. Which means that bind_to_vm needs
to deal with buffers which have preallocated vmas which aren't yet
bound.

This patch implements this prep work and adjusts our WARN/BUG checks.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
[danvet: Split out from Ben's big execbuf patch. Also move one BUG
back to its original place to deflate the diff a notch.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-08-22 13:31:53 +02:00
Ben Widawsky 82a55ad1a0 drm/i915: Switch eviction code to use vmas
The execbuf wants to do relocations usings vmas, so we need a
vma->exec_list. The eviction code also uses the old obj execbuf list
for it's own book-keeping, but would really prefer to deal in vmas
only. So switch it over to the new list.

Again this is just a prep patch for the big execbuf vma conversion.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
[danvet: Split out from Ben's big execbuf vma patch.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-08-22 13:31:52 +02:00
Ben Widawsky b25cb2f882 drm/i915: s/obj->exec_list/obj->obj_exec_link in debugfs
To convert the execbuf code over to use vmas natively we need to
shuffle the exec_list a bit. This patch here just prepares things with
the debugfs code, which also uses the old exec_list list_head, newly
called obj_exec_link.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
[danvet: Split out from Ben's big patch.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-08-22 13:31:51 +02:00
Chris Wilson 4b6d846e9a drm/i915: Drop the overzealous warning from i915_gem_set_cache_level
By our earlier reckoning, move from a snooped/llc setting to an uncached
setting, leaves the CPU cache in a consistent state irrespective of our
domain tracking - so we can forgo the warning about the lack of
invalidation. Similarly for any writes posted to the snooped CPU domain,
we know will be safely clflushed to the uncached PTEs after forcing the
domain change.

This WARN started to pop up with

commit d46f1c3f13
Author:     Chris Wilson <chris@chris-wilson.co.uk>
AuthorDate: Thu Aug 8 14:41:06 2013 +0100

    drm/i915: Allow the GPU to cache stolen memory

Ville brought up a scenario where the interaction of a set_caching
ioctl call from userspace on a scanout buffer (i.e. obj->pin_display
is set) resulted in the code getting confused and not properly
flushing stale cpu cachelines. Luckily we already prevent this by
rejecting caching changes when obj->pin_count is set.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=68040
Tested-by: cancan,feng <cancan.feng@intel.com>
[danvet: Add buglink, bisect result and explain why Ville's scenario
is already taken care of.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-08-22 13:31:46 +02:00
Daniel Vetter 49987099e2 drm/i915: use vma->node directly and rewrap map&fence in bind
Use () to make for neater alignment of the split lines, too. With this
we ditch another jump through the obj_gtt_size/offset indirection
maze.

Cc: Ben Widawsky <benjamin.widawsky@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-08-22 13:31:45 +02:00