Commit Graph

587 Commits

Author SHA1 Message Date
Daniel Vetter 48ecfa1090 drm/i915: properly set ppgtt cacheability on snb
For some reason snb has 2 fields to set ppgtt cacheability. This one
here does not exist on gen7.

This might explain why ppgtt wasn't a win on snb like on ivb - not
enough pte caching.

v2: Fixup rebase fail.

Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-04-17 11:19:59 +02:00
Daniel Vetter be901a5a1b drm/i915: set w/a bit for snb pagefaults
Bspec says that we need to set this: vol1c.3 "Blitter Command
Streamer", Section 1.1.2.1 "GAB_CTL_REG - GAB Unit Control Register".

We don't really rely on pagefaults, but who knows what this all
affects.

Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-04-17 11:19:56 +02:00
Daniel Vetter 767878908e Linux 3.4-rc3
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.18 (GNU/Linux)
 
 iQEbBAABAgAGBQJPi3XOAAoJEHm+PkMAQRiGnsUH9RjHwH4YFVyuP/DKtKa6zs74
 wqkpT15yITQ5WWMog4JaJFFg5rJCUd8QZr7AS/HSn0ijDyZX5VU7Rcs9cMudDzNR
 H/5K/AscS4fjb0HwWVqoltTWHRb9QGSwVN3+E3VCDLt9P89YJ0o3QztkkuEX5dkZ
 jc7reVXTfRnCcILEa9jleOzrn+OLM3j/jAjQ2hGunl8EDLzD4b17HHPoli4jEZ/5
 5ibpSVsPD+AqzN+glbXvYjVItl12D0IQos/JdOwfuZriCVWLxysSSwHZTbPCyvBZ
 LHH4HR5T+XLSXbjJeNkUFHLzqU+d5gVRadIoWtJCxqxFjKbOs2YtzJ5Ai0nDiw==
 =kTkC
 -----END PGP SIGNATURE-----

Merge tag 'v3.4-rc3' into drm-intel-next-queued

Backmerge Linux 3.4-rc3 into drm-intel-next to resolve a few things
that conflict/depend upon patches in -rc3:
- Second part of the Sandybridge workaround series - it changes some
  of the same registers.
- Preparation for Chris Wilson's fencing cleanup - we need the fix
  from -rc3 merged before we can move around all that code.
- Resolve the gmbus conflict - gmbus has been disabled in 3.4 again,
  but should be enabled on all generations in 3.5.

Conflicts:
	drivers/gpu/drm/i915/intel_i2c.c

Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-04-17 11:16:20 +02:00
Daniel Vetter c07496fa61 drm/i915: don't pwrite tiled objects through the gtt
... we will botch up the bit17 swizzling. Furthermore tiled pwrite is
a (now) unused slowpath, so no one really cares.

This fixes the last swizzling issues I have with i-g-t on my bit17
swizzling i915G. No regression, it's been broken since the dawn of
gem, but it's nice for regression tracking when really _all_ i-g-t
tests work.

Actually this is not true, Chris Wilson noticed while reviewing this
patch that the commit

commit d9e86c0ee6
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Wed Nov 10 16:40:20 2010 +0000

    drm/i915: Pipelined fencing [infrastructure]

contained a functional change that broke things.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-04-15 19:37:42 +02:00
Ben Widawsky 1500f7ea06 drm/i915: hide (seqno-1) in ringbuffer code
Waiting for seqno-1 in our object synchronization code is an
implementation detail given how we've decided to do the waits within the
rest of our code.

Requested-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-04-12 21:14:14 +02:00
Ben Widawsky e3a5a2250a drm/i915: fix for when semaphore updates fail
This fixes a long standing issue where emitting the semaphore updates
may have failed, but we've already updated our internal data structure.

Reported-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-04-12 21:14:13 +02:00
Ben Widawsky 5816d648d5 drm/i915: i915_gem_object_sync must handle NULL
When I extracted the synchronization code for implementing semaphorified
pageflips (74f5f6e0), I neglected the non pipelined case which also
calls this code. The modesetting code wants to make sure the object has
finished rendering to the frame before configuring the scanout (ie.
non-pipelined case).

As a result of a follow on discussion on IRC, I've decided to add a
comment about the function itself which received much inspiration from
Chris as well. So really, this patch was ghost-written by Chris :).

Reported-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-04-12 21:14:13 +02:00
Chris Wilson f84131905b drm/i915: Allow concurrent read access between CPU and GPU domain
Similar to allowing a buffer to be simultaneously read by the GPU and
through the GTT, we wish to allow readback of the pages through the CPU
domain whilst they are also being read by the GPU. Domain coherency
is managed by allowing multiple readers, but only a single writer.

This is used by mesa for its program cache which it may search for every
new program every frame and then renews should it need to add. During
renewal, mesa copies the program bo currently executing through a CPU
mapping onto the new bo. This patch allows the search and that copy to
proceed without causing a stall on the current batch.

Testcase: i-g-t/tests/gem_cpu_concurrent_blit
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-04-12 21:14:10 +02:00
Ben Widawsky 2911a35b2e drm/i915: use semaphores for the display plane
In theory this will have performance and power improvements. Performance
because we don't need to stall when the scanout BO is busy, and power
because we don't have to stall when the BO is busy (and the ring can
even go to sleep if the HW supports it).

v2:
squash 2 patches into 1 (me)
un-inline the enable_semaphores function (Daniel)
remove comment about SNB hangs from i915_gem_object_sync (Chris)
rename intel_enable_semaphores to i915_semaphore_is_enabled (me)
removed page flip comment; "no why" (Chris)

To address other comments from Daniel (irc):
update the comment to say 'vt-d is crap, don't enable semaphores'
  - I think you misinterpreted Chris' comment, it already exists.
checking out whether we can pageflip on the render ring on ivb (didn't
work on early silicon)
  - We don't want to enable workarounds for early silicon unless we have
    to.
  - I can't find any references in the docs about this.
optionally use it if the fb is already busy on the render ring
  - This should be how the code already worked, unless I am
    misunderstanding your meaning.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-04-12 21:14:05 +02:00
Chris Wilson 9a5a53b392 drm/i915: Reorganise rules for get_fence/put_fence
By simplifying the rules to calling get_fence when writing to the
through the GTT in a tiled manner, and calling put_fence before writing
to the object through the GTT in a linear manner, the code becomes
clearer and there is less chance of making a mistake.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
[danvet: fixed up conflict with ppgtt code and spelling in a new
comment.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-04-12 21:14:04 +02:00
Dave Airlie effbc4fd8e Merge branch 'drm-intel-next' of git://people.freedesktop.org/~danvet/drm-intel into drm-core-next
Daniel Vetter wrote
First pull request for 3.5-next, slightly large than usual because new
things kept coming in since the last pull for 3.4.
Highlights:
- first batch of hw enablement for vlv (Jesse et al) and hsw (Eugeni). pci
 ids are not yet added, and there's still quite a few patches to merge
 (mostly modesetting). To make QA easier I've decided to merge this stuff
 in pieces.
- loads of cleanups and prep patches spurred by the above. Especially vlv
 is a real frankenstein chip, but also hsw is stretching our driver's
 code design. Expect more to come in this area for 3.5.
- more gmbus fixes, cleanups and improvements by Daniel Kurtz. Again,
 there are more patches needed (and some already queued up), but I wanted
 to split this a bit for better testing.
- pwrite/pread rework and retuning. This series has been in the works for
 a few months already and a lot of i-g-t tests have been created for it.
 Now it's finally ready to be merged.  Note that one patch in this series
 touches include/pagemap.h, that patch is acked-by akpm.
- reduce mappable pressure and relocation throughput improvements from
 Chris.
- mmap offset exhaustion mitigation by Chris Wilson.
- a start at figuring out which codepaths in our messy dri1/ums+gem/kms
 driver we actually need to support by bailing out of unsupported case.
 The driver now refuses to load without kms on gen6+ and disallows a few
 ioctls that userspace never used in certain cases. More of this will
 definitely come.
- More decoupling of global gtt and ppgtt.
- Improved dual-link lvds detection by Takashi Iwai.
- Shut up the compiler + plus fix the fallout (Ben)
- Inverted panel brightness handling (mostly Acer manages to break things
 in this way).
- Small fixlets and adjustements and some minor things to help debugging.

Regression-wise QA reported quite a few issues on ivb, but all of them
turned out to be hw stability issues which are already fixed in
drm-intel-fixes (QA runs the nightly regression tests on -next alone,
without -fixes automatically merged in). There's still one issue open on
snb, it looks like occlusion query writes are not quite as cache coherent
as we've expected. With some of the pwrite adjustements we can now
reliably hit this. Kernel workaround for it is in the works."

* 'drm-intel-next' of git://people.freedesktop.org/~danvet/drm-intel: (101 commits)
  drm/i915: VCS is not the last ring
  drm/i915: Add a dual link lvds quirk for MacBook Pro 8,2
  drm/i915: make quirks more verbose
  drm/i915: dump the DMA fetch addr register on pre-gen6
  drm/i915/sdvo: Include YRPB as an additional TV output type
  drm/i915: disallow gem init ioctl on ilk
  drm/i915: refuse to load on gen6+ without kms
  drm/i915: extract gt interrupt handler
  drm/i915: use render gen to switch ring irq functions
  drm/i915: rip out old HWSTAM missed irq WA for vlv
  drm/i915: open code gen6+ ring irqs
  drm/i915: ring irq cleanups
  drm/i915: add SFUSE_STRAP registers for digital port detection
  drm/i915: add WM_LINETIME registers
  drm/i915: add WRPLL clocks
  drm/i915: add LCPLL control registers
  drm/i915: add SSC offsets for SBI access
  drm/i915: add port clock selection support for HSW
  drm/i915: add S PLL control
  drm/i915: add PIXCLK_GATE register
  ...

Conflicts:
	drivers/char/agp/intel-agp.h
	drivers/char/agp/intel-gtt.c
	drivers/gpu/drm/i915/i915_debugfs.c
2012-04-12 10:27:01 +01:00
Daniel Vetter 15a13bbdff drm/i915: clear fencing tracking state when retiring requests
This fixes a resume regression introduced in

commit 7dd4906586
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Wed Mar 21 10:48:18 2012 +0000

    drm/i915: Mark untiled BLT commands as fenced on gen2/3

which fixed fencing tracking for untiled blt commands.

A side effect of that patch was that now also untiled objects have a
non-zero obj->last_fenced_seqno to track when a fence can be set up
after a pipelined tiling change. Unfortunately this was only cleared
by the fence setup and teardown code, resulting in tons of untiled but
inactive objects with non-zero last_fenced_seqno.

Now after resume we completely reset the seqno tracking, both on the
driver side (by setting dev_priv->next_seqno = 1) and on the hw side
(by allocating a new hws page, which contains the seqnos). Hilarity
and indefinite waits ensued from the stale seqnos in
obj->last_fenced_seqno from before the suspend.

The fix is to properly clear the fencing tracking state like we
already do for the normal gpu rendering while moving objects off the
active list.

Reported-and-tested-by: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Jiri Slaby <jslaby@suse.cz>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-04-12 09:02:37 +02:00
Daniel Vetter f534bc0b22 drm/i915: disallow gem init ioctl on ilk
Ums is already disabled, but on ilk we can additionally disable gem
initialization when using user mode setting. Upstream never support
ilk without kernel modesetting and not even the RHEL ilk ums backport
needs gem - that driver is based on xf86-video-intel version 2.2,
which is pre-gem.

Reviewed-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-04-09 18:04:08 +02:00
Chris Wilson 7dd4906586 drm/i915: Mark untiled BLT commands as fenced on gen2/3
The BLT commands on gen2/3 utilize the fence registers and so we cannot
modify any fences for the object whilst those commands are in flight.
Currently we marked tiled commands as occupying a fence, but forgot to
restrict the untiled commands from preventing a fence being assigned
before they were completed.

One side-effect is that we ten have to double check that a fence was
allocated for a fenced buffer during move-to-active.

Reported-by: Jiri Slaby <jirislaby@gmail.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=43427
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=47990
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Testcase: i-g-t/tests/gem_tiled_after_untiled_blt
Tested-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: stable@kernel.org
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-04-01 12:26:05 +02:00
Daniel Vetter 55a254ac63 drm/i915: properly restore the ppgtt page directory on resume
The ppgtt page directory lives in a snatched part of the gtt pte
range. Which naturally gets cleared on hibernate when we pull the
power. Suspend to ram (which is what I've tested) works because
despite the fact that this is a mmio region, it is actually back by
system ram.

Fix this by moving the page directory setup code to the ppgtt init
code (which gets called on resume).

This fixes hibernate on my ivb and snb.

Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-04-01 12:25:29 +02:00
Jesse Barnes 23e3f9b37e drm/i915: check for disabled interrupts on ValleyView
Haven't seen this yet, but it doesn't hurt.

Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-03-29 00:11:46 +02:00
Daniel Vetter e7e58eb5c0 drm/i915: mark pwrite/pread slowpaths with unlikely
Beside helping the compiler untangle this maze they double-up as
documentation for which parts of the code aren't performance-critical
but just around to keep old (but already dead-slow) userspace from
breaking.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-03-27 13:41:41 +02:00
Daniel Vetter 23c18c71da drm/i915: fixup in-line clflushing on bit17 swizzled bos
The issue is that with inline clflushing the clflushing isn't properly
swizzled. Fix this by
- always clflushing entire 128 byte chunks and
- unconditionally flush before writes when swizzling a given page.
  We could be clever and check whether we pwrite a partial 128 byte
  chunk instead of a partial cacheline, but I've figured that's not
  worth it.

Now the usual approach is to fold this into the original patch series, but
I've opted against this because
- this fixes a corner case only very old userspace relies on and
- I'd like to not invalidate all the testing the pwrite rewrite has gotten.

This fixes the regression notice by tests/gem_tiled_partial_prite_pread
from i-g-t. Unfortunately it doesn't fix the issues with partial pwrites to
tiled buffers on bit17 swizzling machines. But that is also broken without
the pwrite patches, so likely a different issue (or a problem with the
testcase).

v2: Simplify the patch by dropping the overly clever partial write
logic for swizzled pages.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-03-27 13:40:57 +02:00
Daniel Vetter f56f821feb mm: extend prefault helpers to fault in more than PAGE_SIZE
drm/i915 wants to read/write more than one page in its fastpath
and hence needs to prefault more than PAGE_SIZE bytes.

Add new functions in filemap.h to make that possible.

Also kill a copy&pasted spurious space in both functions while at it.

v2: As suggested by Andrew Morton, add a multipage parameter to both
functions to avoid the additional branch for the pagemap.c hotpath.
My gcc 4.6 here seems to dtrt and indeed reap these branches where not
needed.

v3: Becaus I couldn't find a way around adding a uaddr += PAGE_SIZE to
the filemap.c hotpaths (that the compiler couldn't remove again),
let's go with separate new functions for the multipage use-case.

v4: Adjust comment to CodingStlye and fix spelling.

Acked-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-03-27 13:36:30 +02:00
Daniel Vetter d174bd6472 drm/i915: extract copy helpers from shmem_pread|pwrite
While moving around things, this two functions slowly grew out of any
sane bounds. So extract a few lines that do the copying and
clflushing. Also add a few comments to explain what's going on.

v2: Again do s/needs_clflush/needs_clflush_after/ in the write paths
as suggested by Chris Wilson.

Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-03-27 13:30:33 +02:00
Daniel Vetter 117babcdd5 drm/i915: use uncached writes in pwrite
It's around 20% faster.

Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-03-27 13:29:38 +02:00
Daniel Vetter ffc62976d2 drm/i915: fall back to shmem pwrite when the buffer is not accessible
It's too expensive to move it around just for that pwrite, especially
when we're trashing on the mappable gtt part like crazy.

Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-03-27 13:29:08 +02:00
Daniel Vetter 586428852a drm/i915: implement inline clflush for pwrite
In micro-benchmarking of the usual pwrite use-pattern of alternating
pwrites with gtt domain reads from the gpu, this yields around 30%
improvement of pwrite throughput across all buffers size. The trick is
that we can avoid clflush cachelines that we will overwrite completely
anyway.

Furthermore for partial pwrites it gives a proportional speedup on top
of the 30% percent because we only clflush back the part of the buffer
we're actually writing.

v2: Simplify the clflush-before-write logic, as suggested by Chris
Wilson.

v3: Finishing touches suggested by Chris Wilson:
- add comment to needs_clflush_before and only set this if the bo is
  uncached.
- s/needs_clflush/needs_clflush_after/ in the write paths to clearly
  differentiate it from needs_clflush_before.

Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-03-27 13:28:45 +02:00
Daniel Vetter 96d79b5270 drm/i915: don't clobber userspace memory before commiting to the pread
The pagemap.h prefault helpers do the prefaulting by simply writing
some data into every page. Hence we should not prefault when we're not
yet commited to to actually writing data to userspace. The problem is
now that
- we can't prefault while holding dev->struct_mutex for we could
  deadlock with our own pagefault handler
- we need to grab dev->struct_mutex before copying to sync up with any
  outsanding gpu writes.

Therefore only prefault when we're dropping the lock the first time in
the pread slowpath - at that point we're committed to the write, don't
wait on the gpu anymore and hence won't return early (with e.g.
-EINTR).

Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-03-27 13:28:32 +02:00
Daniel Vetter 935aaa692e drm/i915: drop gtt slowpath
With the proper prefault, it's extremely unlikely that we fall back
to the gtt slowpath.

So just kill it and use the shmem_pwrite path as fallback.

To further clean up the code, move the preparatory gem calls into the
respective pwrite functions. This way the gtt_fast->shmem fallback
is much more obvious.

Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-03-27 13:27:21 +02:00
Daniel Vetter 692a576b9d drm/i915: don't call shmem_read_mapping unnecessarily
This speeds up pwrite and pread from ~120 µs ro ~100 µs for
reading/writing 1mb on my snb (if the backing storage pages
are already pinned, of course).

v2: Chris Wilson pointed out a glaring page reference bug - I've
unconditionally dropped the reference. With that fixed (and the
associated reduction of dirt in dmesg) it's now even a notch faster.

v3: Unconditionaly grab a page reference when dropping
dev->struct_mutex to simplify the code-flow.

Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-03-27 13:27:03 +02:00
Daniel Vetter 3ae5378330 drm/i915: don't use gtt_pwrite on LLC cached objects
~120 µs instead fo ~210 µs to write 1mb on my snb. I like this.

Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-03-27 13:25:45 +02:00
Daniel Vetter a0356fc373 drm/i915: kill ranged cpu read domain support
No longer needed.

Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-03-27 13:25:32 +02:00
Daniel Vetter 8489731c9b drm/i915: move clflushing into shmem_pread
This is obviously gonna slow down pread. But for a half-way realistic
micro-benchmark, it doesn't matter: Non-broken userspace reads back
data from the gpu once before the gpu again dirties it.

So all this ranged clflush tracking is just a waste of time.

No pread performance change (neglecting the dumb benchmark of
constantly reading the same data) measured.

As an added bonus, this avoids clflush on read on coherent objects.
Which means that partial preads on snb are now roughly 4x as fast.
This will be usefull for e.g. the libva encoder - when I finally get
around to fix that up.

v2: Properly sync with the gpu on LLC machines.

Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-03-27 13:20:01 +02:00
Daniel Vetter dbf7bff074 drm/i915: merge shmem_pread slow&fast-path
With the previous rewrite, they've become essential identical.

v2: Simplify the page_do_bit17_swizzling logic as suggested by Chris
Wilson.

Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-03-27 13:19:11 +02:00
Daniel Vetter e244a443bf drm/i915: merge shmem_pwrite slow&fast-path
With the previous rewrite, they've become essential identical.

v2: Simplify the page_do_bit17_swizzling logic as suggested by Chris
Wilson.

Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-03-27 13:18:58 +02:00
Chris Wilson dabdfe021a drm/i915: Avoid using mappable space for relocation processing through the CPU
We try to avoid writing the relocations through the uncached GTT, if the
buffer is currently in the CPU write domain and so will be flushed out to
main memory afterwards anyway. Also on SandyBridge we can safely write
to the pages in cacheable memory, so long as the buffer is LLC mapped.
In either of these cases, we therefore do not need to force the
reallocation of the buffer into the mappable region of the GTT, reducing
the aperture pressure.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-03-27 13:16:17 +02:00
Daniel Vetter 644ec02b5d drm/i915: s/i915_gem_do_init/i915_gem_init_global_gtt
... because this is what it actually doesn now that we have the global
gtt vs. ppgtt split.

Also move it to the other global gtt functions in i915_gem_gtt.c

Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-03-27 13:14:59 +02:00
Chris Wilson a14917eeb2 drm/i915: Release the mmap offset when purging a buffer
If we discard a buffer due to memory pressure, also release its alloted
mmap address space. As it may be sometime before userspace wakes up
and notices that it has buffers to purge from its cache, we may waste
valuable address space on unusable objects for a period of time.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=47738
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-03-23 11:04:35 +01:00
Ben Widawsky eb2c0c818a drm/i915: [dinq] shut up two instances -Wunitialized
Introduced in commit 8461d226 and 8c59967c

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
[danvet: s/fix/shut up/ in the commit msg.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-03-22 17:44:09 +01:00
Daniel Vetter 0ebb982993 drm/i915: enable lazy global-gtt binding
Now that everything is in place, only bind to the global gtt
when actually required. Patch split-up suggested by Chris Wilson.

Reviewed-and-tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-03-20 21:55:16 +01:00
Daniel Vetter 74898d7edc drm/i915: bind objects to the global gtt only when needed
And track the existence of such a binding similar to the aliasing
ppgtt case. Speeds up binding/unbinding in the common case where we
only need a ppgtt binding (which is accessed in a cpu coherent fashion
by the gpu) and no gloabl gtt binding (which needs uc writes for the
ptes).

This patch just puts the required tracking in place.

v2: Check that global gtt mappings exist in the error_state capture
code (with Chris Wilson's llc reloc patches batchbuffers are no longer
relocated as mappable in all situations, so this matters). Suggested
by Chris Wilson.

v3: Adapted to Chris' latest llc-reloc patches.

v4: Fix a bug in the i915 error state capture code noticed by Chris
Wilson.

Reviewed-and-tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-03-20 21:52:01 +01:00
Daniel Vetter 741639079c drm/i915: split out dma mapping from global gtt bind/unbind functions
Note that there's a functional change buried in this patch wrt the ilk
dmar workaround: We now only idle the gpu while tearing down the dmar
mappings, not while clearing the gtt. Keeping the current semantics
would have made for some really ugly code and afaik the issue is only
with the dmar unmapping that needs a fully idle gpu.

Reviewed-and-tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-03-20 21:51:41 +01:00
Chris Wilson c501ae7f33 drm/i915: Only clear the GPU domains upon a successful finish
By clearing the GPU read domains before waiting upon the buffer, we run
the risk of the wait being interrupted and the domains prematurely
cleared. The next time we attempt to wait upon the buffer (after
userspace handles the signal), we believe that the buffer is idle and so
skip the wait.

There are a number of bugs across all generations which show signs of an
overly haste reuse of active buffers.

Such as:

  https://bugs.freedesktop.org/show_bug.cgi?id=29046
  https://bugs.freedesktop.org/show_bug.cgi?id=35863
  https://bugs.freedesktop.org/show_bug.cgi?id=38952
  https://bugs.freedesktop.org/show_bug.cgi?id=40282
  https://bugs.freedesktop.org/show_bug.cgi?id=41098
  https://bugs.freedesktop.org/show_bug.cgi?id=41102
  https://bugs.freedesktop.org/show_bug.cgi?id=41284
  https://bugs.freedesktop.org/show_bug.cgi?id=42141

A couple of those pre-date i915_gem_object_finish_gpu(), so may be
unrelated (such as a wild write from a userspace command buffer), but
this does look like a convincing cause for most of those bugs.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: stable@kernel.org
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-03-01 21:36:13 +01:00
Chris Wilson eadb29a9c5 drm/i915: Silence the error message from i915_wait_request()
This error message has since been superseded by the hangcheck, and does
not add any salient information beyond that already printed by hangcheck
discovering the GPU hang that lead to i915_wait_request() bombing out in
the first place.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-02-27 18:08:22 +01:00
Chris Wilson a71d8d9452 drm/i915: Record the tail at each request and use it to estimate the head
By recording the location of every request in the ringbuffer, we know
that in order to retire the request the GPU must have finished reading
it and so the GPU head is now beyond the tail of the request. We can
therefore provide a conservative estimate of where the GPU is reading
from in order to avoid having to read back the ring buffer registers
when polling for space upon starting a new write into the ringbuffer.

A secondary effect is that this allows us to convert
intel_ring_buffer_wait() to use i915_wait_request() and so consolidate
upon the single function to handle the complicated task of waiting upon
the GPU. A necessary precaution is that we need to make that wait
uninterruptible to match the existing conditions as all the callers of
intel_ring_begin() have not been audited to handle ERESTARTSYS
correctly.

By using a conservative estimate for the head, and always processing all
outstanding requests first, we prevent a race condition between using
the estimate and direct reads of I915_RING_HEAD which could result in
the value of the head going backwards, and the tail overflowing once
again. We are also careful to mark any request that we skip over in
order to free space in ring as consumed which provides a
self-consistency check.

Given sufficient abuse, such as a set of unthrottled GPU bound
cairo-traces, avoiding the use of I915_RING_HEAD gives a 10-20% boost on
Sandy Bridge (i5-2520m):
  firefox-paintball  18927ms -> 15646ms: 1.21x speedup
  firefox-fishtank   12563ms -> 11278ms: 1.11x speedup
which is a mild consolation for the performance those traces achieved from
exploiting the buggy autoreported head.

v2: Add a few more comments and make request->tail a conservative
estimate as suggested by Daniel Vetter.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: resolve conflicts with retirement defering and the lack of
the autoreport head removal (that will go in through -fixes).]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-02-15 14:26:03 +01:00
Daniel Vetter 53d227f282 drm/i915: fixup seqno allocation logic for lazy_request
Currently we reserve seqnos only when we emit the request to the ring
(by bumping dev_priv->next_seqno), but start using it much earlier for
ring->oustanding_lazy_request. When 2 threads compete for the gpu and
run on two different rings (e.g. ddx on blitter vs. compositor)
hilarity ensued, especially when we get constantly interrupted while
reserving buffers.

Breakage seems to have been introduced in

commit 6f392d5486
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Sat Aug 7 11:01:22 2010 +0100

    drm/i915: Use a common seqno for all rings.

This patch fixes up the seqno reservation logic by moving it into
i915_gem_next_request_seqno. The ring->add_request functions now
superflously still return the new seqno through a pointer, that will
be refactored in the next patch.

Note that with this change we now unconditionally allocate a seqno,
even when ->add_request might fail because the rings are full and the
gpu died. But this does not open up a new can of worms because we can
already leave behind an outstanding_request_seqno if e.g. the caller
gets interrupted with a signal while stalling for the gpu in the
eviciton paths. And with the bugfix we only ever have one seqno
allocated per ring (and only that ring), so there are no ordering
issues with multiple outstanding seqnos on the same ring.

v2: Keep i915_gem_get_seqno (but move it to i915_gem.c) to make it
clear that we only have one seqno counter for all rings. Suggested by
Chris Wilson.

v3: As suggested by Chris Wilson use i915_gem_next_request_seqno
instead of ring->oustanding_lazy_request to make the follow-up
refactoring more clearly correct. Also improve the commit message
with issues discussed on irc.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=45181
Tested-by: Nicolas Kalkhof nkalkhof()at()web.de
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-02-13 10:55:57 +01:00
Daniel Vetter 5391d0cffe drm/i915: outstanding_lazy_request is a u32
So don't assign it false, that's just confusing ... No functional
change here.

Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-02-13 10:55:48 +01:00
Daniel Vetter e21af88d39 drm/i915: enable ppgtt
We want to unconditionally enable ppgtt for two reasons:
- Windows uses this on snb and later.
- We need the basic hw support to work before we can think about real
  per-process address spaces and other cool features we want.

But Chris Wilson was complaining all over irc and intel-gfx that this
will blow up if we don't have a module option to disable it. Hence add
one, to prevent this.

ppgtt support seems to slightly change the timings and make crashy
things slightly more or less crashy. Now in my testing and the testing
this got on troublesome snb machines, it seems to have improved things
only. But on ivb it makes quite a few crashes happen much more often,
see

https://bugs.freedesktop.org/show_bug.cgi?id=41353

Luckily Eugeni Dodonov seems to have a set of workarounds that fix
this issue.

v2: Don't try to enable ppgtt on pre-snb.

v3: Pimp commit message and make Chris Wilson less grumpy by adding a
module option.

v4: New try at making Chris Wilson happy.

Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Tested-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-02-09 21:49:30 +01:00
Daniel Vetter 7bddb01fb9 drm/i915: ppgtt binding/unbinding support
This adds support to bind/unbind objects and wires it up. Objects are
only put into the ppgtt when necessary, i.e. at execbuf time.

Objects are still unconditionally put into the global gtt.

v2: Kill the quick hack and explicitly pass cache_level to ppgtt_bind
like for the global gtt function. Noticed by Chris Wilson.

Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Tested-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-02-09 21:25:23 +01:00
Daniel Vetter 11782b0233 drm/i915: consolidate swizzling control bit frobbing
On gen5 we also need to correctly set up swizzling in the display
scanout engine, but only there. Consolidate this into the same
function.

This has a small effect on ums setups - the kernel now also sets this
bit in addition to userspace setting it. Given that this code only
runs when userspace either can't (resume, gpu reset) or explicitly
won't(gem_init) touch the hw this shouldn't have an adverse effect.

Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-02-08 23:18:27 +01:00
Daniel Vetter f691e2f4ce drm/i915: swizzling support for snb/ivb
We have to do this manually. Somebody had a Great Idea.

I've measured speed-ups just a few percent above the noise level
(below 5% for the best case), but no slowdows. Chris Wilson measured
quite a bit more (10-20% above the usual snb variance) on a more
recent and better tuned version of sna, but also recorded a few
slow-downs on benchmarks know for uglier amounts of snb-induced
variance.

v2: Incorporate Ben Widawsky's preliminary review comments and
elaborate a bit about the performance impact in the changelog.

v3: Add a comment as to why we don't need to check the 3rd memory
channel.

v4: Fixup whitespace.

Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-02-08 23:16:24 +01:00
Daniel Vetter 8461d22677 drm/i915: rewrite shmem_pread_slow to use copy_to_user
Like for shmem_pwrite_slow. The only difference is that because we
read data, we can leave the fetched cachelines in the cpu: In the case
that the object isn't in the cpu read domain anymore, the clflush for
the next cpu read domain invalidation will simply drop these
cachelines.

slow_shmem_bit17_copy is now ununsed, so kill it.

With this patch tests/gem_mmap_gtt now actually works.

v2: add __ to copy_to_user_swizzled as suggested by Chris Wilson.

v3: Fixup the swizzling logic, it swizzled the wrong pages.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=38115
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-01-30 23:34:34 +01:00
Daniel Vetter 8c59967c44 drm/i915: rewrite shmem_pwrite_slow to use copy_from_user
... instead of get_user_pages, because that fails on non page-backed
user addresses like e.g. a gtt mapping of a bo.

To get there essentially copy the vfs read path into pagecache. We
can't call that right away because we have to take care of bit17
swizzling. To not deadlock with our own pagefault handler we need
to completely drop struct_mutex, reducing the atomicty-guarantees
of our userspace abi. Implications for racing with other gem ioctl:

- execbuf, pwrite, pread: Due to -EFAULT fallback to slow paths there's
  already the risk of the pwrite call not being atomic, no degration.
- read/write access to mmaps: already fully racy, no degration.
- set_tiling: Calling set_tiling while reading/writing is already
  pretty much undefined, now it just got a bit worse. set_tiling is
  only called by libdrm on unused/new bos, so no problem.
- set_domain: When changing to the gtt domain while copying (without any
  read/write access, e.g. for synchronization), we might leave unflushed
  data in the cpu caches. The clflush_object at the end of pwrite_slow
  takes care of this problem.
- truncating of purgeable objects: the shmem_read_mapping_page call could
  reinstate backing storage for truncated objects. The check at the end
  of pwrite_slow takes care of this.

v2:
- add missing intel_gtt_chipset_flush
- add __ to copy_from_user_swizzled as suggest by Chris Wilson.

v3: Fixup bit17 swizzling, it swizzled the wrong pages.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-01-30 23:34:21 +01:00
Daniel Vetter 5c0480f21f drm/i915: fall through pwrite_gtt_slow to the shmem slow path
The gtt_pwrite slowpath grabs the userspace memory with
get_user_pages. This will not work for non-page backed memory, like a
gtt mmapped gem object. Hence fall throuh to the shmem paths if we hit
-EFAULT in the gtt paths.

Now the shmem paths have exactly the same problem, but this way we
only need to rearrange the code in one write path.

v2: v1 accidentaly falls back to shmem pwrite for phys objects. Fixed.

v3: Make the codeflow around phys_pwrite cleara as suggested by Chris
Wilson.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-01-30 23:34:07 +01:00
Chris Wilson 068c6ff1cb drm/i915: Remove the upper limit on the bo size for mapping into the CPU domain
The original intention of comparing the bo against the mappable GTT
limits was to prevent a subsequent faulting of the bo into the GTT from
clearing the entire GTT in vain. However, that was clearly a cut'n'paste
mistake as a CPU mapping never binds the bo into the aperture. Whilst
there may be some merit to limiting the maximum size of the bo to
something that can be utilized by the GPU, that limit itself does not
belong as a safeguard to mmapping the bo, so remove the check entirely.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-01-30 17:54:35 +01:00
Daniel Vetter 39965b3766 drm/i915: don't trash the gtt when running out of fences
With the fence accounting fixed up in the previous commit not finding
enough fences is a fatal error and userspace bug. Trashing the entire
gtt is not gonna turn up that missing fence, so don't to this by
returning another error thatn ENOSPC.

This has the added benefit that it's easier to distinguish fence
accounting errors from gtt space accounting issues.

TTM serves as precendence for the EDEADLK error code - it returns it
when the reservation code needs resources already blocked by the
current reservation.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-01-29 18:24:10 +01:00
Chris Wilson 1690e1eb7a drm/i915: Separate fence pin counting from normal bind pin counting
In order to correctly account for reserving space in the GTT and fences
for a batch buffer, we need to independently track whether the fence is
pinned due to a fenced GPU access in the batch or whether the buffer is
pinned in the aperture. Currently we count the fenced as pinned if the
buffer has already been seen in the execbuffer. This leads to a false
accounting of available fence registers, causing frequent mass evictions.
Worse, if coupled with the change to make i915_gem_object_get_fence()
report EDADLK upon fence starvation, the batchbuffer can fail with only
one fence required...

Fixes intel-gpu-tools/tests/gem_fenced_exec_thrash

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=38735
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Tested-by: Paul Neumann <paul104x@yahoo.de>
[danvet: Resolve the functional conflict with Jesse Barnes sprite
patches, acked by Chris Wilson on irc.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-01-29 18:23:37 +01:00
Ben Widawsky b93f9cf14e drm/i915: argument to control retiring behavior
Sometimes it may be the case when we idle the gpu or wait on something
we don't actually want to process the retiring list. This patch allows
callers to choose the behavior.

Reviewed-by: Keith Packard <keithp@keithp.com>
Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-01-26 11:19:19 +01:00
Eugeni Dodonov 3d29b842e5 drm/i915: add a LLC feature flag in device description
LLC is not SNB/IVB-specific, so we should check for it in a more generic
way.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-01-17 20:01:45 +01:00
Eric Anholt e959b5db4a drm/i915: Make the fallback IRQ wait not sleep.
The waits we do here are generally so short that sleeping is a bad
idea unless we have an IRQ to wake us up.  Improves regression test
performance from 18 minutes to 3.5 minutes on gen7, which is now
consistent with the previous generation.

Signed-off-by: Eric Anholt <eric@anholt.net>
Tested-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Keith Packard <keithp@keithp.com>
2012-01-03 09:31:16 -08:00
Eric Anholt 7ea29b13e5 drm/i915: Do the fallback non-IRQ wait in ring throttle, too.
As a workaround for IRQ synchronization issues in the gen7 BLT ring,
we want to turn the two wait functions into polling loops.

Signed-off-by: Eric Anholt <eric@anholt.net>
Tested-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Keith Packard <keithp@keithp.com>
2012-01-03 09:31:14 -08:00
Linus Torvalds ed4a51842a Revert "drm/i915: fix infinite recursion on unbind due to ilk vt-d w/a"
This reverts commit eb1711bb94.

It blows up the i915 seqno tracking, resulting in the

	BUG_ON(seqno == 0);

in i915_wait_request() triggering, which will cause lock-ups.

See for example
  https://bugs.launchpad.net/ubuntu/+source/linux/+bug/903010
  https://lkml.org/lkml/2011/12/14/395

Reported-requested-and-tested-by: Dirk Hohndel <dirk@hohndel.org>
Reported-by: Richard Eames <Richard.Eames@flinders.edu.au>
Reported-by: Rocko Requin <rockorequin@hotmail.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Keith Packard <keithp@keithp.com>
Cc: Eric Anholt <eric@anholt.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-12-16 12:58:39 -08:00
Daniel Vetter eb1711bb94 drm/i915: fix infinite recursion on unbind due to ilk vt-d w/a
The recursion loop goes retire_requests->unbind->gpu_idle->retire_reqeusts.

Every time we go through this we need a
- active object that can be retired
- and there are no other references to that object than the one from
  the active list, so that it gets unbound and freed immediately.
Otherwise the recursion stops. So the recursion is only limited by the
number of objects that fit these requirements sitting in the active list
any time retire_request is called.

Issue exercised by tests/gem_unref_active_buffers from i-g-t.

There's been a decent bikeshed discussion whether it wouldn't be
better to pass around a flag, but imo this is o.k. for such a limited
case that only supports a w/a.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=42180

Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Chris Wilson <chris@chris-wilson>
[ickle- we built better bikesheds, but this keeps the rain off for now]
Tested-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2011-12-07 10:44:40 +00:00
Rakib Mullick 457eafce61 drm, i915: Fix memory leak in i915_gem_busy_ioctl().
A call to i915_add_request() has been made in function i915_gem_busy_ioctl(). i915_add_request can fail,
so in it's exit path previously allocated memory needs to be freed.

Signed-off-by: Rakib Mullick <rakib.mullick@gmail.com>
Reviewed-by: Keith Packard <keithp@keithp.com>
Signed-off-by: Keith Packard <keithp@keithp.com>
2011-11-17 12:57:45 -08:00
Jesse Barnes 680da876f4 drm/i915: enable cacheable objects on Ivybridge
IVB supports these bits as well.

Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Keith Packard <keithp@keithp.com>
2011-11-03 16:17:57 -07:00
Daniel Vetter 4b9de737fa drm/i915: add constants to size fence arrays and fields
In preparation of to support 32 fences on Ivybdrigde.

Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Keith Packard <keithp@keithp.com>
2011-11-03 09:20:37 -07:00
Eric Anholt ff56b0bc84 drm/i915: Fix object refcount leak on mmappable size limit error path.
I've been seeing memory leaks on my system in the form of large
(300-400MB) GEM objects created by now-dead processes laying around
clogging up memory.  I usually notice when it gets to about 1.2GB of
them.  Hopefully this clears up the issue, but I just found this bug
by inspection.

Signed-off-by: Eric Anholt <eric@anholt.net>
Cc: stable@kernel.org
Signed-off-by: Keith Packard <keithp@keithp.com>
2011-11-01 09:15:17 -07:00
Ben Widawsky f372b85463 drm/i915: Remove early exit on i915_gpu_idle
[Description from: Daniel Vetter]
I've just discussed this quickly with Chris on irc and it's probably
best to just kill the list_empty early bailout. gpu_idle isn't a
fastpath, so who cares. One candidate where we emit commands to the ring
without adding anything onto these lists is e.g. pageflip. There are
probably more.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Keith Packard <keithp@keithp.com>
2011-10-20 15:26:38 -07:00
Daniel Vetter 130c2561de drm/i915: drop KM_USER0 argument to k(un)map_atomic
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Keith Packard <keithp@keithp.com>
2011-10-20 15:26:37 -07:00
Chris Wilson 8ffc024681 drm/i915: Defend against userspace creating a gem object with size==0
We currently only round up the userspace size to the next page. We
assume that userspace hasn't made a mistake and requested a zero-length
gem object and all through our internal code we then presume that every
object is backed by at least a single page. Fix that oversight and
report EINVAL back to userspace if they try to create a zero length
object.

[danvet: This fixes tests/gem_bad_length]

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Keith Packard <keithp@keithp.com>
2011-10-20 14:11:19 -07:00
Daniel Vetter 6dacfd2faa drm/i915: simplify swapin/out swizzle checking a bit
Use the helper function already employed by the pwrite/pread
functions.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Keith Packard <keithp@keithp.com>
2011-10-20 14:11:18 -07:00
Dave Airlie 88ef4e3f4f Merge branch 'drm-intel-next' of git://people.freedesktop.org/~keithp/linux into drm-next
* 'drm-intel-next' of git://people.freedesktop.org/~keithp/linux:
  Drivers: i915: Fix all space related issues.
2011-09-20 09:36:22 +01:00
Akshay Joshi 0206e353a0 Drivers: i915: Fix all space related issues.
Various issues involved with the space character were generating
warnings in the checkpatch.pl file. This patch removes most of those
warnings.

Signed-off-by: Akshay Joshi <me@akshayjoshi.com>
Signed-off-by: Keith Packard <keithp@keithp.com>
2011-09-19 18:01:47 -07:00
Rob Clark b464e9a25c drm/i915: use common functions for mmap offset creation
Signed-off-by: Rob Clark <rob@ti.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2011-08-30 11:07:00 +01:00
Keith Packard df7976797f Merge branch 'drm-intel-fixes' into drm-intel-next 2011-07-22 13:40:42 -07:00
Keith Packard f0b69efc29 drm/i915: Skip GPU wait for scanout pin while wedged
Failing to pin a scanout buffer will most likely lead to a black
screen, so if the GPU is wedged, then just let the pin happen and hope
that things work out OK.

v2: Just ignore any error from i915_gem_object_wait_rendering, as
suggested by Chris Wilson

Signed-off-by: Keith Packard <keithp@keithp.com>
2011-07-21 20:18:31 -07:00
Chris Wilson e28f871165 drm/i915: Fix unfenced alignment on pre-G33 hardware
Align unfenced buffers on older hardware to the power-of-two object
size.  The docs suggest that it should be possible to align only to a
power-of-two tile height, but using the already computed fence size is
easier and always correct. We also have to make sure that we unbind
misaligned buffers upon tiling changes.

In order to prevent a repetition of this bug, we change the interface
to the alignment computation routines to force the caller to provide
the requested alignment and size of the GTT binding rather than assume
the current values on the object.

Reported-and-tested-by: Sitosfe Wheeler <sitsofe@yahoo.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=36326
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: stable@kernel.org
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Keith Packard <keithp@keithp.com>
2011-07-18 14:02:06 -07:00
Keith Packard 8eb2c0ee67 Merge branch 'drm-intel-fixes' into drm-intel-next 2011-06-29 10:34:54 -07:00
Ben Widawsky 3e0dc6b01f drm/i915: hangcheck disable parameter
Provide a parameter to disable hanghcheck. This is useful mostly for
developers trying to debug known problems, and probably should not be
touched by normal users.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Keith Packard <keithp@keithp.com>
2011-06-29 10:32:08 -07:00
Linus Torvalds 0d72c6fcb5 Merge branch 'drm-intel-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/keithp/linux-2.6
* 'drm-intel-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/keithp/linux-2.6:
  drm/i915: Use chipset-specific irq installers
  drm/i915: forcewake fix after reset
  drm/i915: add Ivy Bridge page flip support
  drm/i915: split page flip queueing into per-chipset functions
2011-06-28 11:15:57 -07:00
Keith Packard 6ae77e6b6a Merge branch 'drm-intel-fixes' into drm-intel-next 2011-06-28 10:29:47 -07:00
Chris Wilson f01c22fd59 drm/i915: Use chipset-specific irq installers
Konstantin Belousov pointed out that 4697995b98 replaced the generic
i915_driver_irq_*install() functions with chipset specific routines
accessible only through driver->irq_*install(). So update the sanity
check in i915_request_wait() to match.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Keith Packard <keithp@keithp.com>
2011-06-28 10:20:06 -07:00
Hugh Dickins e2377fe0b6 drm/i915: use shmem_truncate_range
The interface to ->truncate_range is changing very slightly: once "tmpfs:
take control of its truncate_range" has been applied, this can be applied.
 For now there is only a slight inefficiency while this remains unapplied,
but it will soon become essential for managing shmem's use of swap.

Change i915_gem_object_truncate() to use shmem_truncate_range() directly:
which should also spare i915 later change if we switch from
inode_operations->truncate_range to file_operations->fallocate.

Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Keith Packard <keithp@keithp.com>
Cc: Dave Airlie <airlied@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-06-27 18:00:14 -07:00
Hugh Dickins 5949eac4d9 drm/i915: use shmem_read_mapping_page
Soon tmpfs will stop supporting ->readpage and read_cache_page_gfp(): once
"tmpfs: add shmem_read_mapping_page_gfp" has been applied, this patch can
be applied to ease the transition.

Make i915_gem_object_get_pages_gtt() use shmem_read_mapping_page_gfp() in
the one place it's needed; elsewhere use shmem_read_mapping_page(), with
the mapping's gfp_mask properly initialized.

Forget about __GFP_COLD: since tmpfs initializes its pages with memset,
asking for a cold page is counter-productive.

Include linux/shmem_fs.h also in drm_gem.c: with shmem_file_setup() now
declared there too, we shall remove the prototype from linux/mm.h later.

Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Keith Packard <keithp@keithp.com>
Cc: Dave Airlie <airlied@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-06-27 18:00:13 -07:00
Keith Packard b97c3d9c16 drm/i915: i915_gem_object_finish_gtt must always release gtt mmap
Even if the object is no longer in the GTT domain, there may still be
a user space mapping which needs to be released.

Without this fix, render-based text (mostly in firefox) would
occasionally get corrupted when the system was under load.

Signed-off-by: Keith Packard <keithp@keithp.com>
2011-06-24 21:02:59 -07:00
Keith Packard 2cd1176bd9 Merge branch 'drm-intel-fixes' into drm-intel-next 2011-06-21 12:02:57 -07:00
Eric Anholt e92d03bff9 Revert "drm/i915: Kill GTT mappings when moving from GTT domain"
This reverts commit 4a684a4117.
Userland has always been required to set the object's domain to GTT
before using it through a GTT mapping, it's not something that the
kernel is supposed to enforce.  (The pagefault support is so that we
can handle multiple mappings without userland having to pin across
them, not so that userland can use GTT after GPU domains without
telling the kernel).

Fixes 19.2% +/- 0.8% (n=6) performance regression in cairo-gl
firefox-talos-gfx on my T420 latop.

Signed-off-by: Keith Packard <keithp@keithp.com>
2011-06-21 11:11:02 -07:00
Jesper Juhl b65552f06c drm/i915: Don't leak in i915_gem_shmem_pread_slow()
It seems to me that we are leaking 'user_pages' in
drivers/gpu/drm/i915/i915_gem.c::i915_gem_shmem_pread_slow() if
read_cache_page_gfp() fails.

Signed-off-by: Jesper Juhl <jj@chaosbits.net>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2011-06-14 11:00:54 +10:00
Eric Anholt a187111207 drm/i915: Use the LLC mode on gen6 for everything but display.
Improves full-screen openarena on my laptop 20.3% +/- 4.0% (n=3)
Improves 800x600 nexuiz on my laptop 12.3% +/- 0.1% (n=3)

We have more room to improve with doing LLC caching for display using
GFDT, and in doing LLC+MLC caching, but this was an easy performance
win and incremental improvement toward those two.

Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2011-06-09 21:51:22 -07:00
Eric Anholt a7ef0640d9 drm/i915: Use the uncached domain for the display planes
The simplest and common method for ensuring scanout coherency on all
chipsets is to mark the scanout buffers as uncached (and for
userspace to remember to flush the render cache every so often).

We can improve upon this for later generations by marking scanout
objects as GFDT and only flush those cachelines when required. However,
we start simple.

[v2: Move the set to uncached above the clflush.  Otherwise, we'd skip
the clflush and try to scan out data that was still sitting in the
cache.]

Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2011-06-09 21:51:20 -07:00
Chris Wilson 2da3b9b940 drm/i915: Combine pinning with setting to the display plane
We need to perform a few operations in order to move the object into the
display plane (where it can be accessed coherently by the display
engine) that are important for future safety to forbid whilst pinned. As a
result, we want to need to perform some of the operations before pinning,
but some are required once we have been bound into the GTT. So combine
the pinning performed by all the callers with set_to_display_plane(), so
this complication is contained within the single function.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2011-06-09 21:51:19 -07:00
Chris Wilson e4ffd173a1 drm/i915: Add an interface to dynamically change the cache level
[anholt v2: Don't forget that when going from cached to uncached, we
haven't been tracking the write domain from the CPU perspective, since
we haven't needed it for GPU coherency.]

[ickle v3: We also need to make sure we relinquish any fences on older
chipsets and clear the GTT for sane domain tracking.]

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2011-06-09 21:51:16 -07:00
Chris Wilson b5ffc9bc38 drm/i915: Introduce i915_gem_object_finish_gtt()
Like its siblings finish_gpu(), this function clears the object from the
GTT domain forcing it to be trigger a domain invalidation should we ever
need to use via the GTT again.

Note that the most important side-effect of finishing the GTT domain
(aside from clearing the tracking read/write domains) is that it imposes
an memory barrier so that all accesses are complete before it returns,
which is important if you intend to be modifying translation tables
shortly afterwards. The second most important side-effect is that it
tears down the GTT mappings forcing a page-fault and invalidation on
next user access to the object.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2011-06-09 21:51:14 -07:00
Chris Wilson a8198eea15 drm/i915: Introduce i915_gem_object_finish_gpu()
... reincarnated from i915_gem_object_flush_gpu(). The semantic
difference is that after calling finish_gpu() the object no longer
resides in any GPU domain, and so will cause the GPU caches to be
invalidated if it is ever used again.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2011-06-09 11:43:47 -07:00
Daniel Vetter c8ebc2b076 drm/915: fix relaxed tiling on gen2: tile height
A tile on gen2 has a size of 2kb, stride of 128 bytes and 16 rows.

Userspace was broken and assumed 8 rows. Chris Wilson noted that the
kernel unfortunately can't reliable check that because libdrm rounds
up the size to the next bucket.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Keith Packard <keithp@keithp.com>
2011-06-04 10:41:12 -07:00
Chris Wilson c8cbbb8ba9 drm/i915: s/addr & ~PAGE_MASK/offset_in_page(addr)/
Convert our open coded offset_in_page() to the common macro.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Keith Packard <keithp@keithp.com>
Signed-off-by: Keith Packard <keithp@keithp.com>
2011-06-04 10:40:42 -07:00
Ying Han 1495f230fa vmscan: change shrinker API by passing shrink_control struct
Change each shrinker's API by consolidating the existing parameters into
shrink_control struct.  This will simplify any further features added w/o
touching each file of shrinker.

[akpm@linux-foundation.org: fix build]
[akpm@linux-foundation.org: fix warning]
[kosaki.motohiro@jp.fujitsu.com: fix up new shrinker API]
[akpm@linux-foundation.org: fix xfs warning]
[akpm@linux-foundation.org: update gfs2]
Signed-off-by: Ying Han <yinghan@google.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Acked-by: Pavel Emelyanov <xemul@openvz.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Dave Hansen <dave@linux.vnet.ibm.com>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-25 08:39:26 -07:00
Eric Anholt 25aebfc30b drm/i915: Add support for fence registers on Ivybridge.
The registers are the same as on Sandybridge.  Fixes scrambled display
in X when it does software drawing to the GTT, and scans the results
out as tiled.

Signed-off-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Keith Packard <keithp@keithp.com>
2011-05-13 18:12:51 -07:00
Eric Anholt 10ed13e4a5 drm/i915: Use existing function instead of open-coding fence reg clear.
This is once less place to miss a new INTEL_INFO(dev)->gen update now.

Signed-off-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Keith Packard <keithp@keithp.com>
2011-05-13 18:12:50 -07:00
Chris Wilson 9c23f7fc4c drm/i915: Do not clflush snooped objects
Rely on the GPU snooping into the CPU cache for appropriately bound
objects on MI_FLUSH. Or perhaps one day we will have a cache-coherent
CPU/GPU package...

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Keith Packard <keithp@keithp.com>
2011-05-10 13:56:44 -07:00
Chris Wilson 93dfb40cd8 drm/i915: Rename agp_type to cache_level
... to clarify just how we use it inside the driver and remove the
confusion of the poorly matching agp_type names. We still need to
translate through agp_type for interface into the fake AGP driver.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Keith Packard <keithp@keithp.com>
2011-05-10 13:56:43 -07:00
Chris Wilson f6e47884e7 drm/i915: Avoid unmapping pages from a NULL address space
Found by gem_stress.

As we perform retirement from a workqueue, it is possible for us to free
and unbind objects after the last close on the device, and so after the
address space has been torn down and reset to NULL:

BUG: unable to handle kernel NULL pointer dereference at 00000054
IP: [<c1295a20>] mutex_lock+0xf/0x27
*pde = 00000000
Oops: 0002 [#1] SMP
last sysfs file: /sys/module/vt/parameters/default_utf8

Pid: 5, comm: kworker/u:0 Not tainted 2.6.38+ #214
EIP: 0060:[<c1295a20>] EFLAGS: 00010206 CPU: 1
EIP is at mutex_lock+0xf/0x27
EAX: 00000054 EBX: 00000054 ECX: 00000000 EDX: 00012fff
ESI: 00000028 EDI: 00000000 EBP: f706fe20 ESP: f706fe18
 DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
Process kworker/u:0 (pid: 5, ti=f706e000 task=f7060d00 task.ti=f706e000)
Stack:
 f5aa3c60 00000000 f706fe74 c107e7df 00000246 dea55380 00000054 f5aa3c60
 f706fe44 00000061 f70b4000 c13fff84 00000008 f706fe54 00000000 00000000
 00012f00 00012fff 00000028 c109e575 f6b36700 00100000 00000000 f706fe90
Call Trace:
 [<c107e7df>] unmap_mapping_range+0x7d/0x1e6
 [<c109e575>] ? mntput_no_expire+0x52/0xb6
 [<c11c12f6>] i915_gem_release_mmap+0x49/0x58
 [<c11c3449>] i915_gem_object_unbind+0x4c/0x125
 [<c11c353f>] i915_gem_free_object_tail+0x1d/0xdb
 [<c11c55a2>] i915_gem_free_object+0x3d/0x41
 [<c11a6be2>] ? drm_gem_object_free+0x0/0x27
 [<c11a6c07>] drm_gem_object_free+0x25/0x27
 [<c113c3ca>] kref_put+0x39/0x42
 [<c11c0a59>] drm_gem_object_unreference+0x16/0x18
 [<c11c0b15>] i915_gem_object_move_to_inactive+0xba/0xbe
 [<c11c0c87>] i915_gem_retire_requests_ring+0x16e/0x1a5
 [<c11c3645>] i915_gem_retire_requests+0x48/0x63
 [<c11c36ac>] i915_gem_retire_work_handler+0x4c/0x117
 [<c10385d1>] process_one_work+0x140/0x21b
 [<c103734c>] ? __need_more_worker+0x13/0x2a
 [<c10373b1>] ? need_to_create_worker+0x1c/0x35
 [<c11c3660>] ? i915_gem_retire_work_handler+0x0/0x117
 [<c1038faf>] worker_thread+0xd4/0x14b
 [<c1038edb>] ? worker_thread+0x0/0x14b
 [<c103be1b>] kthread+0x68/0x6d
 [<c103bdb3>] ? kthread+0x0/0x6d
 [<c12970f6>] kernel_thread_helper+0x6/0x10
Code: 00 e8 98 fe ff ff 5d c3 55 89 e5 3e 8d 74 26 00 ba 01 00 00 00 e8
84 fe ff ff 5d c3 55 89 e5 53 8d 64 24 fc 3e 8d 74 26 00 89 c3 <f0> ff
08 79 05 e8 ab ff ff ff 89 e0 25 00 e0 ff ff 89 43 10 58
EIP: [<c1295a20>] mutex_lock+0xf/0x27 SS:ESP 0068:f706fe18
CR2: 0000000000000054

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Keith Packard <keithp@keithp.com>
2011-03-23 09:17:03 +00:00
Chris Wilson 26e12f8943 drm/i915: Fix use after free within tracepoint
Detected by scripts/coccinelle/free/kfree.cocci.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Keith Packard <keithp@keithp.com>
2011-03-23 09:17:02 +00:00
Chris Wilson 36d527dead drm/i915: Restore missing command flush before interrupt on BLT ring
We always skipped flushing the BLT ring if the request flush did not
include the RENDER domain. However, this neglects that we try to flush
the COMMAND domain after every batch and before the breadcrumb interrupt
(to make sure the batch is indeed completed prior to the interrupt
firing and so insuring CPU coherency). As a result of the missing flush,
incoherency did indeed creep in, most notable when using lots of command
buffers and so potentially rewritting an active command buffer (i.e.
the GPU was still executing from it even though the following interrupt
had already fired and the request/buffer retired).

As all ring->flush routines now have the same preconditions, de-duplicate
and move those checks up into i915_gem_flush_ring().

Fixes gem_linear_blit.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=35284
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Tested-by: mengmeng.meng@intel.com
2011-03-23 09:17:01 +00:00