Yet another remnant ... this might explain why l3 remapping didn't
really work on HSW.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=57441
Spotted-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: stable@vger.kernel.org
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
As explained by Chris Wilson gem objects in stolen memory are always
coherent with the GPU so we don't need to ever flush the CPU caches for
these.
This fixes a breakage - at least with the compact sg patches applied -
during the resume/restore gtt mappings path, when we tried to clflush an
FB object in stolen memory, but since stolen objects don't have backing
pages we passed an invalid page pointer to drm_clflush_page().
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The ring initialization will differ a bit in upcoming generations, and
this split will prepare the code for what's needed.
This patch also fixes a bug introduced in:
commit 9943393195
Author: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Date: Tue Jan 22 14:12:17 2013 +0200
drm/i915: use gem_set_seqno() on hardware init
After doing the extraction, the bad error handling became obvious. I
acknowledge that this should be two patches, but it's a pretty
small/trivial patch. If requested, I can certainly do the fix as a
distinct patch.
v2: Should be cleanup blt, not init blt on failure (Chris)
v3: Forgot to git add on v2
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Daniel writes:
"Probably the last feature pull for 3.9, there's some fixes outstanding
thought that I'd like to sneak in. And maybe 3.8 takes a bit longer ...
Anyway, highlights of this pull:
- Kill the horrible IS_DISPLAYREG hack to handle the mmio offset movements
on vlv, big thanks to Ville.
- Dynamic power well support for Haswell, shaves away a bit when only
using the eDP port on pipe A (Paulo). Plus unclaimed register fixes
uncovered by this.
- Clarifications of the gpu hang/reset state transitions, hopefully fixing
a few spurious -EIO deaths in userspace.
- Haswell ELD fixes.
- Some more (pp)gtt cleanups from Ben.
- A few smaller things all over.
Plus all the stuff from the previous rather small pull request:
- Broadcast RBG improvements and reduced color range fixes from Ville.
- Ben is on a "kill legacy gtt code for good" spree, first pile of patches
included.
- No-relocs and bo lut improvements for faster execbuf from Chris.
- Some refactorings from Imre."
* tag 'drm-intel-next-2013-02-01' of git://people.freedesktop.org/~danvet/drm-intel: (101 commits)
GPU/i915: Fix acpi_bus_get_device() check in drivers/gpu/drm/i915/intel_opregion.c
drm/i915: Set the SR01 "screen off" bit in i915_redisable_vga() too
drm/i915: Kill IS_DISPLAYREG()
drm/i915: Introduce i915_vgacntrl_reg()
drm/i915: gen6_gmch_remove can be static
drm/i915: dynamic Haswell display power well support
drm/i915: check the power down well on assert_pipe()
drm/i915: don't send DP "idle" pattern before "normal" on HSW PORT_A
drm/i915: don't run hsw power well code on !hsw
drm/i915: kill cargo-culted locking from power well code
drm/i915: Only run idle processing from i915_gem_retire_requests_worker
drm/i915: Fix CAGF for HSW
drm/i915: Reclaim GTT space for failed PPGTT
drm/i915: remove intel_gtt structure
drm/i915: Add probe and remove to the gtt ops
drm/i915: extract hw ppgtt setup/cleanup code
drm/i915: pte_encode is gen6+
drm/i915: vfuncs for ppgtt
drm/i915: vfuncs for gtt_clear_range/insert_entries
drm/i915: Error state should print /sys/kernel/debug
...
When adding the fb idle detection to mark-inactive, it was forgotten
that userspace can drive the processing of retire-requests. We assumed
that it would be principally driven by the retire requests worker,
running once every second whilst active and so we would get the deferred
timer for free. Instead we spend too many CPU cycles reclocking the LVDS
preventing real work from being done.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reported-and-tested-by: Alexander Lam <lambchop468@gmail.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=58843
Cc: stable@vger.kernel.org
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@gmail.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
When machine was rebooted or module was reloaded,
gem_hw_init() set last_seqno to be identical to next_seqno.
This lead to situation that waits for first ever request
always passed immediately regardless if it was actually
executed.
Use gem_set_seqno() to be consistent how hw is
initialized on init, wrap and on resume.
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
With the previous patch the state transition handling of the reset
code itself is now (hopefully) race free and solid. But that still
leaves out everyone else - with the various lock-free wait paths
we have there's the possibility that the reset happens between the
point where we read the seqno we should wait on and the actual wait.
And if __wait_seqno then never sees the RESET_IN_PROGRESS state, we'll
happily wait for a seqno which will in all likelyhood never signal.
In practice this is not a big problem since the X server gets
constantly interrupted, and can then submit more work (hopefully) to
unblock everyone else: As soon as a new seqno write lands, all waiters
will unblock. But running the i-g-t reset testcase ZZ_hangman can
expose this race, especially on slower hw with fewer cpu cores.
Now looking forward to ARB_robustness and friends that's not the best
possible behaviour, hence this patch adds a reset_counter to be able
to detect any reset, even if a given thread never observed the
in-progress state.
The important part is to correctly order things:
- The write side needs to increment the counter after any seqno gets
reset. Hence we need to do that at the end of the reset work, and
again wake everyone up. We also need to place a barrier in between
any possible seqno changes and the counter increment, since any
unlock operations only guarantee that nothing leaks out, but not
that at later load operation gets moved ahead.
- On the read side we need to ensure that no reset can sneak in and
invalidate the seqno. In all cases we can use the one-sided barrier
that unlock operations guarantee (of the lock protecting the
respective seqno/ring pair) to ensure correct ordering. Hence it is
sufficient to place the atomic read before the mutex/spin_unlock and
no additional barriers are required.
The end-result of all this is that we need to wake up everyone twice
in a reset operation:
- First, before the reset starts, to get any lockholders of the locks,
so that the reset can proceed.
- Second, after the reset is completed, to allow waiters to properly
and reliably detect the reset condition and bail out.
I admit that this entire reset_counter thing smells a bit like
overkill, but I think it's justified since it makes it really explicit
what the bail-out condition is. And we need a reset counter anyway to
implement ARB_robustness, and imo with finer-grained locking on the
horizont this is the most resilient scheme I could think of.
v2: Drop spurious change in the wait_for_error EXIT_COND - we only
need to wait until we leave the reset-in-progress wedged state.
v3: Don't play tricks with barriers in the throttle ioctl, the
spin_unlock is barrier enough.
I've also considered using a little helper to grab the current
reset_counter, but then decided that hiding the atomic_read isn't a
great idea, since having it explicitly show up in the code is a nice
remainder to reviews to check the memory barriers.
v4: Add a comment to explain why we need to fall through in
__wait_seqno in the end variable assignments.
v5: Review from Damien:
- s/smb/smp/ in a comment
- don't increment the reset counter after we've set it to WEDGED. Now
we (again) properly wedge the gpu when the reset fails.
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The aim of this locking rework is that ioctls which a compositor should be
might call for every frame (set_cursor, page_flip, addfb, rmfb and
getfb/create_handle) should not be able to block on kms background
activities like output detection. And since each EDID read takes about
25ms (in the best case), that always means we'll drop at least one frame.
The solution is to add per-crtc locking for these ioctls, and restrict
background activities to only use the global lock. Change-the-world type
of events (modeset, dpms, ...) need to grab all locks.
Two tricky parts arose in the conversion:
- A lot of current code assumes that a kms fb object can't disappear while
holding the global lock, since the current code serializes fb
destruction with it. Hence proper lifetime management using the already
created refcounting for fbs need to be instantiated for all ioctls and
interfaces/users.
- The rmfb ioctl removes the to-be-deleted fb from all active users. But
unconditionally taking the global kms lock to do so introduces an
unacceptable potential stall point. And obviously changing the userspace
abi isn't on the table, either. Hence this conversion opportunistically
checks whether the rmfb ioctl holds the very last reference, which
guarantees that the fb isn't in active use on any crtc or plane (thanks
to the conversion to the new lifetime rules using proper refcounting).
Only if this is not the case will the code go through the slowpath and
grab all modeset locks. Sane compositors will never hit this path and so
avoid the stall, but userspace relying on these semantics will also not
break.
All these cases are exercised by the newly added subtests for the i-g-t
kms_flip, tested on a machine where a full detect cycle takes around 100
ms. It works, and no frames are dropped any more with these patches
applied. kms_flip also contains a special case to exercise the
above-describe rmfb slowpath.
* 'drm-kms-locking' of git://people.freedesktop.org/~danvet/drm-intel: (335 commits)
drm/fb_helper: check whether fbcon is bound
drm/doc: updates for new framebuffer lifetime rules
drm: don't hold crtc mutexes for connector ->detect callbacks
drm: only grab the crtc lock for pageflips
drm: optimize drm_framebuffer_remove
drm/vmwgfx: add proper framebuffer refcounting
drm/i915: dump refcount into framebuffer debugfs file
drm: refcounting for crtc framebuffers
drm: refcounting for sprite framebuffers
drm: fb refcounting for dirtyfb_ioctl
drm: don't take modeset locks in getfb ioctl
drm: push modeset_lock_all into ->fb_create driver callbacks
drm: nest modeset locks within fpriv->fbs_lock
drm: reference framebuffers which are on the idr
drm: revamp framebuffer cleanup interfaces
drm: create drm_framebuffer_lookup
drm: revamp locking around fb creation/destruction
drm: only take the crtc lock for ->cursor_move
drm: only take the crtc lock for ->cursor_set
drm: add per-crtc locks
...
Now that we seem to have brought order to the GTT barriers, the last one
to review is the terminal barrier before we unbind the buffer from the
GTT. This needs to only be performed if the buffer still resides in the
GTT domain, and so we can skip some needless barriers otherwise.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
With a fence, we only need to insert a memory barrier around the actual
fence alteration for CPU accesses through the GTT. Performing the
barrier in flush-fence was inserting unnecessary and expensive barriers
for never fenced objects.
Note removing the barriers from flush-fence, which was effectively a
barrier before every direct access through the GTT, revealed that we
where missing a barrier before the first access through the GTT. Lack of
that barrier was sufficient to cause GPU hangs.
v2: Add a couple more comments to explain the new barriers
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
We have two important transitions of the wedged state in the current
code:
- 0 -> 1: This means a hang has been detected, and signals to everyone
that they please get of any locks, so that the reset work item can
do its job.
- 1 -> 0: The reset handler has completed.
Now the last transition mixes up two states: "Reset completed and
successful" and "Reset failed". To distinguish these two we do some
tricks with the reset completion, but I simply could not convince
myself that this doesn't race under odd circumstances.
Hence split this up, and add a new terminal state indicating that the
hw is gone for good.
Also add explicit #defines for both states, update comments.
v2: Split out the reset handling bugfix for the throttle ioctl.
v3: s/tmp/wedged/ sugested by Chris Wilson. Also fixup up a rebase
error which prevented this patch from actually compiling.
v4: To unify the wedged state with the reset counter, keep the
reset-in-progress state just as a flag. The terminally-wedged state is
now denoted with a big number.
v5: Add a comment to the reset_counter special values explaining that
WEDGED & RESET_IN_PROGRESS needs to be true for the code to be
correct.
v6: Fixup logic errors introduced with the wedged+reset_counter
unification. Since WEDGED implies reset-in-progress (in a way we're
terminally stuck in the dead-but-reset-not-completed state), we need
ensure that we check for this everywhere. The specific bug was in
wait_for_error, which would simply have timed out.
v7: Extract an inline i915_reset_in_progress helper to make the code
more readable. Also annote the reset-in-progress case with an
unlikely, to help the compiler optimize the fastpath. Do the same for
the terminally wedged case with i915_terminally_wedged.
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
While auditing the code I've noticed one place (the throttle ioctl)
which does not yet wait for the reset handler to complete and doesn't
properly decode the wedge state into -EAGAIN/-EIO. Fix this up by
calling the right helpers. This might explain the oddball "my
compositor just died in a successfull gpu reset" reports. Or maybe not, since
current mesa doesn't use this ioctl to throttle command submission.
The throttle ioctl doesn't take the struct_mutex, so to avoid busy-looping
with -EAGAIN while a reset is in process, check for errors first and wait
for the handler to complete if a reset is pending by calling
i915_gem_wait_for_error.
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
And to make Ben Widawsky happier, use the gpu_error instead of
the entire device as the argument in some functions.
Drop the outdated comment on ->wedged for now, a follow-up patch will
change the semantics and add a proper comment again.
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This has been sprinkled all over the place in dev_priv. I think
it'd be good to also move all the code into a separate file like
i915_gem_error.c, but that's for another patch.
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Mappable_end, ie. size is almost always what you want as opposed to the
number of entries. Since we already have that information, we can scrap
the number of entries and only calculate it when needed.
If gtt_start is !0, this will have slightly different behavior. This
difference can only occur in DRI1, and exists when we try to kick out
the firmware fb. The new code seems like a bugfix to me.
The other case where we've changed the behavior is during init we check
the mappable region against our current known upper and lower limits
(64MB, and 512MB). This now matches the comment, and makes things more
convenient after removing gtt_mappable_entries.
Also worth noting is the setting of mappable_end is taken out of setup
because we do it earlier now in the DRI2 case and therefore need to add
that tiny hunk to support the DRI1 IOCTL.
v2: Move up mappable end to before legacy AGP init
v3: Add the dev_priv inclusion here from previous rebase error in patch
5
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@gmail.com> (v2)
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
[danvet: squash in fix for a printk format flag mismatch warning.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The purpose of the gtt structure is to help isolate our gtt specific
properties from the rest of the code (in doing so it help us finish the
isolation from the AGP connection).
The following members are pulled out (and renamed):
gtt_start
gtt_total
gtt_mappable_end
gtt_mappable
gtt_base_addr
gsm
The gtt structure will serve as a nice place to put gen specific gtt
routines in upcoming patches. As far as what else I feel belongs in this
structure: it is meant to encapsulate the GTT's physical properties.
This is why I've not added fields which track various drm_mm properties,
or things like gtt_mtrr (which is itself a pretty transient field).
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@gmail.com>
[Ben modified commit messages]
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Move the existing checking inside bind_to_gtt() to the more appropriate
layer in order to prevent recreation of the pages after they have been
explicitly truncated.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
As a means to investigate some bad system behaviour related to the
purging of the active, inactive and unbound lists, it is useful to be
able to manually control when those lists should be cleared.
v2: use _safe list iterators as we kick objects from the list as we
walk.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: Add a small comment explaining why we don't need to check and
wait for gpu resets, acked by Chris on irc.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The two functions are rather similar, so merge them.
Signed-off-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The two functions are rather similar, so merge them.
Signed-off-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Daniel writes:
- seqno wrap fixes and debug infrastructure from Mika Kuoppala and Chris
Wilson
- some leftover kill-agp on gen6+ patches from Ben
- hotplug improvements from Damien
- clear fb when allocated from stolen, avoids dirt on the fbcon (Chris)
- Stolen mem support from Chris Wilson, one of the many steps to get to
real fastboot support.
- Some DDI code cleanups from Paulo.
- Some refactorings around lvds and dp code.
- some random little bits&pieces
* tag 'drm-intel-next-2012-12-21' of git://people.freedesktop.org/~danvet/drm-intel: (93 commits)
drm/i915: Return the real error code from intel_set_mode()
drm/i915: Make GSM void
drm/i915: Move GSM mapping into dev_priv
drm/i915: Move even more gtt code to i915_gem_gtt
drm/i915: Make next_seqno debugs entry to use i915_gem_set_seqno
drm/i915: Introduce i915_gem_set_seqno()
drm/i915: Always clear semaphore mboxes on seqno wrap
drm/i915: Initialize hardware semaphore state on ring init
drm/i915: Introduce ring set_seqno
drm/i915: Missed conversion to gtt_pte_t
drm/i915: Bug on unsupported swizzled platforms
drm/i915: BUG() if fences are used on unsupported platform
drm/i915: fixup overlay stolen memory leak
drm/i915: clean up PIPECONF bpc #defines
drm/i915: add intel_dp_set_signal_levels
drm/i915: remove leftover display.update_wm assignment
drm/i915: check for the PCH when setting pch_transcoder
drm/i915: Clear the stolen fb before enabling
drm/i915: Access to snooped system memory through the GTT is incoherent
drm/i915: Remove stale comment about intel_dp_detect()
...
Conflicts:
drivers/gpu/drm/i915/intel_display.c
This partially reverts
commit 6c085a728c
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Mon Aug 20 11:40:46 2012 +0200
drm/i915: Track unbound pages
Closer inspection of that patch revealed a bunch of unrelated changes
in the shrinker:
- The shrinker count is now in pages instead of objects.
- For counting the shrinkable objects the old code only looked at the
inactive list, the new code looks at all bounds objects (including
pinned ones). That is obviously in addition to the new unbound list.
- The shrinker cound is no longer scaled with
sysctl_vfs_cache_pressure. Note though that with the default tuning
value of vfs_cache_pressue = 100 this doesn't affect the shrinker
behaviour.
- When actually shrinking objects, the old code first dropped
purgeable objects, then normal (inactive) objects. Only then did it,
in a last-ditch effort idle the gpu and evict everything. The new
code omits the intermediate step of evicting normal inactive
objects.
Safe for the first change, which seems benign, and the shrinker count
scaling, which is a bit a different story, the endresult of all these
changes is that the shrinker is _much_ more likely to fall back to the
last-ditch resort of idling the gpu and evicting everything. The old
code could only do that if something else evicted lots of objects
meanwhile (since without any other changes the nr_to_scan will be
smaller than the object count).
Reverting the vfs_cache_pressure behaviour itself is a bit bogus: Only
dentry/inode object caches should scale their shrinker counts with
vfs_cache_pressure. Originally I've had that change reverted, too. But
Chris Wilson insisted that it's too bogus and shouldn't again see the
light of day.
Hence revert all these other changes and restore the old shrinker
behaviour, with the minor adjustment that we now first scan the
unbound list, then the inactive list for each object category
(purgeable or normal).
A similar patch has been tested by a few people affected by the gen4/5
hangs which started to appear in 3.7, which some people bisected to
the "drm/i915: Track unbound pages" commit. But just disabling the
unbound logic alone didn't change things at all.
Note that this patch doesn't fix the referenced bugs, it only hides
the underlying bug(s) well enough to restore pre-3.7 behaviour. The
key to achieve that is to massively reduce the likelyhood of going
into a full gpu stall and evicting everything.
v2: Reword commit message a bit, taking Chris Wilson's comment into
account.
v3: On Chris Wilson's insistency, do not reinstate the rather bogus
vfs_cache_pressure change.
Tested-by: Greg KH <gregkh@linuxfoundation.org>
Tested-by: Dave Kleikamp <dave.kleikamp@oracle.com>
References: https://bugs.freedesktop.org/show_bug.cgi?id=55984
References: https://bugs.freedesktop.org/show_bug.cgi?id=57122
References: https://bugs.freedesktop.org/show_bug.cgi?id=56916
References: https://bugs.freedesktop.org/show_bug.cgi?id=57136
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: stable@vger.kernel.org
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
As along the error path we do not correct the user pin-count for the
failure, we may end up with userspace believing that it has a pinned
object at offset 0 (when interrupted by a signal for example).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: stable@vger.kernel.org
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Some fixes for 3.8:
- Watermark fixups from Chris Wilson (4 pieces).
- 2 snb workarounds, seem to be recently added to our internal DB.
- workaround for the infamous i830/i845 hang, seems now finally solid!
Based on Chris' fix for SNA, now also for UXA/mesa&old SNA.
- Some more fixlets for shrinker-pulls-the-rug issues (Chris&me).
- Fix dma-buf flags when exporting (you).
- Disable the VGA plane if it's enabled on lid open - similar fix in
spirit to the one I've sent you last weeek, BIOS' really like to mess
with the display when closing the lid (awesome debug work from Krzysztof
Mazur).
* 'drm-intel-fixes' of git://people.freedesktop.org/~danvet/drm-intel:
drm/i915: disable shrinker lock stealing for create_mmap_offset
drm/i915: optionally disable shrinker lock stealing
drm/i915: fix flags in dma buf exporting
i915: ensure that VGA plane is disabled
drm/i915: Preallocate the drm_mm_node prior to manipulating the GTT drm_mm manager
drm: Export routines for inserting preallocated nodes into the mm manager
drm/i915: don't disable disconnected outputs
drm/i915: Implement workaround for broken CS tlb on i830/845
drm/i915: Implement WaSetupGtModeTdRowDispatch
drm/i915: Implement WaDisableHiZPlanesWhenMSAAEnabled
drm/i915: Prefer CRTC 'active' rather than 'enabled' during WM computations
drm/i915: Clear self-refresh watermarks when disabled
drm/i915: Double the cursor self-refresh latency on Valleyview
drm/i915: Fixup cursor latency used for IVB lp3 watermarks
This really should have been part of the kill agp series.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The mmap offset structure is not part of the drm/i915 code, but
provided by gem helpers. To avoid leaky abstractions (by either
depending upon implementation details of said helper wrt to
preallocations, or reimplementing it in our code and so fuzzing
around in internal details of that helpr) simply disable
the shrinker lock stealing accross calls into the helper functions.
This should fix igt/gem_tiled_swapping.
v2: Fix cleanup path confusion bemoaned by Chris Wilson.
Reported-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
commit 5774506f15
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Wed Nov 21 13:04:04 2012 +0000
drm/i915: Borrow our struct_mutex for the direct reclaim
added a nice trick to steal the struct_mutex lock in the shrinker if
it's the current task holding it. But this also caused the requirement
that every place which allocates memory needs to be careful about the
gem state of objects, since the shrinker could have pulled the rug out
from under it. We've usually solved this by carefully preallocating
things or ensure that buffers are pinned already.
But the shrinker also reaps mmap offset, so allocating those needs to
be careful, too. Now that code has been factored out into some common
helpers, so either we have fragile code depending upon the common
helper not doing something we don't want it to do. Or we need to
reimplement the mmap offset creation and so also leak implementation
details into our code.
Since this all results in leaky abstraction, cop out by disabling the
lock borrowing trick while calling down into the helpers. That way our
craziness is nicely confined to files in drm/i915.
v2: Split out the change to create_mmap_offset as request by Chris Wilson.
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This function can be used to set the driver's next_seqno
to arbitrary value.
i915_gem_set_seqno() will idle the gpu, retire outstanding
requests, clear the semaphore mailboxes and set the hardware
status page's seqno index.
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
In preparation for setting the seqno to arbitrary value on init or
through debugfs. We need to always clear the semaphores and set the
hws page seqno index by calling intel_ring_init_seqno().
v2: rewrote the commit message as suggested by Chris Wilson.
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Hardware status page needs to have proper seqno set
as our initial seqno can be arbitrary. If initial seqno is close
to wrap boundary on init and i915_seqno_passed() (31bit space)
refers to hw status page which contains zero, errorneous result
will be returned.
v2: clear mboxes and set hws page directly instead of going
through rings. Suggested by Chris Wilson.
v3: hws needs to be updated for all gens. Noticed by Chris
Wilson.
References: https://bugs.freedesktop.org/show_bug.cgi?id=58230
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
As we may reap neighbouring objects in order to free up pages for
allocations, we need to be careful not to allocate in the middle of the
drm_mm manager. To accomplish this, we can simply allocate the
drm_mm_node up front and then use the combined search & insert
drm_mm routines, reducing our code footprint in the process.
Fixes (partially) i-g-t/gem_tiled_swapping
Reported-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
[danvet: Again fixup atomic bikeshed.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Pull DRM updates from Dave Airlie:
"This is the one and only next pull for 3.8, we had a regression we
found last week, so I was waiting for that to resolve itself, and I
ended up with some Intel fixes on top as well.
Highlights:
- new driver: nvidia tegra 20/30/hdmi support
- radeon: add support for previously unused DMA engines, more HDMI
regs, eviction speeds ups and fixes
- i915: HSW support enable, agp removal on GEN6, seqno wrapping
- exynos: IPP subsystem support (image post proc), HDMI
- nouveau: display class reworking, nv20->40 z compression
- ttm: start of locking fixes, rcu usage for lookups,
- core: documentation updates, docbook integration, monotonic clock
usage, move from connector to object properties"
* 'drm-next' of git://people.freedesktop.org/~airlied/linux: (590 commits)
drm/exynos: add gsc ipp driver
drm/exynos: add rotator ipp driver
drm/exynos: add fimc ipp driver
drm/exynos: add iommu support for ipp
drm/exynos: add ipp subsystem
drm/exynos: support device tree for fimd
radeon: fix regression with eviction since evict caching changes
drm/radeon: add more pedantic checks in the CP DMA checker
drm/radeon: bump version for CS ioctl support for async DMA
drm/radeon: enable the async DMA rings in the CS ioctl
drm/radeon: add VM CS parser support for async DMA on cayman/TN/SI
drm/radeon/kms: add evergreen/cayman CS parser for async DMA (v2)
drm/radeon/kms: add 6xx/7xx CS parser for async DMA (v2)
drm/radeon: fix htile buffer size computation for command stream checker
drm/radeon: fix fence locking in the pageflip callback
drm/radeon: make indirect register access concurrency-safe
drm/radeon: add W|RREG32_IDX for MM_INDEX|DATA based mmio accesss
drm/exynos: support extended screen coordinate of fimd
drm/exynos: fix x, y coordinates for right bottom pixel
drm/exynos: fix fb offset calculation for plane
...
We ignore all the user requests to handle flushing to the GTT domain if
the user requests such on a snoopable bo, and as such access through the
GTT to such pages remains incoherent. The specs even warn that such
behaviour is undefined - a strong reason never to do so.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
To gain confidence in the wrap handling, make it happen quite
soon after the boot.
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The complication is that during seqno wrapping we must be extremely
careful not to write to any ring as that will require a new seqno, and
so would recurse back into the seqno wrap handler. So we cannot call
i915_gpu_idle() as that does additional work beyond simply retiring the
current set of requests, and instead must do the minimal work ourselves
during seqno wrapping.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
If wrap just happened we need to prevent emitting waits for
pre wrap values. Detect this and emit no-ops instead.
v2: Use olr > seqno to detect wrap instead of *seqno == 0
as suggested by Chris Wilson.
v3: Use last used seqno to detect the wraparound. From
Chris Wilson
v4: Fixed unnecessary last_seqno assigment
References: https://bugs.freedesktop.org/show_bug.cgi?id=57967
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This reverts commits a50915394f and
d7c3b937bd.
This is a revert of a revert of a revert. In addition, it reverts the
even older i915 change to stop using the __GFP_NO_KSWAPD flag due to the
original commits in linux-next.
It turns out that the original patch really was bogus, and that the
original revert was the correct thing to do after all. We thought we
had fixed the problem, and then reverted the revert, but the problem
really is fundamental: waking up kswapd simply isn't the right thing to
do, and direct reclaim sometimes simply _is_ the right thing to do.
When certain allocations fail, we simply should try some direct reclaim,
and if that fails, fail the allocation. That's the right thing to do
for THP allocations, which can easily fail, and the GPU allocations want
to do that too.
So starting kswapd is sometimes simply wrong, and removing the flag that
said "don't start kswapd" was a mistake. Let's hope we never revisit
this mistake again - and certainly not this many times ;)
Acked-by: Mel Gorman <mgorman@suse.de>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
On a machine with bit17 swizzling, we need to store the bit17 of the
physical page address in put-pages. This requires a memory allocation,
on average less than a page, which may be difficult to satisfy is the
request to put-pages is on behalf of the shrinker. We could allow that
allocation to pull from the reserved memory pools, but it seems much
safer to preallocate the array for tiled objects on affected machines.
v2: Export i915_gem_object_needs_bit17_swizzle() for reuse.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
If there are pre-wrap values in semaphore-mbox registers after wrap,
syncing against some after-wrap request will complete immediately.
Fix this by emitting ring commands to set mbox registers to zero
when the wrap happens.
v2: Use __intel_ring_begin to emit ring commands, from
Chris Wilson.
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: Add a small comment to handle_seqno_wrap.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
- __iomem where there is none (I love how we mix these things up).
- Use gfp_t instead of an other plain type.
- Unconfuse one place about enum pipe vs enum transcoder - for the pch
transcoder we actually use the pipe enum. Fixup the other cases
where we assign the pipe to the cpu transcoder with explicit casts.
- Declare the mch_lock properly in a header.
There is still a decent mess in intel_bios.c about __iomem, but heck,
this is x86 and we're allowed to do that.
Makes-sparse-happy: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: Use a space after the cast consistently and fix up the
newly-added cast in i915_irq.c to properly use __iomem.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
As we may actually allocate in order to save the physical swizzling bits
during the free, we have to be careful not to trigger the shrinker on
the same object.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: Added a small comment in the code to really drive the
scariness of this patch home.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The primary purpose of this was to debug some use-after-free memory
corruption that was causing an OOPS inside drm/i915. As it turned out
the corruption was being caused elsewhere and i915.ko as a major user of
many objects was being hit hardest.
Indeed as we do frequent the generic kmalloc caches, dedicating one to
ourselves (or at least naming one for us depending upon the core) aids
debugging our own slab usage.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Allow for the creation of GEM objects backed by stolen memory. As these
are not backed by ordinary pages, we create a fake dma mapping and store
the address in the scatterlist rather than obj->pages.
v2: Mark _i915_gem_object_create_stolen() as static, as noticed by Jesse
Barnes.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Since we drop dev->struct_mutex when going through the slowpath, the
object might have been moved out of the cpu domain. Hence we need to
clflush the entire object to ensure that after the ioctl returns,
everything is coherent again (interwoven writes are ill-defined
anyway).
But we only need to do this if we start in the cpu domain and the
object requires flushing for coherency. So don't do the flushing if
the object is coherent anyway or if we've done in-line clfushing
already.
v2: i915_gem_clflush_object already checks whether the object is
coherent and if so, drops the flushing. Hence we don't need to check
that ourselves, simplifying the condition.
v3: Reorder the checks for better clarity (and adjust the comment
accordingly), suggested by Chris Wilson.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The shmem paths for pwrite/pread used a clever trick to hold onto a
single page when dropping the big dev->struct_mutex for the slowpath.
But this ran the risk of reinstating (or not completely purging) the
backing storage when dropping purgeable objects.
Hence the code needed to keep track of whether it ever dropped the
lock, and if it did, manually check whether it needs to re-purge the
backing storage. But thanks to the pages pin count introduced in
commit a5570178c0
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Tue Sep 4 21:02:54 2012 +0100
drm/i915: Pin backing pages whilst exporting through a dmabuf vmap
which allowed us to pin the backing storage and remove that page
reference trick from shmem_pwrite/read in
commit f60d7f0c1d
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Tue Sep 4 21:02:56 2012 +0100
drm/i915: Pin backing pages for pread
and
commit 755d22184f
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Tue Sep 4 21:02:55 2012 +0100
drm/i915: Pin backing pages for pwrite
we can now abolish this check. The slowpath cleanup completely
disappears from pread, and for pwrite we're only left with the domain
fixup in case someone moved the object out of the cpu domain from
under us. A follow-on patch will optimize that a notch more.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
i915_gem_handle_seqno_wrap() will zero all sync_seqnos but as the
wrap can happen inside ring->sync_to(), pre wrap seqno was
carried over and overwrote the zeroed sync_seqno.
When wrap is handled, all outstanding requests will be retired and
objects moved to inactive queue, causing their last_read_seqno to be zero.
Use this to update the sync_seqno correctly.
RING_SYNC registers after wrap will contain pre wrap values which
are >= seqno. So injecting the semaphore wait into ring completes
immediately.
Original idea for using last_read_seqno from Chris Wilson.
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Replace the wait for the ring to be clear with the more common wait for
the ring to be idle. The principle advantage is one less exported
intel_ring_wait function, and the removal of a hardcoded value.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
As we now always preallocate the seqno before writing to the ring, we
can trivially test if we have any pending activity on the ring by
inspecting the olr. This makes it then possible to flush operations that
are not normally associated with a request, like power-management.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Based on the work by Mika Kuoppala, we realised that we need to handle
seqno wraparound prior to committing our changes to the ring. The most
obvious point then is to grab the seqno inside intel_ring_begin(), and
then to reuse that seqno for all ring operations until the next request.
As intel_ring_begin() can fail, the callers must already be prepared to
handle such failure and so we can safely add further checks.
This patch looks like it should be split up into the interface
changes and the tweaks to move seqno wrapping from the execbuffer into
the core seqno increment. However, I found no easy way to break it into
incremental steps without introducing further broken behaviour.
v2: Mika found a silly mistake and a subtle error in the existing code;
inside i915_gem_retire_requests() we were resetting the sync_seqno of
the target ring based on the seqno from this ring - which are only
related by the order of their allocation, not retirement. Hence we were
applying the optimisation that the rings were synchronised too early,
fortunately the only real casualty there is the handling of seqno
wrapping.
v3: Do not forget to reset the sync_seqno upon module reinitialisation,
ala resume.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=863861
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> [v2]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
In commit 69c2fc8913
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Fri Jul 20 12:41:03 2012 +0100
drm/i915: Remove the per-ring write list
the explicit flush was removed from i915_ring_idle(). However, we
continued to wait upon the next seqno which now did not correspond to
any request (except for the unusual condition of a failure to queue a
request after execbuffer) and so would wait indefinitely.
This has an important side-effect that i915_gpu_idle() does not cause
the seqno to be incremented. This is vital if we are to be able to idle
the GPU to handle seqno wraparound, as in subsequent patches.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
If we have hit oom whilst holding our struct_mutex, then currently we
cannot reap our own GPU buffers which likely pin most of memory, making
an outright OOM more likely. So if we are running in direct reclaim and
already hold the mutex, attempt to free buffers knowing that the
original function can not continue until we return.
v2: Add a note explaining that the mutex may be stolen due to
pre-emption, and that is bad.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
As we may invoke the shrinker whilst trying to allocate memory to hold
the gtt_space for this object, we need to be careful not to mark the
drm_mm_node as activated (by assigning it to this object) before we
have finished our sequence of allocations.
Note: We also need to move the binding of the object into the actual
pagetables down a bit. The best way seems to be to move it out into
the callsites.
Reported-by: Imre Deak <imre.deak@gmail.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: Added small note to commit message to summarize review
discussion.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
In order to prevent reaping of the object whilst setting it up to
handle the pagefault, we need to mark it as pinned. This has the nice
side-effect of eliminating some special cases from the pagefault handler
as well!
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
In the circumstances that the shrinker is allowed to steal the mutex
in order to reap pages, we need to be careful to prevent it operating on
the current object and shooting ourselves in the foot.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Daniel writes:
Highlights of this -next round:
- ivb fdi B/C fixes
- hsw sprite/plane offset fixes from Damien
- unified dp/hdmi encoder for hsw, finally external dp support on hsw
(Paulo)
- kill-agp and some other prep work in the gtt code from Ben
- some fb handling fixes from Ville
- massive pile of patches to align hsw VGA with the spec and make it
actually work (Paulo)
- pile of workarounds from Jesse, mostly for vlv, but also some other
related platforms
- start of a dev_priv reorg, that thing grew out of bounds and chaotic
- small bits&pieces all over the place, down to better error handling for
load-detect on gen2 (Chris, Jani, Mika, Zhenyu, ...)
On top of the previous pile (just copypasta):
- tons of hsw dp prep patches form Paulo
- round scheduled work items and timers to nearest second (Chris)
- some hw workarounds (Jesse&Damien)
- vlv dp support and related fixups (Vijay et al.)
- basic haswell dp support, not yet wired up for external ports (Paulo)
- edp support (Paulo)
- tons of refactorings to prepare for the above (Paulo)
- panel rework, unifiying code between lvds and edp panels (Jani)
- panel fitter scaling modes (Jani + Yuly Novikov)
- panel power improvements, should now work without the BIOS setting it up
- extracting some dp helpers from radeon/i915 and move them to
drm_dp_helper.c
- randome pile of workarounds (Damien, Ben, ...)
- some cleanups for the register restore code for suspend/resume
- secure batchbuffer support, should enable tear-free blits on gen6+
Chris)
- random smaller fixlets and cleanups.
* 'for-airlied' of git://people.freedesktop.org/~danvet/drm-intel: (231 commits)
drm/i915: Restore physical HWS_PGA after resume
drm/i915: Report amount of usable graphics memory in MiB
drm/i915/i2c: Track users of GMBUS force-bit
drm/i915: Allocate the proper size for contexts.
drm/i915: Update load-detect failure paths for modeset-rework
drm/i915: Clear unused fields of mode for framebuffer creation
drm/i915: Always calculate 8xx WM values based on a 32-bpp framebuffer
drm/i915: Fix sparse warnings in from AGP kill code
drm/i915: Missed lock change with rps lock
drm/i915: Move the remaining gtt code
drm/i915: flush system agent TLBs on SNB
drm/i915: Kill off now unused gen6+ AGP code
drm/i915: Calculate correct stolen size for GEN7+
drm/i915: Stop using AGP layer for GEN6+
drm/i915: drop the double-OP_STOREDW usage in blt_ring_flush
drm/i915: don't rewrite the GTT on resume v4
drm/i915: protect RPS/RC6 related accesses (including PCU) with a new mutex
drm/i915: put ring frequency and turbo setup into a work queue v5
drm/i915: don't block resume on fb console resume v2
drm/i915: extract l3_parity substruct from dev_priv
...
It's pretty much all consolidated now that we've killed AGP. We can move
the one outlier, and defines too.
(Kill some unused defines in the process)
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
As a quick hack we make the old intel_gtt structure mutable so we can
fool a bunch of the existing code which depends on elements in that data
structure. We can/should try to remove this in a subsequent patch.
This should preserve the old gtt init behavior which upon writing these
patches seems incorrect. The next patch will fix these things.
The one exception is VLV which doesn't have the preserved flush control
write behavior. Since we want to do that for all GEN6+ stuff, we'll
handle that in a later patch. Mainstream VLV support doesn't actually
exist yet anyway.
v2: Update the comment to remove the "voodoo"
Check that the last pte written matches what we readback
v3: actually kill cache_level_to_agp_type since most of the flags will
disappear in an upcoming patch
v4: v3 was actually not what we wanted (Daniel)
Make the ggtt bind assertions better and stricter (Chris)
Fix some uncaught errors at gtt init (Chris)
Some other random stuff that Chris wanted
v5: check for i==0 in gen6_ggtt_bind_object to shut up gcc (Ben)
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by [v4]: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: Make the cache_level -> agp_flags conversion for pre-gen6 a
tad more robust by mapping everything != CACHE_NONE to the cached agp
flag - we have a 1:1 uncached mapping, but different modes of
cacheable (at least on later generations). Suggested by Chris Wilson.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Pretty astonishing how far apart these two members landed ... Especially since
I've already removed almost 200 lines in between.
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
iQEcBAABAgAGBQJQgvdwAAoJEHm+PkMAQRiG+3AH/i2XsqqN3VctL0nnbWfvds+Q
aKulfIdJTjKiVAsawPUtRqReZ8ijiebrgA/53lZLlrFOoPPQ5+LHmnSyQF6gErOY
NuAE1lijXDRM1pwBlhvOBbAj26wUobGjqONFJ9OkKr758Ue8ds/Q7UdxyEgmYgmg
tvVMzfRcICzryUV3PcqL+3cNPpCUdT6wGGRJ9DCv/jvGiWKExWhOle5oltrmxk+D
NsqRcws5pEubfHE4J8BvNWr8lE1kHfYVhrJETiLJUiN2XAJcbI4Jy7rU/3EGteNS
0HMZdaPPjV874lohdM70X2225SbYrCVkAYB5hnZCTeC3tYyCawBBPMQoyAiOcmU=
=+861
-----END PGP SIGNATURE-----
Merge tag 'v3.7-rc2' into drm-intel-next-queued
Linux 3.7-rc2
Backmerge to solve two ugly conflicts:
- uapi. We've already added new ioctl definitions for -next. Do I need to say more?
- wc support gtt ptes. We've had to revert this for snb+ for 3.7 and
also fix a few other things in the code. Now we know how to make it
work on snb+, but to avoid losing the other fixes do the backmerge
first before re-enabling wc gtt ptes on snb+.
And a few other minor things, among them git getting confused in
intel_dp.c and seemingly causing a conflict out of nothing ...
Conflicts:
drivers/gpu/drm/i915/i915_reg.h
drivers/gpu/drm/i915/intel_display.c
drivers/gpu/drm/i915/intel_dp.c
drivers/gpu/drm/i915/intel_modes.c
include/drm/i915_drm.h
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Daniel writes:
The big thing is the disabling of the hsw support by default, cc: stable.
We've aimed for basic hsw support in 3.6, but due to a few bad
happenstances we've screwed up and only 3.8 will have better modeset
support than vesa. To avoid yet another round of fallout from such a
gaffle on for the next platform we've added a module option to disable
early hw support by default. That should also give us more flexibility in
bring-up.
Otherwise just small fixes:
- 3 fixes from Egbert for sdvo corner cases
- invert-brightness quirk entry from Egbert
- revert a dp link training change, it regresses some setups
- and shut up a spurious WARN in our gem fault handler.
- regression fix for an oops on bit17 swizzling machines, introduce in 3.7
- another no-lvds quirk
* 'drm-intel-fixes' of git://people.freedesktop.org/~danvet/drm-intel:
drm/i915: Initialize obj->pages before use by i915_gem_object_do_bit17_swizzle()
drm/i915: Add no-lvds quirk for Supermicro X7SPA-H
drm/i915: Insert i915_preliminary_hw_support variable.
drm/i915: shut up spurious WARN in the gtt fault handler
Revert "drm/i915: Try harder to complete DP training pattern 1"
DRM/i915: Restore sdvo_flags after dtd->mode->dtd Roundrtrip.
DRM/i915: Don't clone SDVO LVDS with analog.
DRM/i915: Add QUIRK_INVERT_BRIGHTNESS for NCR machines.
DRM/i915: Don't delete DPLL Multiplier during DAC init.
If we leave obj->pages set to NULL before attempting to deswizzle them,
then an OOPS is well deserved.
Fixes regression introduced in commit 9da3da660d
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Fri Jun 1 15:20:22 2012 +0100
drm/i915: Replace the array of pages with a scatterlist
Reported-and-tested-by: Krzysztof Kolasa
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-ENOSPC can happen if userspace is being simplistic and tries to map a
too big object. To aid further spurious WARN debugging, also print out
the error code.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=56017
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Daniel writes:
"- some register magic to fix hsw crw (Paulo&Ben)
- fix backlight destruction for cpu edp (Jani)
- fix gen ch7xxx dvo ->get_hw_state
- fixup the plane->pipe fixup code, the broken version massively angers
the modeset sanity checks
- kill pipe A quirk for i855gm, otherwise I get a black screen with the
above patch
- fixup for gem_get_page helper (Chris)
- fixup guardband clipping w/a (Ken), without this mesa master can erronously
drop vertices on snb, mesa 9.0 has the optimization reverted
- another pageflip vs. modeset fix
- kill bogus BUG_ON which broke ums+gem from Willy Tarreau (gasp, people
are still using this!)"
* 'drm-intel-fixes' of git://people.freedesktop.org/~danvet/drm-intel:
drm/i915: fix non-DP-D eDP backlight cleanup and module reload
drm/i915: HSW CRW stability magic
drm/i915/dvo-ch7xxx: fix get_hw_state
drm/i915: fixup the plane->pipe fixup code
drm/i915: rip out the pipe A quirk for i855gm
drm/i915: disable wc gtt pte mappings on gen2
drm/i915: fixup i915_gem_object_get_page inline helper
drm/i915: Disallow preallocation of requests
drm/i915: Set guardband clipping workaround bit in the right register.
drm/i915: paper over a pipe-enable vs pageflip race
drm/i915: remove useless BUG_ON which caused a regression in 3.5.
This magic brings stability to HSW CRW machines.
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@gmail.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The intention was to allow the caller to avoid a failure to queue a
request having already written commands to the ring. However, this is a
moot point as the i915_add_request() can fail for other reasons than a
mere allocation failure and those failure cases are more likely than
ENOMEM. So the overlay code already had to handle i915_add_request()
failures, and due to
commit 3bb73aba1e
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Fri Jul 20 12:40:59 2012 +0100
drm/i915: Allow late allocation of request for i915_add_request()
the error handling code in intel_overlay.c was subject to causing
double-frees, as found by coverity.
Rather than further complicate i915_add_request() and callers, realise
the battle is lost and adapt intel_overlay.c to take advantage of the
late allocation of requests.
v2: Handle callers passing in a NULL seqno.
v3: Ditto. This time for sure.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
By using round_jiffies() we can align the wakeup of our worker to the
nearest second in order to batch wakeups and reduce system load, which
is useful for unimportant coarse tasks like our retire_requests.
v2: round_jiffies_relative() already returns the relative timeout value,
so no need to incorrectly perform the subtraction twice. The timer
interface still leaves the possibility for the value of jiffies to
change be we program the timer.
Suggested-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
round_jiffies() aligns the wakeup time to the nearest second in order to
batch wakeups and reduce system load, which is useful for unimportant
coarse timers like our hangcheck.
v2: round_jiffies_relative() returns the relative jiffie value, whereas
we need the absolute value for the timer.
Suggested-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
starting an old X server causes a kernel BUG since commit 1b50247a8d:
------------[ cut here ]------------
kernel BUG at drivers/gpu/drm/i915/i915_gem.c:3661!
invalid opcode: 0000 [#1] SMP
Modules linked in: snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss uvcvideo
+videobuf2_core videodev videobuf2_vmalloc videobuf2_memops uhci_hcd ath9k mac80211 snd_hda_codec_realtek ath9k_common microcode
+ath9k_hw psmouse serio_raw sg ath cfg80211 atl1c lpc_ich mfd_core ehci_hcd snd_hda_intel snd_hda_codec snd_hwdep snd_pcm rtc_cmos
+snd_timer snd evdev eeepc_laptop snd_page_alloc sparse_keymap
Pid: 2866, comm: X Not tainted 3.5.6-rc1-eeepc #1 ASUSTeK Computer INC. 1005HA/1005HA
EIP: 0060:[<c12dc291>] EFLAGS: 00013297 CPU: 0
EIP is at i915_gem_entervt_ioctl+0xf1/0x110
EAX: f5941df4 EBX: f5940000 ECX: 00000000 EDX: 00020000
ESI: f5835400 EDI: 00000000 EBP: f51d7e38 ESP: f51d7e20
DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
CR0: 8005003b CR2: b760e0a0 CR3: 351b6000 CR4: 000007d0
DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
DR6: ffff0ff0 DR7: 00000400
Process X (pid: 2866, ti=f51d6000 task=f61af8d0 task.ti=f51d6000)
Stack:
00000001 00000000 f5835414 f51d7e84 f5835400 f54f85c0 f51d7f10 c12b530b
00000001 c151b139 c14751b6 c152e030 00000b32 00006459 00000059 0000e200
00000001 00000000 00006459 c159ddd0 c12dc1a0 ffffffea 00000000 00000000
Call Trace:
[<c12b530b>] drm_ioctl+0x2eb/0x440
[<c12dc1a0>] ? i915_gem_init+0xe0/0xe0
[<c1052b2b>] ? enqueue_hrtimer+0x1b/0x50
[<c1053321>] ? __hrtimer_start_range_ns+0x161/0x330
[<c10530b3>] ? lock_hrtimer_base+0x23/0x50
[<c1053163>] ? hrtimer_try_to_cancel+0x33/0x70
[<c12b5020>] ? drm_version+0x90/0x90
[<c10ca171>] vfs_ioctl+0x31/0x50
[<c10ca2e4>] do_vfs_ioctl+0x64/0x510
[<c10535de>] ? hrtimer_nanosleep+0x8e/0x100
[<c1052c20>] ? update_rmtp+0x80/0x80
[<c10ca7c9>] sys_ioctl+0x39/0x60
[<c1433949>] syscall_call+0x7/0xb
Code: 83 c4 0c 5b 5e 5f 5d c3 c7 44 24 04 2c 05 53 c1 c7 04 24 6f ef 47 c1 e8 6e e0 fd ff c7 83 38 1e 00 00 00 00 00 00 e9 3f ff ff
+ff <0f> 0b eb fe 0f 0b eb fe 8d b4 26 00 00 00 00 0f 0b eb fe 8d b6
EIP: [<c12dc291>] i915_gem_entervt_ioctl+0xf1/0x110 SS:ESP 0068:f51d7e20
---[ end trace dd332ec083cbd513 ]---
The crash happens here in i915_gem_entervt_ioctl() :
3659 BUG_ON(!list_empty(&dev_priv->mm.active_list));
3660 BUG_ON(!list_empty(&dev_priv->mm.flushing_list));
-> 3661 BUG_ON(!list_empty(&dev_priv->mm.inactive_list));
3662 mutex_unlock(&dev->struct_mutex);
Quoting Chris :
"That BUG_ON there is silly and can simply be removed. The check is to
verify that no batches were submitted to the kernel whilst the UMS/GEM
client was suspended - to which the BUG_ONs are a crude approximation.
Furthermore, the checks are too late, since it means we attempted to
program the hardware whilst it was in an invalid state, the BUG_ONs are
the least of your concerns at that point."
Note that this regression has been introduced in
commit 1b50247a8d
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Tue Apr 24 15:47:30 2012 +0100
drm/i915: Remove the list of pinned inactive objects
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Willy Tarreau <w@1wt.eu>
[danvet: Added note about the regressing commit and cc: stable.]
Cc: stable@vger.kernel.org
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Daniel writes:
Bigger -fixes pile, mostly because I've included Ajax' DP dongle stuff,
as discussed on irc. Otherwise just small things:
- regression fix to finally make 6bpc auto-dither on dp work (Jani)
- reinstate an snb ctx w/a that accidentally got lost in a rework (Chris)
- fixup the DP train sequence, logic-goof-up uncovered by Coverty (Chris)
- fix set_caching locking (Ben)
- fix spurious segfault on con-current gtt mmap faulting (Dimitry and Mika)
- some pageflip correctness fixes (still hunting down some issues, but
these are the worst offenders of confused code that we've tracked down
thus far) from Chris and me
- fixup swizzling settings on vlv (Jesse)
- gt_mode w/a from Ben added, fixes snb gt1 rc6+hw ctx hangs.
* 'drm-intel-fixes' of git://people.freedesktop.org/~danvet/drm-intel:
drm/i915: Fix GT_MODE default value
drm/i915: don't frob the vblank ts in finish_page_flip
drm/i915: call drm_handle_vblank before finish_page_flip
drm/i915: print warning if vmi915_gem_fault error is not handled
drm/i915: EBUSY status handling added to i915_gem_fault().
drm/i915: Try harder to complete DP training pattern 1
drm/i915: set swizzling to none on VLV
drm/dp: Make sink count DP 1.2 aware
drm/dp: Document DP spec versions for various DPCD registers
drm/i915/dp: Be smarter about connection sense for branch devices
drm/i915/dp: Fetch downstream port info if needed during DPCD fetch
drm/dp: Update DPCD defines
drm: Export drm_probe_ddc()
drm/i915: Flush the pending flips on the CRTC before modification
drm/i915: Actually invalidate the TLB for the SandyBridge HW contexts w/a
drm/i915: Fix set_caching locking
drm/i915: use adjusted_mode instead of mode for checking the 6bpc force flag
Falling into default case in vmi915_gem_fault is a bug. Be more
verbose about it.
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Subsequent threads returning EBUSY from vm_insert_pfn() was not handled
correctly. As a result concurrent access from new threads to
mmapped data caused SIGBUS.
Note that this fixes i-g-t/tests/gem_threaded_tiled_access.
Tested-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Pull drm merge (part 1) from Dave Airlie:
"So first of all my tree and uapi stuff has a conflict mess, its my
fault as the nouveau stuff didn't hit -next as were trying to rebase
regressions out of it before we merged.
Highlights:
- SH mobile modesetting driver and associated helpers
- some DRM core documentation
- i915 modesetting rework, haswell hdmi, haswell and vlv fixes, write
combined pte writing, ilk rc6 support,
- nouveau: major driver rework into a hw core driver, makes features
like SLI a lot saner to implement,
- psb: add eDP/DP support for Cedarview
- radeon: 2 layer page tables, async VM pte updates, better PLL
selection for > 2 screens, better ACPI interactions
The rest is general grab bag of fixes.
So why part 1? well I have the exynos pull req which came in a bit
late but was waiting for me to do something they shouldn't have and it
looks fairly safe, and David Howells has some more header cleanups
he'd like me to pull, that seem like a good idea, but I'd like to get
this merge out of the way so -next dosen't get blocked."
Tons of conflicts mostly due to silly include line changes, but mostly
mindless. A few other small semantic conflicts too, noted from Dave's
pre-merged branch.
* 'drm-next' of git://people.freedesktop.org/~airlied/linux: (447 commits)
drm/nv98/crypt: fix fuc build with latest envyas
drm/nouveau/devinit: fixup various issues with subdev ctor/init ordering
drm/nv41/vm: fix and enable use of "real" pciegart
drm/nv44/vm: fix and enable use of "real" pciegart
drm/nv04/dmaobj: fixup vm target handling in preparation for nv4x pcie
drm/nouveau: store supported dma mask in vmmgr
drm/nvc0/ibus: initial implementation of subdev
drm/nouveau/therm: add support for fan-control modes
drm/nouveau/hwmon: rename pwm0* to pmw1* to follow hwmon's rules
drm/nouveau/therm: calculate the pwm divisor on nv50+
drm/nouveau/fan: rewrite the fan tachometer driver to get more precision, faster
drm/nouveau/therm: move thermal-related functions to the therm subdev
drm/nouveau/bios: parse the pwm divisor from the perf table
drm/nouveau/therm: use the EXTDEV table to detect i2c monitoring devices
drm/nouveau/therm: rework thermal table parsing
drm/nouveau/gpio: expose the PWM/TOGGLE parameter found in the gpio vbios table
drm/nouveau: fix pm initialization order
drm/nouveau/bios: check that fixed tvdac gpio data is valid before using it
drm/nouveau: log channel debug/error messages from client object rather than drm client
drm/nouveau: have drm debugging macros build on top of core macros
...
Convert #include "..." to #include <path/...> in drivers/gpu/.
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Dave Airlie <airlied@redhat.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Dave Jones <davej@redhat.com>
Remove redundant DRM UAPI header #inclusions from drivers/gpu/.
Remove redundant #inclusions of core DRM UAPI headers (drm.h, drm_mode.h and
drm_sarea.h). They are now #included via drmP.h and drm_crtc.h via a preceding
patch.
Without this patch and the patch to make include the UAPI headers from the core
headers, after the UAPI split, the DRM C sources cannot find these UAPI headers
because the DRM code relies on specific -I flags to make #include "..." work
on headers in include/drm/ - but that does not work after the UAPI split without
adding more -I flags.
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Dave Airlie <airlied@redhat.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Dave Jones <davej@redhat.com>
On the EINVAL case we don't release struct_mutex. It should be safe to
grab the lock after checking the parameters, which also resolves the
issues.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.18 (GNU/Linux)
iQEcBAABAgAGBQJQX7MuAAoJEHm+PkMAQRiG0h0IAJURkrMCAQUxA+Ik66ReH89s
LQcVd0U9uL4UUOi7f5WR64Vf9Cfu6VVGX9ZKSvjpNskvlQaUQPMIt4pMe6g4X4dI
u0bApEy4XZz3nGabUAghIU8jJ8cDmhCG6kPpSiS7pi7KHc0yIa4WFtJRrIpGaIWT
xuK38YOiOHcSDRlLyWZzainMncQp/ixJdxnqVMTonkVLk0q0b84XzOr4/qlLE5lU
i+TsK3PRKdQXgvZ4CebL+srPBwWX1dmgP3VkeBloQbSSenSeELICbFWavn2ml+sF
GXi4dO93oNquL/Oy5SwI666T4uNcrRPaS+5X+xSZgBW/y2aQVJVJuNZg6ZP/uWk=
=0v2l
-----END PGP SIGNATURE-----
Merge tag 'v3.6-rc7' into drm-intel-next-queued
Manual backmerge of -rc7 to resolve a silent conflict leading to
compile failure in drivers/gpu/drm/i915/intel_hdmi.c.
This is due to the bugfix in -rc7:
commit b98b601672
Author: Wang Xingchao <xingchao.wang@intel.com>
Date: Thu Sep 13 07:43:22 2012 +0800
drm/i915: HDMI - Clear Audio Enable bit for Hot Plug
Since this code moved around a lot in -next git put that snippet at
the wrong spot. I've tried to fix this by making the conflict explicit
by merging a version for next with:
commit 3cce574f01
Author: Wang Xingchao <xingchao.wang@intel.com>
Date: Thu Sep 13 11:19:00 2012 +0800
drm/i915: HDMI - Clear Audio Enable bit for Hot Plug unconditionally
But that failed to solve the entire problem. To avoid pushing out
further -nightly branch to our QA where this is broken, do the
backmerge and manually add the stuff git adds to -next from the patch
in -fixes.
Note that this doesn't show up in git's merge diff (and hence is also
not handled by git rerere), which adds to the reasons why I'd like to
fix this with a verbose backmerge. The git merge diff only shows a
bunch of trivial conflicts of the "code changed in lines next to each
another" kind.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
By providing a callback for when we need to bind the pages, and then
release them again later, we can shorten the amount of time we hold the
foreign pages mapped and pinned, and importantly the dmabuf objects then
behave as any other normal object with respect to the shrinker and
memory management.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Rather than have multiple data structures for describing our page layout
in conjunction with the array of pages, we can migrate all users over to
a scatterlist.
One major advantage, other than unifying the page tracking structures,
this offers is that we replace the vmalloc'ed array (which can be up to
a megabyte in size) with a chain of individual pages which helps reduce
memory pressure.
The disadvantage is that we then do not have a simple array to iterate,
or to access randomly. The common case for this is in the relocation
processing, which will typically fit within a single scatterlist page
and so be almost the same cost as the simple array. For iterating over
the array, the extra function call could be optimised away, but in
reality is an insignificant cost of either binding the pages, or
performing the pwrite/pread.
v2: Fix drm_clflush_sg() to not invoke wbinvd as well! And fix the
trivial compile error from rebasing.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
By using the recently introduced pinning of pages, we can safely drop
the mutex in the knowledge that the pages are not going to disappear
beneath us, and so we can simplify the code for iterating over the pages.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
By using the recently introduced pinning of pages, we can safely drop
the mutex in the knowledge that the pages are not going to disappear
jeneath us, and so we can simplify the code for iterating over the pages.
Note: The old code had such complicated page refcounting since it used
obj->pages as a micro-optimization if it's there, but that could
(before this patch) disappear when we drop the dev->struct_mutex.
Hence some manual page refcounting was required for the slow path,
complicated by the fact that pages returned by shmem_read_mapping_page
already have a pageref, which needs to be dropped again.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
[danvet: Added note to explain the question Ben raised in review.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
We need to refcount our pages in order to prevent reaping them at
inopportune times, such as when they currently vmapped or exported to
another driver. However, we also wish to keep the lazy deallocation of
our pages so we need to take a pin/unpinned approach rather than a
simple refcount.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
In order to specialise functions depending upon the type of object, we
can attach vfuncs to each object via a new ->ops pointer.
For instance, this will be used in future patches to only bind pages from
a dma-buf for the duration that the object is used by the GPU - and so
prevent them from pinning those pages for the entire of the object.
v2: Bonus comments.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Pin-leaks persist and we get the perennial bug reports of machine
lockups to the BUG_ON(pin_count==MAX). If we instead loudly report that
the object cannot be pinned at that time it should prevent the driver from
locking up, and hopefully restore a semblance of working whilst still
leaving us a OOPS to debug.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: stable@vger.kernel.org
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Daniel writes:
"New stuff for -next. Highlights:
- prep patches for the modeset rework. Note that one of those patches
touches the fb helper in the common drm code.
- hasw hdmi audio support (Wang Xingchao)
- improved instdone dumping for gen7 (Ben)
- unbound tracking and a few follow-up patches from Chris
- dma_buf->begin/end_cpu_access plus fix for drm/udl (Dave)
- improve mmio error reporting for hsw
- prep patch for WQ_NON_REENTRANT removal (Tejun Heo)
"
* 'for-airlied' of git://people.freedesktop.org/~danvet/drm-intel: (41 commits)
drm/i915: Remove __GFP_NO_KSWAPD
drm/i915: disable rc6 on ilk when vt-d is enabled
drm/i915: Avoid unbinding due to an interrupted pin_and_fence during execbuffer
drm/i915: Use new INSTDONE registers (Gen7+)
drm/i915: Add new INSTDONE registers
drm/i915: Extract reading INSTDONE
drm/i915: Use a non-blocking wait for set-to-domain ioctl
drm/i915: Juggle code order to ease flow of the next patch
drm/i915: Use cpu relocations if the object is in the GTT but not mappable
drm/i915: Extract general object init routine
drm/i915: Protect private gem objects from truncate (such as imported dmabuf)
drm/i915: Only pwrite through the GTT if there is space in the aperture
i915: use alloc_ordered_workqueue() instead of explicit UNBOUND w/ max_active = 1
drm/i915: Find unclaimed MMIO writes.
drm/i915: Add ERR_INT to gen7 error state
drm/i915: Cantiga+ cannot handle a hsync front porch of 0
drm/i915: fix reassignment of variable "intel_dp->DP"
drm/i915: Try harder to allocate an mmap_offset
drm/i915: Show pin count in debugfs
drm/i915: Show (count, size) of purgeable objects in i915_gem_objects
...
When I pulled-in today's drm-intel-next into linux-next (next-20120824)
I saw this build-breakage:
drivers/gpu/drm/i915/i915_gem.c: In function 'i915_gem_object_get_pages_gtt':
drivers/gpu/drm/i915/i915_gem.c:1778:40: error: '__GFP_NO_KSWAPD' undeclared (first use in this function)
drivers/gpu/drm/i915/i915_gem.c:1778:40: note: each undeclared identifier is reported only once for each function it appears in
This is caused by commit ba099ef165f8 ("mm: remove __GFP_NO_KSWAPD")
and commit b6beae2c2014 ("mm: remove __GFP_NO_KSWAPD fixes") in
linux-next (next-20120824).
Fix this by removing __GFP_NO_KSWAPD from drm/i915 driver.
Signed-off-by: Sedat Dilek <sedat.dilek@gmail.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
There was some merge conflicts in -next and they weren't so pretty, so
backmerge now to avoid them.
Conflicts:
drivers/gpu/drm/i915/i915_gem.c
drivers/gpu/drm/i915/intel_modes.c
The principal use for set-to-domain is for userspace to serialise
operations with a particular buffer, for example to maintain coherency
with a CPU map or to ratelimit its rendering by waiting on all previous
operations before continuing. As such we tend to hold the struct_mutex
for long periods during the synchronisation and so cause contention
issues with other users of the graphics device, even for independent
operations as memory management. An example is the contention between
compiz and X which causes jitter in the display and a drop in peak
throughput.
The ultimate solution would be a set of fine grained locks and lockless
operations, but an intermediate step is to first attempt the
synchronisation for set-to-domain without holding the mutex. This
introduces a number of race conditions, so we limit it use to the ioctl
periphery where we have no dependent state and can safely complete with
a locked synchronisation afterwards.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Move the wait-for-rendering logic around in the file so that we can
group it together with the subsequent variations. The general goal is to
have the lower level routines clustered together and then the higher
level logic building upon those low level routines that came before.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
As we wish to create specialised object constructions in the near
future that share the same basic GEM object struct, export the default
initializer.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
If the object has no backing shmemfs filp, then we obviously cannot
perform a truncation operation upon it.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Avoid stalling and waiting for the GPU by checking to see if there is
sufficient inactive space in the aperture for us to bind the buffer
prior to writing through the GTT. If there is inadequate space we will
have to stall waiting for the GPU, and incur overheads moving objects
about. Instead, only incur the clflush overhead on the target object by
writing through shmem.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Given the persistence of an offset for the lifetime of an object, itis
easy to contemplate how the mmap space becomes badly fragmented to the
point that further allocations fail with ENOSPC. Our only recourse at
this point is to try to purge the objects to release some space and
reattempt the allocation.
References: https://bugs.freedesktop.org/show_bug.cgi?id=39552
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
A pair of universally true checks that just need to be put in the right
place depending on where in the patch sequence you go. Note that
i915_gem_object_put_pages_gtt() already gains the
BUG_ON(obj->gtt_space), but on reflection that needed to migrate to
put_pages().
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
When dealing with a working set larger than the GATT, or even the
mappable aperture when touching through the GTT, we end up with evicting
objects only to rebind them at a new offset again later. Moving an
object into and out of the GTT requires clflushing the pages, thus
causing a double-clflush penalty for rebinding.
To avoid having to clflush on rebinding, we can track the pages as they
are evicted from the GTT and only relinquish those pages on memory
pressure.
As usual, if it were not for the handling of out-of-memory condition and
having to manually shrink our own bo caches, it would be a net reduction
of code. Alas.
Note: The patch also contains a few changes to the last-hope
evict_everything logic in i916_gem_execbuffer.c - we no longer try to
only evict the purgeable stuff in a first try (since that's superflous
and only helps in OOM corner-cases, not fragmented-gtt trashing
situations).
Also, the extraction of the get_pages retry loop from bind_to_gtt (and
other callsites) to get_pages should imo have been a separate patch.
v2: Ditch the newly added put_pages (for unbound objects only) in
i915_gem_reset. A quick irc discussion hasn't revealed any important
reason for this, so if we need this, I'd like to have a git blame'able
explanation for it.
v3: Undo the s/drm_malloc_ab/kmalloc/ in get_pages that Chris noticed.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: Split out code movements and rant a bit in the commit message
with a few Notes. Done v2]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Prep work to make Chris Wilson's unbound tracking patch a bit easier
to read. Alas, I'd have preferred that moving the page allocation
retry loop from bind to get_pages would have been a separate patch,
too. But that looks like real work ;-)
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
After reset we unconditionally reinitialize lists. If the context switch
hasn't yet completed before the suspend, the default context object will
end up on lists that are going to go away when we resume.
The patch forces the context switch to be synchronous before suspend
assuring that the active/inactive tracking is correct at the time of
resume.
References: https://bugs.freedesktop.org/show_bug.cgi?id=52429
Tested-by: Guang A Yang <guang.a.yang@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Avoid the forcewake overhead when simply retiring requests, as often the
last seen seqno is good enough to satisfy the retirment process and will
be promptly re-run in any case. Only ensure that we force the coherent
seqno read when we are explicitly waiting upon a completion event to be
sure that none go missing, and also for when we are reporting seqno
values in case of error or debugging.
This greatly reduces the load for userspace using the busy-ioctl to
track active buffers, for instance halving the CPU used by X in pushing
the pixels from a software render (flash). The effect will be even more
magnified with userptr and so providing a zero-copy upload path in that
instance, or in similar instances where X is simply compositing DRI
buffers.
v2: Reverse the polarity of the tachyon stream. Daniel suggested that
'force' was too generic for the parameter name and that 'lazy_coherency'
better encapsulated the semantics of it being an optimization and its
purpose. Also notice that gen6_get_seqno() is only used by gen6/7
chipsets and so the test for IS_GEN6 || IS_GEN7 is redundant in that
function.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
By selecting the cache level (essentially whether or not the CPU snoops
any updates to the bo, and on more recent machines whether it resides
inside the CPU's last-level-cache) a userspace driver is able to then
manage all of its memory within buffer objects, if it so desires. This
enables the userspace driver to accelerate uploads and more importantly
downloads from the GPU and to able to mix CPU and GPU rendering/activity
efficiently.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: Added code comment about where we plan to stuff platform
specific cacheing control bits in the ioctl struct.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Several functions of the GPU have the restriction that differing memory
domains cannot be placed next to each other (as the GPU may prefetch
beyond the end of one domain and hang as it crosses into the other
domain). We use the facility of the drm_mm to mark ranges with a
particular color that corresponds to the cache attributes of those pages
in order to prevent allocating adjacent blocks of differing memory
types.
v2: Rebase ontop of drm_mm coloring v2.
v3: Fix rebinding existing gtt_space and add a verification routine.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
As suggested by Daniel, rip out the independent timers for device and
crtc busyness and integrate the manual powermanagement of the display
engine into the GEM core and its request tracking. The benefits are that
the code is a lot smaller, fewer moving parts and should fit more neatly
into the overall activity tracking of the driver.
v2: Complete overhaul and removal of the racy timers and workers.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
By moving the function to intel_ringbuffer and currying the appropriate
parameter, hopefully we make the callsites easier to read and
understand.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Rely instead on the insertion of the implicit flush before the seqno
breadcrumb.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
As the flush is either performed explictly immediately after the
execbuffer dispatch, or before the serialisation of last_fenced_seqno we
can forgo the explict i915_gem_flush_ring().
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This is now handled by a global flag to ensure we emit a flush before
the next serialisation point (if we failed to queue one previously).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
As we guarantee to emit a flush before emitting the breadcrumb or
the next batchbuffer, there is no further need for the flushing list.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
As we always flush the GPU cache prior to emitting the breadcrumb, we no
longer have to worry about the deferred flush causing the
pending_gpu_write to be delayed. So we can instead utilize the known
last_write_seqno to hopefully minimise the wait times.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
As we move to lazily clearing the GPU write domain only when the buffer
becomes inactive, this leaves a window of opportunity for
i915_gem_object_pin_to_display_plane() to detect a seemingly
inconsistent value. This function is special as it tries to pipeline the
operation to avoid the stall and so may not retires the buffer and we
may not get the opportunity to clear the write domain. However, we know
all is good, so drop the assertion.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Request preallocation was added to i915_add_request() in order to
support the overlay. However, not all users care and can quite happily
ignore the failure to allocate the request as they will simply repeat
the request in the future.
By pushing the allocation down into i915_add_request(), we can then
remove some rather ugly error handling in the callers.
v2: Nullify request->file_priv otherwise we chase a garbage pointer
when retiring requests.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The intention is to help select which engine to use for copies with
interoperating clients - such as a GL client making a request to the X
server to perform a SwapBuffers, which may require copying from the
active GL back buffer to the X front buffer.
We choose to report a mask of the active rings to future proof the
interface against any changes which may allow for the object to reside
upon multiple rings.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: bikeshed away the write ring mask and add the explanation
Chris sent in a follow-up mail why we decided to use masks.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This prevents a WARN introduced with
commit de2b998552
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date: Wed Jul 4 22:52:50 2012 +0200
drm/i915: don't return a spurious -EIO from intel_ring_begin
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
It never quite worked despite the numerous workarounds, yet I still see
people trying to use this hardware and filing bug reports. As we no
longer even try to implement the workarounds, since 6a233c7887
(drm/i915/ringbuffer: kill snb blt workaround), simply disable the ring.
v2: Add a message to inform the user about the limited capabilities of
their pre-production hardware.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
In order to support snoopable memory on non-LLC architectures (so that
we can bind vgem objects into the i915 GATT for example), we have to
avoid the prefetcher on the GPU from crossing memory domains and so
prevent allocation of a snoopable PTE immediately following an uncached
PTE. To do that, we need to extend the range allocator with support for
tracking and segregating different node colours.
This will be used by i915 to segregate memory domains within the GTT.
v2: Now with more drm_mm helpers and less driver interference.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Dave Airlie <airlied@redhat.com
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Ben Skeggs <bskeggs@redhat.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Dave Airlie <airlied@gmail.com>
... instead of looping endless with no hope of ever serving that
page-fault. We only need to break out of this loop when the gpu died,
to run the reset work (and hopefully resurrect it).
To clarify questions Chris raised on irc: This is about handling I/O
errors not from our own code, but e.g. when the disk died when trying
to swap in a gem bo. So this patch remidies the issue that the current
handling only handles gpu-death-induced cases of -EIO. Admittedly,
dying disks are much rarer than hanging gpus ...To clarify questions
Chris raised on irc: This is about handling I/O errors not from our
own code, but e.g. when the disk died when trying to swap in a gem bo.
So this patch remidies the issue that the current handling only
handles gpu-death-induced cases of -EIO. Admittedly, dying disks are
much rarer than hanging gpus ...
This seems to have been lost in:
commit d9bc7e9f32
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Mon Feb 7 13:09:31 2011 +0000
drm/i915: Fix infinite loop regression from 21dd3734
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
With the gpu reset no longer using a trylock we've increased the
chances of userspace getting stuck quite a bit. To make that
(hopefully) rare case more paletable time out when waiting for the gpu
reset code to complete and signal this little issue to the caller by
returning -EIO.
This should help userspace to somewhat gracefully fall back and
hopefully allow the user to grab some logs and reboot the machine
(instead of staring at a frozen X screen in agony).
Suggested by Chris Wilson because I've been stubborn about allowing
the gpu reset code no to fail, ever (by removing the trylock).
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
So don't return -EAGAIN, even in the case of a gpu hang. Remap it to
-EIO instead. Note that this isn't really an issue with
interruptability, but more that we have quite a few codepaths (mostly
around kms stuff) that simply can't handle any errors and hence not
even -EAGAIN. Instead of adding proper failure paths so that we could
restart these ioctls we've opted for the cheap way out of sleeping
non-interruptibly. Which works everywhere but when the gpu dies,
which this patch fixes.
So essentially interruptible == false means 'wait for the gpu or die
trying'.'
This patch is a bit ugly because intel_ring_begin is all non-interruptible
and hence only returns -EIO. But as the comment in there says,
auditing all the callsites would be a pain.
To avoid duplicating code, reuse i915_gem_check_wedge in __wait_seqno
and intel_wait_ring_buffer. Also use the opportunity to clarify the
different cases in i915_gem_check_wedge a bit with comments.
v2: Don't access dev_priv->mm.interruptible from check_wedge - we
might not hold dev->struct_mutex, making this racy. Instead pass
interruptible in as a parameter. I've noticed this because I've hit a
BUG_ON(!mutex_is_locked) at the top of check_wedge. This has been
added in
commit b4aca0106c
Author: Ben Widawsky <ben@bwidawsk.net>
Date: Wed Apr 25 20:50:12 2012 -0700
drm/i915: extract some common olr+wedge code
although that commit is missing any justification for this. I guess
it's just copy&paste, because the same commit add the same BUG_ON
check to check_olr, where it indeed makes sense.
But in check_wedge everything we access is protected by other means,
so this is superflous. And because it now gets in the way (we add a
new caller in __wait_seqno, which can be called without
dev->struct_mutext) let's just remove it.
v3: Group all the i915_gem_check_wedge refactoring into this patch, so
that this patch here is all about not returning -EAGAIN to callsites
that can't handle syscall restarting.
v4: Add clarification what interuptible == fales means in our code,
requested by Ben Widawsky.
v5: Fix EAGAIN mispell noticed by Chris Wilson.
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This is just the minimal patch to disable all this code so that we can
do decent amounts of QA before we rip it all out.
The complicating thing is that we need to flush the gpu caches after
the batchbuffer is emitted. Which is past the point of no return where
execbuffer can't fail any more (otherwise we risk submitting the same
batch multiple times).
Hence we need to add a flag to track whether any caches associated
with that ring are dirty. And emit the flush in add_request if that's
the case.
Note that this has a quite a few behaviour changes:
- Caches get flushed/invalidated unconditionally.
- Invalidation now happens after potential inter-ring sync.
I've bantered around a bit with Chris on irc whether this fixes
anything, and it might or might not. The only thing clear is that with
these changes it's much easier to reason about correctness.
Also rip out a lone get_next_request_seqno in the execbuffer
retire_commands function. I've dug around and I couldn't figure out
why that is still there, with the outstanding lazy request stuff it
shouldn't be necessary.
v2: Chris Wilson complained that I also invalidate the read caches
when flushing after a batchbuffer. Now optimized.
v3: Added some comments to explain the new flushing behaviour.
Cc: Eric Anholt <eric@anholt.net>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
To keep things as sane as possible, switch to the default context before
idling. This should help free context objects, as well as put things in
a more well defined state before suspending.
v2: remove seqno from context switch call (daniel)
return error on failed context switch instead of WARN+continue (daniel)
v3: move idling to i915_gpu idle (from i915_gem_idle) (Chris)
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Very basic code for context setup/destruction in the driver.
Adds the file i915_gem_context.c This file implements HW context
support. On gen5+ a HW context consists of an opaque GPU object which is
referenced at times of context saves and restores. With RC6 enabled,
the context is also referenced as the GPU enters and exists from RC6
(GPU has it's own internal power context, except on gen5). Though
something like a context does exist for the media ring, the code only
supports contexts for the render ring.
In software, there is a distinction between contexts created by the
user, and the default HW context. The default HW context is used by GPU
clients that do not request setup of their own hardware context. The
default context's state is never restored to help prevent programming
errors. This would happen if a client ran and piggy-backed off another
clients GPU state. The default context only exists to give the GPU some
offset to load as the current to invoke a save of the context we
actually care about. In fact, the code could likely be constructed,
albeit in a more complicated fashion, to never use the default context,
though that limits the driver's ability to swap out, and/or destroy
other contexts.
All other contexts are created as a request by the GPU client. These
contexts store GPU state, and thus allow GPU clients to not re-emit
state (and potentially query certain state) at any time. The kernel
driver makes certain that the appropriate commands are inserted.
There are 4 entry points into the contexts, init, fini, open, close.
The names are self-explanatory except that init can be called during
reset, and also during pm thaw/resume. As we expect our context to be
preserved across these events, we do not reinitialize in this case.
As Adam Jackson pointed out, The cutoff of 1MB where a HW context is
considered too big is arbitrary. The reason for this is even though
context sizes are increasing with every generation, they have yet to
eclipse even 32k. If we somehow read back way more than that, it
probably means BIOS has done something strange, or we're running on a
platform that wasn't designed for this.
v2: rename load/unload to init/fini (daniel)
remove ILK support for get_size() (indirectly daniel)
add HAS_HW_CONTEXTS macro to clarify supported platforms (daniel)
added comments (Ben)
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
When drm/i915 is in control of the gtt, we need to call
the enable function at all the relevant places ourselves.
Reviewed-by: Jani Nikula <jani.nikula@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
For that to work we need to export the base address of the gtt
mmio window from intel-gtt. Also replace all other uses of
dev->agp by values we already have at hand.
Reviewed-by: Jani Nikula <jani.nikula@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Change the ns_timeout parameter of the wait ioctl to a signed value.
Doing this allows the kernel to provide an infinite wait when a timeout
of less than 0 is provided. This mimics select/poll.
Initially the parameter was meant to match up with the GL spec 1:1, but
after being made aware of how much 2^64 - 1 nanoseconds actually is, I
do not think anyone will ever notice the loss of 1 bit.
The infinite timeout on waiting is similar to the existing i915
userspace interface with the exception that struct_mutex is dropped
while doing the wait in this ioctl.
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Both busy_ioctl and the new wait_ioct need to do the same dance (or at
least should). Some slight changes:
- busy_ioctl now unconditionally checks for olr. Before emitting a
require flush would have prevent the olr check and hence required a
second call to the busy ioctl to really emit the request.
- the timeout wait now also retires request. Not really required for
abi-reasons, but makes a notch more sense imo.
I've tested this by pimping the i-g-t test some more and also checking
the polling behviour of the wait_rendering_timeout ioctl versus what
busy_ioctl returns.
v2: Too many people complained about unplug, new color is
flush_active.
v3: Kill the comment about the unplug moniker.
v4: s/un-active/inactive/
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
We need the latest dma-buf code from Dave Airlie so that we can pimp
the backing storage handling code in drm/i915 with Chris Wilson's
unbound tracking and stolen mem backed gem object code.
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
If any l3 rows have been previously remapped, we must remap them after
GPU reset/resume too.
v2: Just return (no warn) on remapping init if not IVB (Jesse)
Move the check of schizo userspace to i915_gem_l3_remap (Jesse)
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
* 'drm-intel-fixes' of git://people.freedesktop.org/~danvet/drm-intel:
drm/i915: tune down the noise of the RP irq limit fail
drm/i915: Remove the error message for unbinding pinned buffers
drm/i915: Limit page allocations to lowmem (dma32) for i965
drm/i915: always use RPNSWREQ for turbo change requests
drm/i915: reject doubleclocked cea modes on dp
drm/i915: Adding TV Out Missing modes.
drm/i915: wait for a vblank to pass after tv detect
drm/i915: no lvds quirk for HP t5740e Thin Client
drm/i915: enable vdd when switching off the eDP panel
drm/i915: Fix PCH PLL assertions to not assume CRTC:PLL relationship
drm/i915: Always update RPS interrupts thresholds along with frequency
drm/i915: properly handle interlaced bit for sdvo dtd conversion
drm/i915: fix module unload since error_state rework
drm/i915: be more careful when returning -ENXIO in gmbus transfer
Wait request is poorly named IMO. After working with these functions for
some time, I feel it's much clearer to name the functions more
appropriately.
Of course we must update the callers to use the new name as well.
This leaves room within our namespace for a *real* wait request function
at some point.
Note to maintainer: this patch is optional.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This helps implement GL_ARB_sync but stops short of allowing full blown
sync objects. Finally we can use the new timed seqno waiting function
to allow userspace to wait on a buffer object with a timeout. This
implements that interface.
The IOCTL will take as input a buffer object handle, and a timeout in
nanoseconds (flags is currently optional but will likely be used for
permutations of flush operations). Users may specify 0 nanoseconds to
instantly check.
The wait ioctl with a timeout of 0 reimplements the busy ioctl. With any
non-zero timeout parameter the wait ioctl will wait for the given number
of nanoseconds on an object becoming unbusy. Since the wait itself does
so holding struct_mutex the object may become re-busied before this
completes. A similar but shorter race condition exists in the busy
ioctl.
v2: ETIME/ERESTARTSYS instead of changing to EBUSY, and EGAIN (Chris)
Flush the object from the gpu write domain (Chris + Daniel)
Fix leaked refcount in good case (Chris)
Naturally align ioctl struct (Chris)
v3: Drop lock after getting seqno to avoid ugly dance (Chris)
v4: check for 0 timeout after olr check to allow polling (Chris)
v5: Updated the comment. (Chris)
v6: Return -ETIME instead of -EBUSY when timeout_ns is 0 (Daniel)
Fix the commit message comment to be less ugly (Ben)
Add a warning to check the return timespec (Ben)
v7: Use DRM_AUTH for the ioctl. (Eugeni)
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This is now used intentionally to prevent proliferation of is-pinned
checks upon the inactive list following:
commit 1b50247a8d
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Tue Apr 24 15:47:30 2012 +0100
drm/i915: Remove the list of pinned inactive objects
Reported-and-tested-by: guang.a.yang@intel.com
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=50075
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Broadwater and Crestline share a limitation that prevent it from
relocating general surface state above 4GiB. The only recourse we have
since any buffer object may be used as a relocation target is then to
limit all object allocations on 965g[m] to DMA32.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Insert a wait parameter in the code so we can possibly timeout on a
seqno wait if need be. The code should be functionally the same as
before because all the callers will continue to retry if an arbitrary
timeout elapses.
We'd like to have nanosecond granularity, but the only way to do this is
with hrtimer, and that doesn't fit well with the needs of this code.
v2: Fix rebase error (Chris)
Return proper time even in wedged + signal case (Chris + Ben)
Use timespec constructs (Ben)
Didn't take Daniel's advice regarding the Frankenstein-ness of the
function. I did try his advice, but in the end I liked the way the
original code looked, better.
v3: Make wakeups far less frequent for infinite waits (Chris)
v4: Remove dummy_wait variable (Daniel)
Use raw monotonic time instead of jiffies (made the code a bit cleaner) (Ben)
Added a couple of warnings (Ben)
v5: Remove warnings (Daniel)
Use more accurate time diff for default case (Daniel)
Bikeshed for setting the return timespec in timeout case (Daniel)
s/jiffies/time in one of the comments
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This adds handle->fd and fd->handle support to i915, this is to allow
for offloading of rendering in one direction and outputs in the other.
v2 from Daniel Vetter:
- fixup conflicts with the prepare/finish gtt prep work.
- implement ppgtt binding support.
Note that we have squat i-g-t testcoverage for any of the lifetime and
access rules dma_buf/prime support brings along. And there are quite a
few intricate situations here.
Also note that the integration with the existing code is a bit
hackish, especially around get_gtt_pages and put_gtt_pages. It imo
would be easier with the prep code from Chris Wilson's unbound series,
but that is for 3.6.
Also note that I didn't bother to put the new prepare/finish gtt hooks
to good use by moving the dma_buf_map/unmap_attachment calls in there
(like we've originally planned for).
Last but not least this patch is only compile-tested, but I've changed
very little compared to Dave Airlie's version. So there's a decent
chance v2 on drm-next works as well as v1 on 3.4-rc.
v3: Right when I've hit sent I've noticed that I've screwed up one
obj->sg_list (for dmar support) and obj->sg_table (for prime support)
disdinction. We should be able to merge these 2 paths, but that's
material for another patch.
v4: fix the error reporting bugs pointed out by ickle.
v5: fix another error, and stop non-gtt mmaps on shared objects
stop pread/pwrite on imported objects, add fake kmap
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
In many places we wish to iterate over the rings associated with the
GPU, so refactor them to use a common macro.
Along the way, there are a few code removals that should be side-effect
free and some rearrangement which should only have a cosmetic impact,
such as error-state.
Note that this slightly changes the semantics in the hangcheck code:
We now always cycle through all enabled rings instead of
short-circuiting the logic.
v2: Pull in a couple of suggestions from Ben and Daniel for
intel_ring_initialized() and not removing the warning (just moving them
to a new home, closer to the error).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
[danvet: Added note to commit message about the small behaviour
change, suggested by Ben Widawsky.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Backmerge of drm-next to resolve a few ugly conflicts and to get a few
fixes from 3.4-rc6 (which drm-next has already merged). Note that this
merge also restricts the stencil cache lra evict policy workaround to
snb (as it should) - I had to frob the code anyway because the
CM0_MASK_SHIFT define died in the masked bit cleanups.
We need the backmerge to get Paulo Zanoni's infoframe regression fix
for gm45 - further bugfixes from him touch the same area and would
needlessly conflict.
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.18 (GNU/Linux)
iQEcBAABAgAGBQJPpvY9AAoJEHm+PkMAQRiGpEoIAJgbu+Y8gITnBK/wh9O6zy3S
5jie5KK4YWdbJsvO58WbNr3CyVIwGIqQ2dUZLiU59aBVLarlGw8xor0MmW+cZwhp
6fBHaf0qDYAV0MZjD+mnnExOiCRyISa2lPmsfu9dAWywh5KGe6/oAP6/qcXIyok3
KZyl3qQf4ENpaZPHwZPXCEkUvtuyHgNiszN+QXEadA3s19Ot4VGe9A3VGw+GNrSm
JqFIq3acQAbKa5BYaqf7TQC02v2FI7//eqt6QHxTqbE6a7LGbTvLfX3HlJ2mnfqa
1R6QHhM4y4OZDHbaMT2raHZ8WuLXzhehJzhP8Co7AHFOKwVKOb5XbcUr2RrukMU=
=HkMd
-----END PGP SIGNATURE-----
Merge tag 'v3.4-rc6' into drm-intel-next
Conflicts:
drivers/gpu/drm/i915/intel_display.c
Ok, this is a fun story of git totally messing things up. There
/shouldn't/ be any conflict in here, because the fixes in -rc6 do only
touch functions that have not been changed in -next.
The offending commits in drm-next are 14415745b2..1fa611065 which
simply move a few functions from intel_display.c to intel_pm.c. The
problem seems to be that git diff gets completely confused:
$ git diff 14415745b2..1fa611065
is a nice mess in intel_display.c, and the diff leaks into totally
unrelated functions, whereas
$git diff --minimal 14415745b2..1fa611065
is exactly what we want.
Unfortunately there seems to be no way to teach similar smarts to the
merge diff and conflict generation code, because with the minimal diff
there really shouldn't be any conflicts. For added hilarity, every
time something in that area changes the + and - lines in the diff move
around like crazy, again resulting in new conflicts. So I fear this
mess will stay with us for a little longer (and might result in
another backmerge down the road).
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The new wait_rendering ioctl also needs to check for an oustanding
lazy request, and we already duplicate that logic at three places. So
extract it.
While at it, also extract the code to check the gpu wedging state to
improve code flow.
v2: Don't use seqno as an outparam (Chris)
v3 by danvet: Kill stale comment and pimp commit message
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Even the horrible gen3 XvMC code has learned to do this
right by the time xf86-video-intel releases learned to do
kernel modesetting. So we can just disallow this.
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
... and shove allow_batchbuffer in there. More dragons will
follow suit.
There's the curious case that we allow this for KMS ...
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
It turns out throttle had an almost identical bit of code to do the
wait. Now we can call the new helper directly. This is just a bonus,
and not needed for the overall series.
v2: remove irq_get/put which is now in __wait_seqno (Ben)
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
It's about to go away anyway. Just here to help bisection.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
i915_wait_request is actually a fairly large function encapsulating
quite a few different operations. Because being able to wait on seqnos
in various conditions is useful, extracting that bit of code to a helper
function seems useful
v2: pull the irq_get/put as well (Ben)
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The only time irq_get should fail is during unload or suspend. Both of
these points should try to quiesce the GPU before disabling interrupts
and so the atomic polling should never occur.
This was recommended by Chris Wilson as a way of reducing added
complexity to the polled wait which I introduced in an RFC patch.
09:57 < ickle_> it's only there as a fudge for waiting after irqs
after uninstalled during s&r, we aren't actually meant to hit it
09:57 < ickle_> so maybe we should just kill the code there and fix the breakage
v2: return -ENODEV instead of -EBUSY when irq_get fails
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The waiting_seqno is not terribly useful, and as such we can remove it
so that we'll be able to extract lockless code.
v2: Keep the information for error_state (Chris)
Check if ring is initialized in hangcheck (Chris)
Capture the waiting ring (Chris)
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
[danvet: add some bikeshed to clarify a comment.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This extra bit of interrupt enabling code doesn't belong in the wait
seqno function. If anything we should pull it out to a helper so the
throttle code can also use it. The history is a bit vague, but I am
going to attempt to just dump it, unless someone can argue otherwise.
Removing this allows for a shared lock free wait seqno function. To keep
tabs on this issue though, the IER value is stored on error capture
(recommended by Chris Wilson)
v2: fixed typo EIR->IER (Ben)
Fix some white space (Ben)
Move IER capture to globally instead of per ring (Ben)
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
[danvet: ier is a 16 bit reg on gen2!]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This originates from a hack by me to quickly fix a bug in an earlier
patch where we needed control over whether or not waiting on a seqno
actually did any retire list processing. Since the two operations aren't
clearly related, we should pull the parameter out of the wait function,
and make the caller responsible for retiring if the action is desired.
The only function call site which did not get an explicit retire_request call
(on purpose) is i915_gem_inactive_shrink(). That code was already calling
retire_request a second time.
v2: don't modify any behavior excepit i915_gem_inactive_shrink(Daniel)
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
I've missed this one.
v2: Chris Wilson noticed another register.
v3: Color choice improvements.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
We always set it so there's no point in checking. We could
instead add a bit that tells us whether gem is actually
initialized (i.e. either kms or gem_init_ioctl called), but
that's imho not worth it.
So just rip it out.
There's a little change in the wait_ring timeout, but we've never
run with anything else than the 60 second timeout, even on dri1
userspace.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This ioctl used in a kms driver is only useful to create massive
havoc.
v2: Bail out with -ENODEV as suggested by Chris Wilson.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The use of the mm_list by deferred-free breaks the following patches to
extend the range of objects tracked. We can simplify things if we just
make the unbind during free uninterrutible.
Note that unbinding should never fail, because we hold an additional
reference on every active object. Only the ilk vt-d workaround breaks
this, but already takes care of not failing by waiting for the gpu to
quiescent non-interruptible. But the existence of the deferred free
list casted some doubts on this theory, hence WARN if the unbind fails
and only then retry non-interruptible.
We can kill this additional code after a release in case the theory is
indeed right and no one has hit that WARN.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Simplify object tracking by removing the inactive but pinned list. The
only place where this was used is for counting the available memory,
which is just as easy performed by checking all objects on the rare
occasions it is required (application startup). For ease of debugging,
we keep the reporting of pinned objects through the error-state and
debugfs.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This was only used by one external caller who would just be as happy
with evict-everything, so perform the replacement and make the function
private.
In the process we note that unbinding the inactive list should not fail,
and make it a warning instead.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Currently, we only bump the inactive LRU of an object when we bind
into the GTT for a page-fault. As the object may be used many times
before its mapping is zapped, we do not mark it as active as
frequently as we should. Userspace should be calling set-to-GTT-domain
before each pointer deference (for synchronous access) and so is a
good place to mark the buffer as active.
Marking the buffer as recently used places it at the end of the
inactive eviction queue, though still before anything with outstanding
rendering. This reduces the likelihood of evicting a buffer that is
going to be used again by the CPU in the near future. This way we can
hopefully avoid to kick out upload buffers right before we use them on
the gpu.
Note that we need to check that the object is not active or pinned,
for otherwise we create havoc on the active/pinned lists, which also
use obj->mm_list.
The active lists are sorted by and evicted in last GPU rendering
order, access by the CPU to a still active buffer therefore does not
affect its eviction ordering. Pinned objects are currently excluded
from eviction, therefore the only list that we need to bump for GTT
access by the CPU is the inactive list.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: Added further explanations to the commit message as discussed
on irc.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
... and put them to so good use.
Note that there's functional change in vlv clock gating code, we now
no longer spuriously read back the current value of the bit. According
to Bspec the high bits should always read zero, so ORing this in
should have no effect.
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Rename obj->tiling_changed to obj->fence_dirty so that it is clear that
it flags when the parameters for an active fence (including the
no-fence) register are changed.
Also, do not set this flag when the object does not have a fence
register allocated currently and the gpu does not depend upon the
unfence. This case works exactly like when a tiled object lost its
fence and hence does not need additional handling for the tiling
change in the code.
v2: Use fence_dirty to better express what the flag tracks and add a few
more details to the comments to serve as a reminder of how the GPU also
uses the unfenced register slot.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: Add some bikeshed to the commit message about the stricter
use of fence_dirty.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
As with one of the earlier patches in the series, we're forced to cast
for copy_[to|from]_user. Again because of the nature of the GEN x86
exclusivity, this should be safe.
Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com>
[danvet: Added some bikeshed.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This continues the theme started with vm_brk() and vm_munmap():
vm_mmap() does the same thing as do_mmap(), but additionally does the
required VM locking.
This uninlines (and rewrites it to be clearer) do_mmap(), which sadly
duplicates it in mm/mmap.c and mm/nommu.c. But that way we don't have
to export our internal do_mmap_pgoff() function.
Some day we hopefully don't have to export do_mmap() either, if all
modular users can become the simpler vm_mmap() instead. We're actually
very close to that already, with the notable exception of the (broken)
use in i810, and a couple of stragglers in binfmt_elf.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We can also take advantage of the new 'no retire' mode for seqno waiting
to avoid having to take a reference on the old fence object whilst
flushing an existing fence.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Now that we have a routine that is able to clear the fences as well as
setup up the register for a tiled object, remove the surplus routines to
clear the fences.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
One clarification that we make is to the existing semantics of
obj->tiling_changed to only mean that we need to update an associated
fence register (including the NO_FENCE when executing an untiled but
fenced GPU command). If we do not have a fence register or pending
fenced GPU access for the object (after put_fence() for example), then
we can clear the tiling_changed flag as any fence will necessarily be
rewritten upon acquisition.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Update the existing architecture specific fence writing routines to
either update the fence to point to a tiled object or to clear them in
preparation to remove the other fence writing routes.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
As i915_wait_request() will first check for an already passed seqno,
doing it also in the caller is a waste of space for a cold path.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
As the fences are stored in LRU order, we can simply reuse the oldest if
we do not have an unused register.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
As we now never pipeline a fence update, obj->last_fenced_ring is always
the same as the obj->ring whenever obj->last_fenced_seqno is active, so
remove it.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
As we now no longer track a pipelined fence change, we never use
ring->setup_seqno and can kill it.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Step 2 is then to replace the pipelined parameter with NULL and perform
constant folding to remove dead code.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
We never succeeded in getting pipelined fencing to work (unresolved
spurious GPU hangs), so begin the process of dismantling and removal
the broken code.
Step 1 is the removal of the pipeline parameter to get_fence().
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
For some reason snb has 2 fields to set ppgtt cacheability. This one
here does not exist on gen7.
This might explain why ppgtt wasn't a win on snb like on ivb - not
enough pte caching.
v2: Fixup rebase fail.
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Bspec says that we need to set this: vol1c.3 "Blitter Command
Streamer", Section 1.1.2.1 "GAB_CTL_REG - GAB Unit Control Register".
We don't really rely on pagefaults, but who knows what this all
affects.
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.18 (GNU/Linux)
iQEbBAABAgAGBQJPi3XOAAoJEHm+PkMAQRiGnsUH9RjHwH4YFVyuP/DKtKa6zs74
wqkpT15yITQ5WWMog4JaJFFg5rJCUd8QZr7AS/HSn0ijDyZX5VU7Rcs9cMudDzNR
H/5K/AscS4fjb0HwWVqoltTWHRb9QGSwVN3+E3VCDLt9P89YJ0o3QztkkuEX5dkZ
jc7reVXTfRnCcILEa9jleOzrn+OLM3j/jAjQ2hGunl8EDLzD4b17HHPoli4jEZ/5
5ibpSVsPD+AqzN+glbXvYjVItl12D0IQos/JdOwfuZriCVWLxysSSwHZTbPCyvBZ
LHH4HR5T+XLSXbjJeNkUFHLzqU+d5gVRadIoWtJCxqxFjKbOs2YtzJ5Ai0nDiw==
=kTkC
-----END PGP SIGNATURE-----
Merge tag 'v3.4-rc3' into drm-intel-next-queued
Backmerge Linux 3.4-rc3 into drm-intel-next to resolve a few things
that conflict/depend upon patches in -rc3:
- Second part of the Sandybridge workaround series - it changes some
of the same registers.
- Preparation for Chris Wilson's fencing cleanup - we need the fix
from -rc3 merged before we can move around all that code.
- Resolve the gmbus conflict - gmbus has been disabled in 3.4 again,
but should be enabled on all generations in 3.5.
Conflicts:
drivers/gpu/drm/i915/intel_i2c.c
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
... we will botch up the bit17 swizzling. Furthermore tiled pwrite is
a (now) unused slowpath, so no one really cares.
This fixes the last swizzling issues I have with i-g-t on my bit17
swizzling i915G. No regression, it's been broken since the dawn of
gem, but it's nice for regression tracking when really _all_ i-g-t
tests work.
Actually this is not true, Chris Wilson noticed while reviewing this
patch that the commit
commit d9e86c0ee6
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Wed Nov 10 16:40:20 2010 +0000
drm/i915: Pipelined fencing [infrastructure]
contained a functional change that broke things.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Waiting for seqno-1 in our object synchronization code is an
implementation detail given how we've decided to do the waits within the
rest of our code.
Requested-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This fixes a long standing issue where emitting the semaphore updates
may have failed, but we've already updated our internal data structure.
Reported-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
When I extracted the synchronization code for implementing semaphorified
pageflips (74f5f6e0), I neglected the non pipelined case which also
calls this code. The modesetting code wants to make sure the object has
finished rendering to the frame before configuring the scanout (ie.
non-pipelined case).
As a result of a follow on discussion on IRC, I've decided to add a
comment about the function itself which received much inspiration from
Chris as well. So really, this patch was ghost-written by Chris :).
Reported-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Similar to allowing a buffer to be simultaneously read by the GPU and
through the GTT, we wish to allow readback of the pages through the CPU
domain whilst they are also being read by the GPU. Domain coherency
is managed by allowing multiple readers, but only a single writer.
This is used by mesa for its program cache which it may search for every
new program every frame and then renews should it need to add. During
renewal, mesa copies the program bo currently executing through a CPU
mapping onto the new bo. This patch allows the search and that copy to
proceed without causing a stall on the current batch.
Testcase: i-g-t/tests/gem_cpu_concurrent_blit
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
In theory this will have performance and power improvements. Performance
because we don't need to stall when the scanout BO is busy, and power
because we don't have to stall when the BO is busy (and the ring can
even go to sleep if the HW supports it).
v2:
squash 2 patches into 1 (me)
un-inline the enable_semaphores function (Daniel)
remove comment about SNB hangs from i915_gem_object_sync (Chris)
rename intel_enable_semaphores to i915_semaphore_is_enabled (me)
removed page flip comment; "no why" (Chris)
To address other comments from Daniel (irc):
update the comment to say 'vt-d is crap, don't enable semaphores'
- I think you misinterpreted Chris' comment, it already exists.
checking out whether we can pageflip on the render ring on ivb (didn't
work on early silicon)
- We don't want to enable workarounds for early silicon unless we have
to.
- I can't find any references in the docs about this.
optionally use it if the fb is already busy on the render ring
- This should be how the code already worked, unless I am
misunderstanding your meaning.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
By simplifying the rules to calling get_fence when writing to the
through the GTT in a tiled manner, and calling put_fence before writing
to the object through the GTT in a linear manner, the code becomes
clearer and there is less chance of making a mistake.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
[danvet: fixed up conflict with ppgtt code and spelling in a new
comment.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Daniel Vetter wrote
First pull request for 3.5-next, slightly large than usual because new
things kept coming in since the last pull for 3.4.
Highlights:
- first batch of hw enablement for vlv (Jesse et al) and hsw (Eugeni). pci
ids are not yet added, and there's still quite a few patches to merge
(mostly modesetting). To make QA easier I've decided to merge this stuff
in pieces.
- loads of cleanups and prep patches spurred by the above. Especially vlv
is a real frankenstein chip, but also hsw is stretching our driver's
code design. Expect more to come in this area for 3.5.
- more gmbus fixes, cleanups and improvements by Daniel Kurtz. Again,
there are more patches needed (and some already queued up), but I wanted
to split this a bit for better testing.
- pwrite/pread rework and retuning. This series has been in the works for
a few months already and a lot of i-g-t tests have been created for it.
Now it's finally ready to be merged. Note that one patch in this series
touches include/pagemap.h, that patch is acked-by akpm.
- reduce mappable pressure and relocation throughput improvements from
Chris.
- mmap offset exhaustion mitigation by Chris Wilson.
- a start at figuring out which codepaths in our messy dri1/ums+gem/kms
driver we actually need to support by bailing out of unsupported case.
The driver now refuses to load without kms on gen6+ and disallows a few
ioctls that userspace never used in certain cases. More of this will
definitely come.
- More decoupling of global gtt and ppgtt.
- Improved dual-link lvds detection by Takashi Iwai.
- Shut up the compiler + plus fix the fallout (Ben)
- Inverted panel brightness handling (mostly Acer manages to break things
in this way).
- Small fixlets and adjustements and some minor things to help debugging.
Regression-wise QA reported quite a few issues on ivb, but all of them
turned out to be hw stability issues which are already fixed in
drm-intel-fixes (QA runs the nightly regression tests on -next alone,
without -fixes automatically merged in). There's still one issue open on
snb, it looks like occlusion query writes are not quite as cache coherent
as we've expected. With some of the pwrite adjustements we can now
reliably hit this. Kernel workaround for it is in the works."
* 'drm-intel-next' of git://people.freedesktop.org/~danvet/drm-intel: (101 commits)
drm/i915: VCS is not the last ring
drm/i915: Add a dual link lvds quirk for MacBook Pro 8,2
drm/i915: make quirks more verbose
drm/i915: dump the DMA fetch addr register on pre-gen6
drm/i915/sdvo: Include YRPB as an additional TV output type
drm/i915: disallow gem init ioctl on ilk
drm/i915: refuse to load on gen6+ without kms
drm/i915: extract gt interrupt handler
drm/i915: use render gen to switch ring irq functions
drm/i915: rip out old HWSTAM missed irq WA for vlv
drm/i915: open code gen6+ ring irqs
drm/i915: ring irq cleanups
drm/i915: add SFUSE_STRAP registers for digital port detection
drm/i915: add WM_LINETIME registers
drm/i915: add WRPLL clocks
drm/i915: add LCPLL control registers
drm/i915: add SSC offsets for SBI access
drm/i915: add port clock selection support for HSW
drm/i915: add S PLL control
drm/i915: add PIXCLK_GATE register
...
Conflicts:
drivers/char/agp/intel-agp.h
drivers/char/agp/intel-gtt.c
drivers/gpu/drm/i915/i915_debugfs.c
This fixes a resume regression introduced in
commit 7dd4906586
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Wed Mar 21 10:48:18 2012 +0000
drm/i915: Mark untiled BLT commands as fenced on gen2/3
which fixed fencing tracking for untiled blt commands.
A side effect of that patch was that now also untiled objects have a
non-zero obj->last_fenced_seqno to track when a fence can be set up
after a pipelined tiling change. Unfortunately this was only cleared
by the fence setup and teardown code, resulting in tons of untiled but
inactive objects with non-zero last_fenced_seqno.
Now after resume we completely reset the seqno tracking, both on the
driver side (by setting dev_priv->next_seqno = 1) and on the hw side
(by allocating a new hws page, which contains the seqnos). Hilarity
and indefinite waits ensued from the stale seqnos in
obj->last_fenced_seqno from before the suspend.
The fix is to properly clear the fencing tracking state like we
already do for the normal gpu rendering while moving objects off the
active list.
Reported-and-tested-by: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Jiri Slaby <jslaby@suse.cz>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Ums is already disabled, but on ilk we can additionally disable gem
initialization when using user mode setting. Upstream never support
ilk without kernel modesetting and not even the RHEL ilk ums backport
needs gem - that driver is based on xf86-video-intel version 2.2,
which is pre-gem.
Reviewed-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The BLT commands on gen2/3 utilize the fence registers and so we cannot
modify any fences for the object whilst those commands are in flight.
Currently we marked tiled commands as occupying a fence, but forgot to
restrict the untiled commands from preventing a fence being assigned
before they were completed.
One side-effect is that we ten have to double check that a fence was
allocated for a fenced buffer during move-to-active.
Reported-by: Jiri Slaby <jirislaby@gmail.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=43427
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=47990
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Testcase: i-g-t/tests/gem_tiled_after_untiled_blt
Tested-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: stable@kernel.org
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The ppgtt page directory lives in a snatched part of the gtt pte
range. Which naturally gets cleared on hibernate when we pull the
power. Suspend to ram (which is what I've tested) works because
despite the fact that this is a mmio region, it is actually back by
system ram.
Fix this by moving the page directory setup code to the ppgtt init
code (which gets called on resume).
This fixes hibernate on my ivb and snb.
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Haven't seen this yet, but it doesn't hurt.
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Beside helping the compiler untangle this maze they double-up as
documentation for which parts of the code aren't performance-critical
but just around to keep old (but already dead-slow) userspace from
breaking.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The issue is that with inline clflushing the clflushing isn't properly
swizzled. Fix this by
- always clflushing entire 128 byte chunks and
- unconditionally flush before writes when swizzling a given page.
We could be clever and check whether we pwrite a partial 128 byte
chunk instead of a partial cacheline, but I've figured that's not
worth it.
Now the usual approach is to fold this into the original patch series, but
I've opted against this because
- this fixes a corner case only very old userspace relies on and
- I'd like to not invalidate all the testing the pwrite rewrite has gotten.
This fixes the regression notice by tests/gem_tiled_partial_prite_pread
from i-g-t. Unfortunately it doesn't fix the issues with partial pwrites to
tiled buffers on bit17 swizzling machines. But that is also broken without
the pwrite patches, so likely a different issue (or a problem with the
testcase).
v2: Simplify the patch by dropping the overly clever partial write
logic for swizzled pages.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
drm/i915 wants to read/write more than one page in its fastpath
and hence needs to prefault more than PAGE_SIZE bytes.
Add new functions in filemap.h to make that possible.
Also kill a copy&pasted spurious space in both functions while at it.
v2: As suggested by Andrew Morton, add a multipage parameter to both
functions to avoid the additional branch for the pagemap.c hotpath.
My gcc 4.6 here seems to dtrt and indeed reap these branches where not
needed.
v3: Becaus I couldn't find a way around adding a uaddr += PAGE_SIZE to
the filemap.c hotpaths (that the compiler couldn't remove again),
let's go with separate new functions for the multipage use-case.
v4: Adjust comment to CodingStlye and fix spelling.
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
While moving around things, this two functions slowly grew out of any
sane bounds. So extract a few lines that do the copying and
clflushing. Also add a few comments to explain what's going on.
v2: Again do s/needs_clflush/needs_clflush_after/ in the write paths
as suggested by Chris Wilson.
Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
It's around 20% faster.
Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
It's too expensive to move it around just for that pwrite, especially
when we're trashing on the mappable gtt part like crazy.
Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
In micro-benchmarking of the usual pwrite use-pattern of alternating
pwrites with gtt domain reads from the gpu, this yields around 30%
improvement of pwrite throughput across all buffers size. The trick is
that we can avoid clflush cachelines that we will overwrite completely
anyway.
Furthermore for partial pwrites it gives a proportional speedup on top
of the 30% percent because we only clflush back the part of the buffer
we're actually writing.
v2: Simplify the clflush-before-write logic, as suggested by Chris
Wilson.
v3: Finishing touches suggested by Chris Wilson:
- add comment to needs_clflush_before and only set this if the bo is
uncached.
- s/needs_clflush/needs_clflush_after/ in the write paths to clearly
differentiate it from needs_clflush_before.
Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The pagemap.h prefault helpers do the prefaulting by simply writing
some data into every page. Hence we should not prefault when we're not
yet commited to to actually writing data to userspace. The problem is
now that
- we can't prefault while holding dev->struct_mutex for we could
deadlock with our own pagefault handler
- we need to grab dev->struct_mutex before copying to sync up with any
outsanding gpu writes.
Therefore only prefault when we're dropping the lock the first time in
the pread slowpath - at that point we're committed to the write, don't
wait on the gpu anymore and hence won't return early (with e.g.
-EINTR).
Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
With the proper prefault, it's extremely unlikely that we fall back
to the gtt slowpath.
So just kill it and use the shmem_pwrite path as fallback.
To further clean up the code, move the preparatory gem calls into the
respective pwrite functions. This way the gtt_fast->shmem fallback
is much more obvious.
Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This speeds up pwrite and pread from ~120 µs ro ~100 µs for
reading/writing 1mb on my snb (if the backing storage pages
are already pinned, of course).
v2: Chris Wilson pointed out a glaring page reference bug - I've
unconditionally dropped the reference. With that fixed (and the
associated reduction of dirt in dmesg) it's now even a notch faster.
v3: Unconditionaly grab a page reference when dropping
dev->struct_mutex to simplify the code-flow.
Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
~120 µs instead fo ~210 µs to write 1mb on my snb. I like this.
Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
No longer needed.
Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>