This has been sprinkled all over the place in dev_priv. I think
it'd be good to also move all the code into a separate file like
i915_gem_error.c, but that's for another patch.
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Tha one is really big, since it contains tons of comments explaining
how things work. Which is nice ;-)
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
We already had a mapping in both (minus the phys_addr in AGP).
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@gmail.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
And, move it to where the rest of the logic is.
There is some slight functionality changes. There was extra paranoid
checks in AGP code making sure we never do idle maps on gen2 parts. That
was not duplicated as the simple PCI id check should do the right thing.
v2: use IS_GEN5 && IS_MOBILE check instead. For now, this is the same as
IS_IRONLAKE_M but is more future proof. The workaround docs hint that
more than one platform may be effected, but we've never seen such a
platform in the wild. (Rodrigo, Daniel)
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@gmail.com> (v1)
Cc: Dave Airlie <airlied@redhat.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Add a new "Automatic" mode to the "Broadcast RGB" range property.
When selected the driver automagically selects between full range and
limited range output.
Based on CEA-861 [1] guidelines, limited range output is selected if the
mode is a CEA mode, except 640x480. Otherwise full range output is used.
Additionally DVI monitors should most likely default to full range
always.
As per DP1.2a [2] DisplayPort should always use full range for 18bpp, and
otherwise will follow CEA-861 rules.
NOTE: The default value for the property will now be "Automatic"
so some people may be affected in case they're relying on the
current full range default.
[1] CEA-861-E - 5.1 Default Encoding Parameters
[2] VESA DisplayPort Ver.1.2a - 5.1.1.1 Video Colorimetry
v2: Use has_hdmi_sink to check if a HDMI monitor is present
v3: Add information about relevant spec chapters
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The purpose of the gtt structure is to help isolate our gtt specific
properties from the rest of the code (in doing so it help us finish the
isolation from the AGP connection).
The following members are pulled out (and renamed):
gtt_start
gtt_total
gtt_mappable_end
gtt_mappable
gtt_base_addr
gsm
The gtt structure will serve as a nice place to put gen specific gtt
routines in upcoming patches. As far as what else I feel belongs in this
structure: it is meant to encapsulate the GTT's physical properties.
This is why I've not added fields which track various drm_mm properties,
or things like gtt_mtrr (which is itself a pretty transient field).
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@gmail.com>
[Ben modified commit messages]
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
With the assertion from the previous patch in place, it should be safe
to get rid gtt_mappable_total. Keeps things saner to not have to track
the same info in two places.
In order to keep the diff as simple as possible and keep with the
existing gtt_setup semantics we opt to keep gtt_mappable_end. It's not
as consistent with the 'total' used in the previous patch, but that can
be fixed later.
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@gmail.com>
[Ben modified commit message]
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
It's duplicated in the more useful gtt_total.
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@gmail.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
As a means to investigate some bad system behaviour related to the
purging of the active, inactive and unbound lists, it is useful to be
able to manually control when those lists should be cleared.
v2: use _safe list iterators as we kick objects from the list as we
walk.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: Add a small comment explaining why we don't need to check and
wait for gpu resets, acked by Chris on irc.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The two functions are rather similar, so merge them.
Signed-off-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This variable is only used locally in the irq postinstall
functions for ivybridge and ironlake.
Signed-off-by: Egbert Eich <eich@suse.de>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Daniel writes:
- seqno wrap fixes and debug infrastructure from Mika Kuoppala and Chris
Wilson
- some leftover kill-agp on gen6+ patches from Ben
- hotplug improvements from Damien
- clear fb when allocated from stolen, avoids dirt on the fbcon (Chris)
- Stolen mem support from Chris Wilson, one of the many steps to get to
real fastboot support.
- Some DDI code cleanups from Paulo.
- Some refactorings around lvds and dp code.
- some random little bits&pieces
* tag 'drm-intel-next-2012-12-21' of git://people.freedesktop.org/~danvet/drm-intel: (93 commits)
drm/i915: Return the real error code from intel_set_mode()
drm/i915: Make GSM void
drm/i915: Move GSM mapping into dev_priv
drm/i915: Move even more gtt code to i915_gem_gtt
drm/i915: Make next_seqno debugs entry to use i915_gem_set_seqno
drm/i915: Introduce i915_gem_set_seqno()
drm/i915: Always clear semaphore mboxes on seqno wrap
drm/i915: Initialize hardware semaphore state on ring init
drm/i915: Introduce ring set_seqno
drm/i915: Missed conversion to gtt_pte_t
drm/i915: Bug on unsupported swizzled platforms
drm/i915: BUG() if fences are used on unsupported platform
drm/i915: fixup overlay stolen memory leak
drm/i915: clean up PIPECONF bpc #defines
drm/i915: add intel_dp_set_signal_levels
drm/i915: remove leftover display.update_wm assignment
drm/i915: check for the PCH when setting pch_transcoder
drm/i915: Clear the stolen fb before enabling
drm/i915: Access to snooped system memory through the GTT is incoherent
drm/i915: Remove stale comment about intel_dp_detect()
...
Conflicts:
drivers/gpu/drm/i915/intel_display.c
These are useful for investigating hangs involving WAIT_FOR_EVENT.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: Apply a droplet of Future-Proof in the if-ladder.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The iomapping of the register region has historically been a uint32_t
for the obvious reason that our PTE size was always 4b. In the future
however, we cannot make this assumption.
By making the type void, it makes the upcoming pointer math we will do
much easier, and hopefully gives the compiler opportunities to warn us
when we do stupid things.
v2: Cast to __iomem, caught by Ville
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
[danvet: Fixup __iomem issue for real.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This removes an unused field from the AGP structure and moves it into
the dev_priv structure (with a slightly better name). This builds upon
the kill-agp series already merged.
GSM is a well defined term in the bspec:
GSM: Graphics Stolen Memory
GTT stolen space is defined for storage of the GFX GTT entries in
physical memory. IA can not access GSM directly , it can only access via
GTTMMADR. GT can access GSM directly or through GTTMMADR.
This is not the entire stolen space.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This really should have been part of the kill agp series.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
commit 5774506f15
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Wed Nov 21 13:04:04 2012 +0000
drm/i915: Borrow our struct_mutex for the direct reclaim
added a nice trick to steal the struct_mutex lock in the shrinker if
it's the current task holding it. But this also caused the requirement
that every place which allocates memory needs to be careful about the
gem state of objects, since the shrinker could have pulled the rug out
from under it. We've usually solved this by carefully preallocating
things or ensure that buffers are pinned already.
But the shrinker also reaps mmap offset, so allocating those needs to
be careful, too. Now that code has been factored out into some common
helpers, so either we have fragile code depending upon the common
helper not doing something we don't want it to do. Or we need to
reimplement the mmap offset creation and so also leak implementation
details into our code.
Since this all results in leaky abstraction, cop out by disabling the
lock borrowing trick while calling down into the helpers. That way our
craziness is nicely confined to files in drm/i915.
v2: Split out the change to create_mmap_offset as request by Chris Wilson.
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This function can be used to set the driver's next_seqno
to arbitrary value.
i915_gem_set_seqno() will idle the gpu, retire outstanding
requests, clear the semaphore mailboxes and set the hardware
status page's seqno index.
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Now that Chris Wilson demonstrated that the key for stability on early
gen 2 is to simple _never_ exchange the physical backing storage of
batch buffers I've tried a stab at a kernel solution. Doesn't look too
nefarious imho, now that I don't try to be too clever for my own good
any more.
v2: After discussing the various techniques, we've decided to always blit
batches on the suspect devices, but allow userspace to opt out of the
kernel workaround assume full responsibility for providing coherent
batches. The principal reason is that avoiding the blit does improve
performance in a few key microbenchmarks and also in cairo-trace
replays.
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet:
- Drop the hunk which uses HAS_BROKEN_CS_TLB to implement the ring
wrap w/a. Suggested by Chris Wilson.
- Also add the ACTHD check from Chris Wilson for the error state
dumping, so that we still catch batches when userspace opts out of
the w/a.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
I'm not really sure, since the w/a entry is as thin on details as
ever, and Bspec doesn't say anything about it. But I've figured only
dispatching to rows 0&1 instead of all four should be the right thing
for GT1.
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@gmail.com>
[danvet: Add the missing snb server GT1 to the check, spotted by Chris
Wilson.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Spinning for up to 200 us with interrupts locked out is not good. So
let's just spin (and even that seems to be excessive).
And we don't call these functions from interrupt context, so this is
not required. Besides that doing anything in interrupt contexts which
might take a few hundred us is a no-go. So just convert the entire
thing to a mutex. Also move the mutex-grabbing out of the read/write
functions (add a WARN_ON(!is_locked)) instead) since all callers are
nicely grouped together.
Finally the real motivation for this change: Dont grab the modeset
mutex in the dpio debugfs file, we don't need that consistency. And
correctness of the dpio interface is ensured with the dpio_lock.
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
For GMCH platforms we set up the hpd irq registers in the irq
postinstall hook. But since we only enable the irq sources we actually
need in PORT_HOTPLUG_EN/STATUS, taking dev_priv->hotplug_supported_mask
into account, no hpd interrupt sources is enabled since
commit 52d7ecedac
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date: Sat Dec 1 21:03:22 2012 +0100
drm/i915: reorder setup sequence to have irqs for output setup
Wrongly set-up interrupts also lead to broken hw-based load-detection
on at least GM45, resulting in ghost VGA/TV-out outputs.
To fix this, delay the hotplug register setup until after all outputs
are set up, by moving it into a new dev_priv->display.hpd_irq_callback.
We might also move the PCH_SPLIT platforms to such a setup eventually.
Another funny part is that we need to delay the fbdev initial config
probing until after the hpd regs are setup, for otherwise it'll detect
ghost outputs. But we can only enable the hpd interrupt handling
itself (and the output polling) _after_ that initial scan, due to
massive locking brain-damage in the fbdev setup code. Add a big
comment to explain this cute little dragon lair.
v2: Encapsulate all the fbdev handling by wrapping the move call into
intel_fbdev_initial_config in intel_fb.c. Requested by Chris Wilson.
v3: Applied bikeshed from Jesse Barnes.
v4: Imre Deak noticed that we also need to call intel_hpd_init after
the drm_irqinstall calls in the gpu reset and resume paths - otherwise
hotplug will be broken. Also improve the comment a bit about why
hpd_init needs to be called before we set up the initial fbdev config.
Bugzilla: Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=54943
Reported-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org> (v3)
Reviewed-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
If wrap just happened we need to prevent emitting waits for
pre wrap values. Detect this and emit no-ops instead.
v2: Use olr > seqno to detect wrap instead of *seqno == 0
as suggested by Chris Wilson.
v3: Use last used seqno to detect the wraparound. From
Chris Wilson
v4: Fixed unnecessary last_seqno assigment
References: https://bugs.freedesktop.org/show_bug.cgi?id=57967
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
If we fail to set the bit when needed we get some nice FDI link
training failures (AKA "black screen on VGA output").
While we don't really know how to properly choose whether we need to
set the bit or not (VBT?), just read the initial value set by the BIOS
and store it for later usage.
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
We need this code to init the PCH SSC refclk and the FDI registers.
The BIOS does this too and that's why VGA worked before this patch,
until you tried to suspend the machine...
This patch implements the "Sequence to enable CLKOUT_DP for FDI usage
and configure PCH FDI/IO" from our documentation.
v2:
- Squash Damien Lespiau's reset spelling fix on top.
- Add a comment that we don't need to bother about the ULT special
case Damien noticed, since ULT won't have VGA.
- Add a comment to rip out the SDV codepaths once haswell ships for
real.
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com> (v1)
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This way we should be able to write mPHY registers using the Sideband
Interface in the next commit. Also fixed some syntax oddities in the
related code.
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
On a machine with bit17 swizzling, we need to store the bit17 of the
physical page address in put-pages. This requires a memory allocation,
on average less than a page, which may be difficult to satisfy is the
request to put-pages is on behalf of the shrinker. We could allow that
allocation to pull from the reserved memory pools, but it seems much
safer to preallocate the array for tiled objects on affected machines.
v2: Export i915_gem_object_needs_bit17_swizzle() for reuse.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Both the dp and fdi code use the exact same computations (ignore minor
differences in conversion between bits and bytes).
This makes it even more apparent that we have a _massive_ mess between
cpu transcoder/fdi link/pch transcoder and pch link settings. And also
that we have hilarious amounts of confusion between edp and dp
(despite that they're identical at a link level).
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
At least on the platforms that have a dp aux irq and also have it
enabled - vlvhsw should have one, too. But I don't have a machine to
test this on. Judging from docs there's no dp aux interrupt for gm45.
Also, I only have an ivb cpu edp machine, so the dp aux A code for
snb/ilk is untested.
For dpcd probing when nothing is connected it slashes about 5ms of cpu
time (cpu time is now negligible), which agrees with 3 * 5 400 usec
timeouts.
A previous version of this patch increases the time required to go
through the dp_detect cycle (which includes reading the edid) from
around 33 ms to around 40 ms. Experiments indicated that this is
purely due to the irq latency - the hw doesn't allow us to queue up
dp aux transactions and hence irq latency directly affects throughput.
gmbus is much better, there we have a 8 byte buffer, and we get the
irq once another 4 bytes can be queued up.
But by using the pm_qos interface to request the lowest possible cpu
wake-up latency this slowdown completely disappeared.
Since all our output detection logic is single-threaded with the
mode_config mutex right now anyway, I've decide not ot play fancy and
to just reuse the gmbus wait queue. But this would definitely prep the
way to run dp detection on different ports in parallel
v2: Add a timeout for dp aux transfers when using interrupts - the hw
_does_ prevent this with the hw-based 400 usec timeout, but if the
irq somehow doesn't arrive we're screwed. Lesson learned while
developing this ;-)
v3: While at it also convert the busy-loop to wait_for_atomic, so that
we don't run the risk of an infinite loop any more.
v4: Ensure we have the smallest possible irq latency by using the
pm_qos interface.
v5: Add a comment to the code to explain why we frob pm_qos. Suggested
by Chris Wilson.
v6: Disable dp irq for vlv, that's easier than trying to get at docs
and hw.
v7: Squash in a fix for Haswell that Paulo Zanoni tracked down - the
dp aux registers aren't at a fixed offset any more, but can be on the
PCH while the DP port is on the cpu die.
Reviewed-by: Imre Deak <imre.deak@intel.com> (v6)
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
We need two special things to properly wire this up:
- Add another argument to gmbus_wait_hw_status to pass in the
correct interrupt bit in gmbus4.
- Since we can only get an irq for one of the two events we want,
hand-roll the wait_event_timeout code so that we wake up every
jiffie and can check for NAKs. This way we also subsume gmbus
support for platforms without interrupts (or where those are not
yet enabled).
The important bit really is to only enable one gmbus interrupt source
at the same time - with that piece of lore figured out, this seems to
work flawlessly.
Ben Widawsky rightfully complained the lack of measurements for the
claimed benefits (especially since the first version was actually
broken and fell back to bit-banging). Previously reading the 256 byte
hdmi EDID takes about 72 ms here. With this patch it's down to 33 ms.
Given that transfering the 256 bytes over i2c at wire speed takes
20.5ms alone, the reduction in additional overhead is rather nice.
v2: Chris Wilson wondered whether GMBUS4 might contain some set bits
when booting up an hence result in some spurious interrupts. Since we
clear GMBUS4 after every wait and we do gmbus transfer really early in
the setup sequence to detect displays the window is small, but still
be paranoid and clear it properly.
v3: Clarify the comment that gmbus irq generation can only support one
kind of event, why it bothers us and how we work around that limit.
Cc: Daniel Kurtz <djkurtz@chromium.org>
Reviewed-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Otherwise the new&shiny irq-driven gmbus and dp aux code won't work that
well. Noticed since the dp aux code doesn't have an automatic fallback
with a timeout (since the hw provides for that already).
v2: Simple move drm_irq_install before intel_modeset_gem_init, as
suggested by Ben Widawsky.
v3: Now that interrupts are enabled before all connectors are fully
set up, we might fall over serving a HPD interrupt while things are
still being set up. Instead of jumping through massive hoops and
complicating the code with a separate hpd irq enable step, simply
block out the hotplug work item from doing anything until things are
in place.
v4: Actually, we can enable hotplug processing only after the fbdev is
fully set up, since we call down into the fbdev from the hotplug work
functions. So stick the hpd enabling right next to the poll helper
initialization.
v5: We need to enable irqs before intel_modeset_init, since that
function sets up the outputs.
v6: Fixup cleanup sequence, too.
Reviewed-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
- __iomem where there is none (I love how we mix these things up).
- Use gfp_t instead of an other plain type.
- Unconfuse one place about enum pipe vs enum transcoder - for the pch
transcoder we actually use the pipe enum. Fixup the other cases
where we assign the pipe to the cpu transcoder with explicit casts.
- Declare the mch_lock properly in a header.
There is still a decent mess in intel_bios.c about __iomem, but heck,
this is x86 and we're allowed to do that.
Makes-sparse-happy: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: Use a space after the cast consistently and fix up the
newly-added cast in i915_irq.c to properly use __iomem.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Be specific for the GPU domains so that we can detect if userspace ever
passed in an invalid combination, as well as accurately reflect the
known GPU domains when printing state.
Fixes i-g-t/gem_exec_bad_domains
References: https://bugs.freedesktop.org/show_bug.cgi?id=57826
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The primary purpose of this was to debug some use-after-free memory
corruption that was causing an OOPS inside drm/i915. As it turned out
the corruption was being caused elsewhere and i915.ko as a major user of
many objects was being hit hardest.
Indeed as we do frequent the generic kmalloc caches, dedicating one to
ourselves (or at least naming one for us depending upon the core) aids
debugging our own slab usage.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Allow for the creation of GEM objects backed by stolen memory. As these
are not backed by ordinary pages, we create a fake dma mapping and store
the address in the scatterlist rather than obj->pages.
v2: Mark _i915_gem_object_create_stolen() as static, as noticed by Jesse
Barnes.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
In order to accommodate objects that are not backed by struct pages, but
instead point into a contiguous region of stolen space, we need to make
various changes to avoid dereferencing obj->pages or obj->base.filp.
First introduce a marker for the stolen object, that specifies its
offset into the stolen region and implies that it has no backing pages.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
As FBC is commonly disabled due to limitations of the chipset upon
output configurations, on many systems FBC is never enabled. For those
systems, it is advantageous to make use of the stolen memory for other
objects and so we defer allocation of the FBC chunk until we actually
require it. This increases the likelihood of that allocation failing,
but that in turns means that we are already taking advantage of the
stolen memory!
As well as delaying the allocation from driver initialisation until the
first use of FBC, we also return the stolen block after we finish using
it - allowing greater flexibility in our usage of stolen space. A side
effect of this is that we can then attempt to allocate only the required
amount of space (with a little slack to reduce reallocation rate and
avoid fragmentation).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
As yet we do not do any preallocation (chicken-and-egg problem), but we
may like to preserve anything already allocated by the BIOS or grub and
reuse for own purposes after initialising the driver.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The routine to query the base of stolen memory was using the wrong
registers and the wrong encodings on virtually every platform.
It was not until the G33 refresh, that a PCI config register was
introduced that explicitly said where the stolen memory was. Prior to
865G there was not even a register that said where the end of usable
low memory was and where the stolen memory began (or ended depending
upon chipset). Before then, one has to look at the BIOS memory maps to
find the Top of Memory. Alas that is not exported by arch/x86 and so we
have to resort to disabling stolen memory on gen2 for the time being.
Then SandyBridge enlarged the PCI register to a full 32-bits and change
the encoding of the address, so even though we happened to be querying
the right register, we read the wrong bits and ended up using address 0
for our stolen data, i.e. notably FBC.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
spinlock_t should always be used.
LD drivers/gpu/drm/i915/built-in.o
CHECK drivers/gpu/drm/i915/i915_drv.c
CC [M] drivers/gpu/drm/i915/i915_drv.o
CHECK drivers/gpu/drm/i915/i915_dma.c
CC [M] drivers/gpu/drm/i915/i915_dma.o
CHECK drivers/gpu/drm/i915/i915_irq.c
CC [M] drivers/gpu/drm/i915/i915_irq.o
CHECK drivers/gpu/drm/i915/i915_debugfs.c
drivers/gpu/drm/i915/i915_debugfs.c:558:31: warning: dereference of noderef expression
drivers/gpu/drm/i915/i915_debugfs.c:558:39: warning: dereference of noderef expression
drivers/gpu/drm/i915/i915_debugfs.c:558:51: warning: dereference of noderef expression
drivers/gpu/drm/i915/i915_debugfs.c:558:63: warning: dereference of noderef expression
CC [M] drivers/gpu/drm/i915/i915_debugfs.o
CHECK drivers/gpu/drm/i915/i915_suspend.c
CC [M] drivers/gpu/drm/i915/i915_suspend.o
CHECK drivers/gpu/drm/i915/i915_gem.c
drivers/gpu/drm/i915/i915_gem.c:3703:14: warning: incorrect type in assignment (different base types)
drivers/gpu/drm/i915/i915_gem.c:3703:14: expected unsigned int [unsigned] [usertype] mask
drivers/gpu/drm/i915/i915_gem.c:3703:14: got restricted gfp_t
drivers/gpu/drm/i915/i915_gem.c:3706:22: warning: invalid assignment: &=
drivers/gpu/drm/i915/i915_gem.c:3706:22: left side has type unsigned int
drivers/gpu/drm/i915/i915_gem.c:3706:22: right side has type restricted gfp_t
drivers/gpu/drm/i915/i915_gem.c:3707:22: warning: invalid assignment: |=
drivers/gpu/drm/i915/i915_gem.c:3707:22: left side has type unsigned int
drivers/gpu/drm/i915/i915_gem.c:3707:22: right side has type restricted gfp_t
drivers/gpu/drm/i915/i915_gem.c:3711:39: warning: incorrect type in argument 2 (different base types)
drivers/gpu/drm/i915/i915_gem.c:3711:39: expected restricted gfp_t [usertype] mask
drivers/gpu/drm/i915/i915_gem.c:3711:39: got unsigned int [unsigned] [usertype] mask
CC [M] drivers/gpu/drm/i915/i915_gem.o
CHECK drivers/gpu/drm/i915/i915_gem_context.c
CC [M] drivers/gpu/drm/i915/i915_gem_context.o
CHECK drivers/gpu/drm/i915/i915_gem_debug.c
CC [M] drivers/gpu/drm/i915/i915_gem_debug.o
CHECK drivers/gpu/drm/i915/i915_gem_evict.c
CC [M] drivers/gpu/drm/i915/i915_gem_evict.o
CHECK drivers/gpu/drm/i915/i915_gem_execbuffer.c
CC [M] drivers/gpu/drm/i915/i915_gem_execbuffer.o
CHECK drivers/gpu/drm/i915/i915_gem_gtt.c
CC [M] drivers/gpu/drm/i915/i915_gem_gtt.o
CHECK drivers/gpu/drm/i915/i915_gem_stolen.c
CC [M] drivers/gpu/drm/i915/i915_gem_stolen.o
CHECK drivers/gpu/drm/i915/i915_gem_tiling.c
CC [M] drivers/gpu/drm/i915/i915_gem_tiling.o
CHECK drivers/gpu/drm/i915/i915_sysfs.c
CC [M] drivers/gpu/drm/i915/i915_sysfs.o
CHECK drivers/gpu/drm/i915/i915_trace_points.c
CC [M] drivers/gpu/drm/i915/i915_trace_points.o
CHECK drivers/gpu/drm/i915/intel_display.c
drivers/gpu/drm/i915/intel_display.c:1736:9: warning: mixing different enum types
drivers/gpu/drm/i915/intel_display.c:1736:9: int enum transcoder versus
drivers/gpu/drm/i915/intel_display.c:1736:9: int enum pipe
drivers/gpu/drm/i915/intel_display.c:3659:48: warning: mixing different enum types
drivers/gpu/drm/i915/intel_display.c:3659:48: int enum pipe versus
drivers/gpu/drm/i915/intel_display.c:3659:48: int enum transcoder
CC [M] drivers/gpu/drm/i915/intel_display.o
CHECK drivers/gpu/drm/i915/intel_crt.c
CC [M] drivers/gpu/drm/i915/intel_crt.o
CHECK drivers/gpu/drm/i915/intel_lvds.c
CC [M] drivers/gpu/drm/i915/intel_lvds.o
CHECK drivers/gpu/drm/i915/intel_bios.c
drivers/gpu/drm/i915/intel_bios.c:706:60: warning: incorrect type in initializer (different address spaces)
drivers/gpu/drm/i915/intel_bios.c:706:60: expected struct vbt_header *vbt
drivers/gpu/drm/i915/intel_bios.c:706:60: got void [noderef] <asn:2>*vbt
drivers/gpu/drm/i915/intel_bios.c:726:42: warning: incorrect type in argument 1 (different address spaces)
drivers/gpu/drm/i915/intel_bios.c:726:42: expected void const *<noident>
drivers/gpu/drm/i915/intel_bios.c:726:42: got unsigned char [noderef] [usertype] <asn:2>*
drivers/gpu/drm/i915/intel_bios.c:727:40: warning: cast removes address space of expression
drivers/gpu/drm/i915/intel_bios.c:738:24: warning: cast removes address space of expression
CC [M] drivers/gpu/drm/i915/intel_bios.o
CHECK drivers/gpu/drm/i915/intel_ddi.c
drivers/gpu/drm/i915/intel_ddi.c:87:6: warning: symbol 'intel_prepare_ddi_buffers' was not declared. Should it be static?
drivers/gpu/drm/i915/intel_ddi.c:1036:34: warning: mixing different enum types
drivers/gpu/drm/i915/intel_ddi.c:1036:34: int enum pipe versus
drivers/gpu/drm/i915/intel_ddi.c:1036:34: int enum transcoder
CC [M] drivers/gpu/drm/i915/intel_ddi.o
drivers/gpu/drm/i915/intel_ddi.c: In function ‘intel_ddi_setup_hw_pll_state’:
drivers/gpu/drm/i915/intel_ddi.c:1129:2: warning: ‘port’ may be used uninitialized in this function [-Wmaybe-uninitialized]
drivers/gpu/drm/i915/intel_ddi.c:1111:12: note: ‘port’ was declared here
CHECK drivers/gpu/drm/i915/intel_dp.c
CC [M] drivers/gpu/drm/i915/intel_dp.o
CHECK drivers/gpu/drm/i915/intel_hdmi.c
CC [M] drivers/gpu/drm/i915/intel_hdmi.o
CHECK drivers/gpu/drm/i915/intel_sdvo.c
CC [M] drivers/gpu/drm/i915/intel_sdvo.o
CHECK drivers/gpu/drm/i915/intel_modes.c
CC [M] drivers/gpu/drm/i915/intel_modes.o
CHECK drivers/gpu/drm/i915/intel_panel.c
CC [M] drivers/gpu/drm/i915/intel_panel.o
CHECK drivers/gpu/drm/i915/intel_pm.c
drivers/gpu/drm/i915/intel_pm.c:2173:1: warning: symbol 'mchdev_lock' was not declared. Should it be static?
CC [M] drivers/gpu/drm/i915/intel_pm.o
CHECK drivers/gpu/drm/i915/intel_i2c.c
CC [M] drivers/gpu/drm/i915/intel_i2c.o
CHECK drivers/gpu/drm/i915/intel_fb.c
CC [M] drivers/gpu/drm/i915/intel_fb.o
CHECK drivers/gpu/drm/i915/intel_tv.c
CC [M] drivers/gpu/drm/i915/intel_tv.o
CHECK drivers/gpu/drm/i915/intel_dvo.c
CC [M] drivers/gpu/drm/i915/intel_dvo.o
CHECK drivers/gpu/drm/i915/intel_ringbuffer.c
CC [M] drivers/gpu/drm/i915/intel_ringbuffer.o
CHECK drivers/gpu/drm/i915/intel_overlay.c
CC [M] drivers/gpu/drm/i915/intel_overlay.o
CHECK drivers/gpu/drm/i915/intel_sprite.c
CC [M] drivers/gpu/drm/i915/intel_sprite.o
CHECK drivers/gpu/drm/i915/intel_opregion.c
CC [M] drivers/gpu/drm/i915/intel_opregion.o
CHECK drivers/gpu/drm/i915/dvo_ch7xxx.c
CC [M] drivers/gpu/drm/i915/dvo_ch7xxx.o
CHECK drivers/gpu/drm/i915/dvo_ch7017.c
CC [M] drivers/gpu/drm/i915/dvo_ch7017.o
CHECK drivers/gpu/drm/i915/dvo_ivch.c
CC [M] drivers/gpu/drm/i915/dvo_ivch.o
CHECK drivers/gpu/drm/i915/dvo_tfp410.c
CC [M] drivers/gpu/drm/i915/dvo_tfp410.o
CHECK drivers/gpu/drm/i915/dvo_sil164.c
CC [M] drivers/gpu/drm/i915/dvo_sil164.o
CHECK drivers/gpu/drm/i915/dvo_ns2501.c
CC [M] drivers/gpu/drm/i915/dvo_ns2501.o
CHECK drivers/gpu/drm/i915/i915_gem_dmabuf.c
CC [M] drivers/gpu/drm/i915/i915_gem_dmabuf.o
CHECK drivers/gpu/drm/i915/i915_ioc32.c
CC [M] drivers/gpu/drm/i915/i915_ioc32.o
CHECK drivers/gpu/drm/i915/intel_acpi.c
CC [M] drivers/gpu/drm/i915/intel_acpi.o
LD [M] drivers/gpu/drm/i915/i915.o
Building modules, stage 2.
MODPOST 1 modules
CC drivers/gpu/drm/i915/i915.mod.o
LD [M] drivers/gpu/drm/i915/i915.ko
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: intel-gfx@lists.freedesktop.org
Cc: dri-devel@lists.freedesktop.org
Reported-by: Hauke Mehrtens <hauke@hauke-m.de>
Signed-off-by: Luis R. Rodriguez <mcgrof@do-not-panic.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
And use it whenever we call code that uses the DDIs. We already have
intel_ddi.c and prefix every function with intel_ddi_something instead of
haswell_something, so I think replacing the checks with HAS_DDI makes more
sense. Just a cosmetical change, yes I know, but I have this OCD...
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Yeah, all users (both the clock selection special cases and the lvds
pin pair stuff) are still in common code, but this will change.
v2: Rebase on top of Jani Nikula's panel rework.
v3: Incorporate review from Paulo Zanoni:
- s/__is_dual_link_lvds/compute_is_dual_link_lvds
- kill dev_priv->lvds_val
- drop spurious whitespace change
v4: Add a debug printk to display the dual-link status, as suggested
by Paulo Zanoni in review.
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com> (v3)
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Should be useful to know what the driver thought the other ring's seqno
was when it last used a semaphore.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Based on the work by Mika Kuoppala, we realised that we need to handle
seqno wraparound prior to committing our changes to the ring. The most
obvious point then is to grab the seqno inside intel_ring_begin(), and
then to reuse that seqno for all ring operations until the next request.
As intel_ring_begin() can fail, the callers must already be prepared to
handle such failure and so we can safely add further checks.
This patch looks like it should be split up into the interface
changes and the tweaks to move seqno wrapping from the execbuffer into
the core seqno increment. However, I found no easy way to break it into
incremental steps without introducing further broken behaviour.
v2: Mika found a silly mistake and a subtle error in the existing code;
inside i915_gem_retire_requests() we were resetting the sync_seqno of
the target ring based on the seqno from this ring - which are only
related by the order of their allocation, not retirement. Hence we were
applying the optimisation that the rings were synchronised too early,
fortunately the only real casualty there is the handling of seqno
wrapping.
v3: Do not forget to reset the sync_seqno upon module reinitialisation,
ala resume.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=863861
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> [v2]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
There seem to be indeed some awkwards machines around, mostly those
without OpRegion support, where the firmware changes the display hw
state behind our backs when closing the lid.
This force-restore logic has been originally introduced in
commit c1c7af6089
Author: Jesse Barnes <jbarnes@virtuousgeek.org>
Date: Thu Sep 10 15:28:03 2009 -0700
drm/i915: force mode set at lid open time
but after the modeset-rework we've disabled it in the vain hope that
it's no longer required:
commit 3b7a89fce3
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date: Mon Sep 17 22:27:21 2012 +0200
drm/i915: fix OOPS in lid_notify
Alas, no.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=54677
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=57434
Tested-by: Krzysztof Mazur <krzysiek@podlesie.net>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
For now, this code is just used by the eDP AUX channel frequency.
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
We need to enable a special bit, otherwise none of the DP functions
requiring the PCH will work.
Version 2: store the PCH ID inside dev_priv, as suggested by Daniel
Vetter.
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Now that we always restore the HWS registers (both physical and GTT
virtual addresses) when re-initialising the rings, we can eliminate the
superfluous save/restore of the register across suspend and resume.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
drivers/gpu/drm/i915/i915_drv.h:1545:2: warning: '______f' is static but
declared in inline function 'i915_gem_chipset_flush' which is not static
Reported-by: kbuild test robot <fengguang.wu@intel.com>
dri-devel-Reference: <50a4d41c.586VhmwghPuKZbkB%fengguang.wu@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Daniel writes:
Highlights of this -next round:
- ivb fdi B/C fixes
- hsw sprite/plane offset fixes from Damien
- unified dp/hdmi encoder for hsw, finally external dp support on hsw
(Paulo)
- kill-agp and some other prep work in the gtt code from Ben
- some fb handling fixes from Ville
- massive pile of patches to align hsw VGA with the spec and make it
actually work (Paulo)
- pile of workarounds from Jesse, mostly for vlv, but also some other
related platforms
- start of a dev_priv reorg, that thing grew out of bounds and chaotic
- small bits&pieces all over the place, down to better error handling for
load-detect on gen2 (Chris, Jani, Mika, Zhenyu, ...)
On top of the previous pile (just copypasta):
- tons of hsw dp prep patches form Paulo
- round scheduled work items and timers to nearest second (Chris)
- some hw workarounds (Jesse&Damien)
- vlv dp support and related fixups (Vijay et al.)
- basic haswell dp support, not yet wired up for external ports (Paulo)
- edp support (Paulo)
- tons of refactorings to prepare for the above (Paulo)
- panel rework, unifiying code between lvds and edp panels (Jani)
- panel fitter scaling modes (Jani + Yuly Novikov)
- panel power improvements, should now work without the BIOS setting it up
- extracting some dp helpers from radeon/i915 and move them to
drm_dp_helper.c
- randome pile of workarounds (Damien, Ben, ...)
- some cleanups for the register restore code for suspend/resume
- secure batchbuffer support, should enable tear-free blits on gen6+
Chris)
- random smaller fixlets and cleanups.
* 'for-airlied' of git://people.freedesktop.org/~danvet/drm-intel: (231 commits)
drm/i915: Restore physical HWS_PGA after resume
drm/i915: Report amount of usable graphics memory in MiB
drm/i915/i2c: Track users of GMBUS force-bit
drm/i915: Allocate the proper size for contexts.
drm/i915: Update load-detect failure paths for modeset-rework
drm/i915: Clear unused fields of mode for framebuffer creation
drm/i915: Always calculate 8xx WM values based on a 32-bpp framebuffer
drm/i915: Fix sparse warnings in from AGP kill code
drm/i915: Missed lock change with rps lock
drm/i915: Move the remaining gtt code
drm/i915: flush system agent TLBs on SNB
drm/i915: Kill off now unused gen6+ AGP code
drm/i915: Calculate correct stolen size for GEN7+
drm/i915: Stop using AGP layer for GEN6+
drm/i915: drop the double-OP_STOREDW usage in blt_ring_flush
drm/i915: don't rewrite the GTT on resume v4
drm/i915: protect RPS/RC6 related accesses (including PCU) with a new mutex
drm/i915: put ring frequency and turbo setup into a work queue v5
drm/i915: don't block resume on fb console resume v2
drm/i915: extract l3_parity substruct from dev_priv
...
This fixes a regression for SDVO from
commit fbfcc4f3a0
Author: Jani Nikula <jani.nikula@intel.com>
Date: Mon Oct 22 16:12:18 2012 +0300
drm/i915/sdvo: restore i2c adapter config on intel_sdvo_init() failures
As SDVOB and SDVOC are multiplexed on the same pin, if a chipset does
not have the second SDVO encoder, it will then remove the force-bit
setting on the common i2c adapter during teardown. All subsequent
attempts of trying to use GMBUS with SDVOB then fail.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
[danvet: fixup inversion in the debug printout, noticed by Jani
Nikulai.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
As a quick hack we make the old intel_gtt structure mutable so we can
fool a bunch of the existing code which depends on elements in that data
structure. We can/should try to remove this in a subsequent patch.
This should preserve the old gtt init behavior which upon writing these
patches seems incorrect. The next patch will fix these things.
The one exception is VLV which doesn't have the preserved flush control
write behavior. Since we want to do that for all GEN6+ stuff, we'll
handle that in a later patch. Mainstream VLV support doesn't actually
exist yet anyway.
v2: Update the comment to remove the "voodoo"
Check that the last pte written matches what we readback
v3: actually kill cache_level_to_agp_type since most of the flags will
disappear in an upcoming patch
v4: v3 was actually not what we wanted (Daniel)
Make the ggtt bind assertions better and stricter (Chris)
Fix some uncaught errors at gtt init (Chris)
Some other random stuff that Chris wanted
v5: check for i==0 in gen6_ggtt_bind_object to shut up gcc (Ben)
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by [v4]: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: Make the cache_level -> agp_flags conversion for pre-gen6 a
tad more robust by mapping everything != CACHE_NONE to the cached agp
flag - we have a 1:1 uncached mapping, but different modes of
cacheable (at least on later generations). Suggested by Chris Wilson.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This allows the power related code to run independently of the rest of
the pipeline, extending the resume and init time improvements into
userspace, which would otherwise have been blocked on the struct mutex
if we were doing PCU communication.
v2: Also convert the locking for the rps sysfs interface.
Suggested-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org> (v1)
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Communicating via the mailbox registers with the PCU can take quite
awhile. And updating the ring frequency or enabling turbo is not
something that needs to happen synchronously, so take it out of our init
and resume paths to speed things up (~200ms on my T420).
v2: add comment about why we use a work queue (Daniel)
make sure work queue is idle on suspend (Daniel)
use a delayed work queue since there's no hurry (Daniel)
v3: make cleanup symmetric and just call cancel work directly (Daniel)
v4: schedule the work using round_jiffies_up to batch work better (Chris)
v5: fix the right schedule_delayed_work call (Chris)
References: https://bugs.freedesktop.org/show_bug.cgi?id=54089
Signed-of-by: Jesse Barnes <jbarnes@virtuougseek.org>
[danvet: bikeshed the placement of the new delayed work, move it to
all the other gen6 power mgmt stuff.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The console lock can be contended, so rather than prevent other drivers
after us from being held up, queue the console suspend into the global
work queue that can happen anytime. I've measured this to take around
200ms on my T420. Combined with the ring freq/turbo change, we should
save almost 1/2 a second on resume.
v2: use console_trylock() to try to resume the console immediately (Chris)
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
[danvet: move dev_priv->console_resume_work next to the fbdev
pointer.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Pretty astonishing how far apart these two members landed ... Especially since
I've already removed almost 200 lines in between.
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Also, move dev_priv->counter there, it's only used in i915_dma.c
And also move the dri1 dungeon at the end of dev_priv where no one
cares about it.
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
And give the structs slightly more generic names. I've decided to keep
the short rps/ips prefix, since that's just easier and less churn.
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
dev_priv has grown way too big, and grouping memebers into substructs
and moving them out of line helps re-gain some overview.
Unfortunatley I couldn't just call the substruct save and drop the prefix, since
that will make most member names clash with registers #defines. Changes in
i915_drv.h done by hand, everything else changed with
s/\<save\([A-Z]*\)/regfile.save\1/ in vim.
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Now that we no longer pretend to have flexibility in matching any
north display block with any pch, we can ditch this.
v2: Fix the embarassing rebase fail that Paulo Zanoni spotted.
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Some subsequent commits will need to know what generation we're running
on to do different pte encoding for the ppgtt. Since it's not much
hassle or overhead to store it in the ppgtt structure, do that.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
After all relevant pipes are disabled and after we've updated all the
state with the staged state, but before we call the per-crtc
->mode_set functions there's a very natural point to set up any
shared/global resources like
- shared plls (obviously only the setup, the enabling needs to be
separately handling with a separate refcount)
- global watermark state like the DSPARB on gmch platforms
- workaround bits that depend upon the exact global output
configuration
- enabling the right set of refclocks
- enabling/disabling manual power wells.
Now for a lot of these things we can't move them into this function
yet, most often because we only compute the required information in
the per-crtc ->mode_set callback. Which is too late. But due to a
bunch of reasons (check-only atomic modeset, fastboot&hw state checks,
...) we need to separate the computation of that state from the actual
hw frobbery anyway. So we can move things into this new callback step-
by-step.
Others can't be moved here (or implemented at all) because our code
lacks the smarts to properly update them. E.g. the DSPARB can only be
updated when all pipes are disabled, so if we decide to change it's
value, we need to disable _all_ pipes. The infrastructure for that is
already in place (with the various pipe masks that driver the modeset
logic). But again we need to move a few things out of ->mode_set
first before we can even implement the correct decision making.
In any case, we need to start somewhere, so let's start with the
callback: Some small follow-up patches will make immediate good use of
it.
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Before Haswell we used to have the CPU pipes and the PCH transcoders.
We had the same amount of pipes and transcoders, and there was a 1:1
mapping between them. After Haswell what we used to call CPU pipe was
split into CPU pipe and CPU transcoder. So now we have 3 CPU pipes (A,
B and C), 4 CPU transcoders (A, B, C and EDP) and 1 PCH transcoder
(only used for VGA).
For all the outputs except for EDP we have an 1:1 mapping on the CPU
pipes and CPU transcoders, so if you're using CPU pipe A you have to
use CPU transcoder A. When have an eDP output you have to use
transcoder EDP and you can attach this CPU transcoder to any of the 3
CPU pipes. When using VGA you need to select a pair of matching CPU
pipes/transcoders (A/A, B/B, C/C) and you also need to enable/use the
PCH transcoder.
For now we're just creating the cpu_transcoder definitions and setting
cpu_transcoder to TRANSCODER_EDP on DDI eDP code, but none of the
registers was ported to use transcoder instead of pipe. The goal is to
keep the code backwards-compatible since on all cases except when
using eDP we must have pipe == cpu_transcoder.
V2: Comment the haswell_crtc_off chunk, suggested by Damien Lespiau
and Daniel Vetter.
We currently need the haswell_crtc_off chunk because TRANSCODER_EDP
can be used by any CRTC, so when you stop using it you have to stop
saying you're using it, otherwise you may have at some point 2 CRTCs
claiming they're using TRANSCODER_EDP (a disabled CRTC and an enabled
one), then the HW state readout code will get completely confused.
In other words:
Imagine the following case:
xrandr --output eDP1 --auto --crtc 0
xrandr --output eDP1 --off
xrandr --output eDP1 --auto --crtc 2
After the last command you could get a "pipe A assertion failure
(expected off, current on)" because CRTC 0 still claims it's using
TRANSCODER_EDP, so the HW state readout function will read it
(through PIPECONF) and expect it to be off, when it's actually on
because it's being used by CRTC 2.
So when we make "intel_crtc->cpu_transcoder = intel_crtc->pipe" we
make sure we're pointing to our own original CRTC which is certainly
not used by any other CRTC.
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Get rid of saved int_lvds_connector and int_edp_connector in
drm_i915_private.
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Based on earlier work by Chris Wilson <chris@chris-wilson.co.uk>
CC: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
iQEcBAABAgAGBQJQgvdwAAoJEHm+PkMAQRiG+3AH/i2XsqqN3VctL0nnbWfvds+Q
aKulfIdJTjKiVAsawPUtRqReZ8ijiebrgA/53lZLlrFOoPPQ5+LHmnSyQF6gErOY
NuAE1lijXDRM1pwBlhvOBbAj26wUobGjqONFJ9OkKr758Ue8ds/Q7UdxyEgmYgmg
tvVMzfRcICzryUV3PcqL+3cNPpCUdT6wGGRJ9DCv/jvGiWKExWhOle5oltrmxk+D
NsqRcws5pEubfHE4J8BvNWr8lE1kHfYVhrJETiLJUiN2XAJcbI4Jy7rU/3EGteNS
0HMZdaPPjV874lohdM70X2225SbYrCVkAYB5hnZCTeC3tYyCawBBPMQoyAiOcmU=
=+861
-----END PGP SIGNATURE-----
Merge tag 'v3.7-rc2' into drm-intel-next-queued
Linux 3.7-rc2
Backmerge to solve two ugly conflicts:
- uapi. We've already added new ioctl definitions for -next. Do I need to say more?
- wc support gtt ptes. We've had to revert this for snb+ for 3.7 and
also fix a few other things in the code. Now we know how to make it
work on snb+, but to avoid losing the other fixes do the backmerge
first before re-enabling wc gtt ptes on snb+.
And a few other minor things, among them git getting confused in
intel_dp.c and seemingly causing a conflict out of nothing ...
Conflicts:
drivers/gpu/drm/i915/i915_reg.h
drivers/gpu/drm/i915/intel_display.c
drivers/gpu/drm/i915/intel_dp.c
drivers/gpu/drm/i915/intel_modes.c
include/drm/i915_drm.h
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Some BIOSes may forcibly suspend RC6 during their operation which
trigger a warning as we find the hardware in a perplexing state upon
first use. So far that appears to be the worst symptom as fortuituously
we use the same values as the BIOS for programming the FORCEWAKE register.
Reported-by: Oleksij Rempel <bug-track@fisher-privat.net>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
On the worst scenario, users with new hardwares and old kernel from
enabling times can get black screens. So, from now on, this
perliminary_hw_support module parameter shall be used by all upcoming
platforms that are still under enabling. The second option would be to
merge the pci ids after basic modeset works, but that makes testing
and development while bringing up hw a rather tedious afair.
Although it is uncomfortable for developers use this extra variable it
brings more stability for end users.
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
[danvet: dropped the i915_ param prefix, i915.i915_ is just tedious.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
There is a special mechanism for communicating with the PCU already
being used for the ring frequency stuff. As we'll be needing this for
other commands, extract it now to make future code less error prone and
the current code more reusable.
I'm not entirely sure if this code matches 1:1 with the previous code
behaviorally. Functionally however, it should be the same.
CC: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
[danvet: Fixup compile fail reported by Wu Fengguang.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Note that just because we have n == MAX elements left, does not imply
that there are only MAX elements left in the scatterlist and so we may
not be on the last chain, and the nth element may in fact be a chain ptr.
This is exercised by the improved hangman tests and the gem_exec_big
test in i-g-t.
This regression has been introduced in
commit 9da3da660d
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Fri Jun 1 15:20:22 2012 +0100
drm/i915: Replace the array of pages with a scatterlist
v2: KISS, replace the direct lookup with a for_each_sg() [danvet]
v3: Try to be clever again.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The intention was to allow the caller to avoid a failure to queue a
request having already written commands to the ring. However, this is a
moot point as the i915_add_request() can fail for other reasons than a
mere allocation failure and those failure cases are more likely than
ENOMEM. So the overlay code already had to handle i915_add_request()
failures, and due to
commit 3bb73aba1e
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Fri Jul 20 12:40:59 2012 +0100
drm/i915: Allow late allocation of request for i915_add_request()
the error handling code in intel_overlay.c was subject to causing
double-frees, as found by coverity.
Rather than further complicate i915_add_request() and callers, realise
the battle is lost and adapt intel_overlay.c to take advantage of the
late allocation of requests.
v2: Handle callers passing in a NULL seqno.
v3: Ditto. This time for sure.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Problems with the previous code:
- HDMI just uses WRPLL1 for everything, so dual head cases might not
work sometimes.
- At encoder->mode_set we just write the PLL register without doing
any kind of check (e.g., check if the PLL is already being used).
- There is no way to fail and return error codes at
encoder->mode_set.
- We write to PORT_CLK_SEL at mode_set and we never disable it.
- Machines hang due to wrong clock enable/disable sequence.
So here we rewrite the code, making it a little more like the
pre-Haswell PLL mode set code:
- Check PLL availability at ironlake_crtc_mode_set.
- Try to use both WRPLLs.
- Check if PLLs are used before actually trying to use them, and
properly fail with error messages.
- Enable/disable PORT_CLK_SEL at the right place.
- Add some WARNs to check for bugs.
The next improvement will be to try to reuse PLLs if the timings
match, but this is content for another patch and it's already
documented with a TODO comment.
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
round_jiffies() aligns the wakeup time to the nearest second in order to
batch wakeups and reduce system load, which is useful for unimportant
coarse timers like our hangcheck.
v2: round_jiffies_relative() returns the relative jiffie value, whereas
we need the absolute value for the timer.
Suggested-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
By providing a callback for when we need to bind the pages, and then
release them again later, we can shorten the amount of time we hold the
foreign pages mapped and pinned, and importantly the dmabuf objects then
behave as any other normal object with respect to the shrinker and
memory management.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Magic numbers are bad mmmkay. In this case in particular the value is
especially weird because the docs say multiple things. We'll need this
value for sysfs, so extracting it is useful for that as well.
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Rather than have multiple data structures for describing our page layout
in conjunction with the array of pages, we can migrate all users over to
a scatterlist.
One major advantage, other than unifying the page tracking structures,
this offers is that we replace the vmalloc'ed array (which can be up to
a megabyte in size) with a chain of individual pages which helps reduce
memory pressure.
The disadvantage is that we then do not have a simple array to iterate,
or to access randomly. The common case for this is in the relocation
processing, which will typically fit within a single scatterlist page
and so be almost the same cost as the simple array. For iterating over
the array, the extra function call could be optimised away, but in
reality is an insignificant cost of either binding the pages, or
performing the pwrite/pread.
v2: Fix drm_clflush_sg() to not invoke wbinvd as well! And fix the
trivial compile error from rebasing.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
We need to refcount our pages in order to prevent reaping them at
inopportune times, such as when they currently vmapped or exported to
another driver. However, we also wish to keep the lazy deallocation of
our pages so we need to take a pin/unpinned approach rather than a
simple refcount.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
In order to specialise functions depending upon the type of object, we
can attach vfuncs to each object via a new ->ops pointer.
For instance, this will be used in future patches to only bind pages from
a dma-buf for the duration that the object is used by the GPU - and so
prevent them from pinning those pages for the entire of the object.
v2: Bonus comments.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
As a quick reference I'll detail the motivation and design of the new code a
bit here (mostly stitched together from patchbomb announcements and commits
introducing the new concepts).
The crtc helper code has the fundamental assumption that encoders and crtcs can
be enabled/disabled in any order, as long as we take care of depencies (which
means that enabled encoders need an enabled crtc to feed them data,
essentially).
Our hw works differently. We already have tons of ugly cases where crtc code
enables encoder hw (or encoder->mode_set enables stuff that should only be
enabled in enocder->commit) to work around these issues. But on the disable
side we can't pull off similar tricks - there we actually need to rework the
modeset sequence that controls all this. And this is also the real motivation
why I've finally undertaken this rewrite: eDP on my shiny new Ivybridge
Ultrabook is broken, and it's broken due to the wrong disable sequence ...
The new code introduces a few interfaces and concepts:
- Add new encoder->enable/disable functions which are directly called from the
crtc->enable/disable function. This ensures that the encoder's can be
enabled/disabled at a very specific in the modeset sequence, controlled by our
platform specific code (instead of the crtc helper code calling them at a time
it deems convenient).
- Rework the dpms code - our code has mostly 1:1 connector:encoder mappings and
does support cloning on only a few encoders, so we can simplify things quite a
bit.
- Also only ever disable/enable the entire output pipeline. This ensures that
we obey the right sequence of enabling/disabling things, trying to be clever
here mostly just complicates the code and results in bugs. For cloneable
encoders this requires a bit of special handling to ensure that outputs can
still be disabled individually, but it simplifies the common case.
- Add infrastructure to read out the current hw state. No amount of careful
ordering will help us if we brick the hw on the initial modeset setup. Which
could happen if we just randomly disable things, oblivious to the state set up
by the bios. Hence we need to be able to read that out. As a benefit, we grow a
few generic functions useful to cross-check our modeset code with actual hw
state.
With all this in place, we can copy&paste the crtc helper code into the
drm/i915 driver and start to rework it:
- As detailed above, the new code only disables/enables an entire output pipe.
As a preparation for global mode-changes (e.g. reassigning shared resources) it
keeps track of which pipes need to be touched by a set of bitmasks.
- To ensure that we correctly disable the current display pipes, we need to
know the currently active connector/encoder/crtc linking. The old crtc helper
simply overwrote these links with the new setup, the new code stages the new
links in ->new_* pointers. Those get commited to the real linking pointers once
the old output configuration has been torn down, before the ->mode_set
callbacks are called.
- Finally the code adds tons of self-consistency checks by employing the new hw
state readout functions to cross-check the actual hw state with what the
datastructure think it should be. These checks are done both after every
modeset and after the hw state has been read out and sanitized at boot/resume
time. All these checks greatly helped in tracking down regressions and bugs in
the new code.
With this new basis, a lot of cleanups and improvements to the code are now
possible (besides the DP fixes that ultimately made me write this), but not yet
done:
- I think we should create struct intel_mode and use it as the adjusted mode
everywhere to store little pieces like needs_tvclock, pipe dithering values or
dp link parameters. That would still be a layering violation, but at least we
wouldn't need to recompute these kinds of things in intel_display.c. Especially
the port bpc computation needed for selecting the pipe bpc and dithering
settings in intel_display.c is rather gross.
- In a related rework we could implement ->mode_valid in terms of ->mode_fixup
in a generic way - I've hunted down too many bugs where ->mode_valid did the
right thing, but ->mode_fixup didn't. Or vice versa, resulting in funny bugs
for user-supplied modes.
- Ditch the idea to rework the hdp handling in the common crtc helper code and
just move things to i915.ko. Which would rid us of the ->detect crtc helper
dependencies.
- LVDS wire pair and pll enabling is all done in the crtc->mode_set function
currently. We should be able to move this to the crtc_enable callbacks (or in
the case of the LVDS wire pair enabling, into some encoder callback).
Last, but not least, this new code should also help in enabling a few neat
features: The hw state readout code prepares (but there are still big pieces
missing) for fastboot, i.e. avoiding the inital modeset at boot-up and just
taking over the configuration left behind by the bios. We also should be able
to extend the configuration checks in the beginning of the modeset sequence and
make better decisions about shared resources (which is the entire point behind
the atomic/global modeset ioctl).
Tested-by: Jani Nikula <jani.nikula@intel.com>
Tested-by: Ben Widawsky <ben@bwidawsk.net>
Tested-by: Damien Lespiau <damien.lespiau@intel.com>
Tested-by: Rodrigo Vivi <rodrigo.vivi@gmail.com>
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Tested-by: Vijay Purushothaman <vijay.a.purushothaman@intel.com>
Acked-by: Vijay Purushothaman <vijay.a.purushothaman@intel.com>
Tested-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Acked-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Tested-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
... instead of resetting a few things and hoping that this will work
out.
To properly disable the output pipelines at the initial modeset after
resume or boot up we need to have an accurate picture of which outputs
are enabled and connected to which crtcs. Otherwise we risk disabling
things at the wrong time, which can lead to hangs (or at least royally
confused panels), both requiring a walk to the reset button to fix.
Hence read out the hw state with the freshly introduce get_hw_state
functions and then sanitize it afterwards.
For a full modeset readout (which would allow us to avoid the initial
modeset at boot up) a few things are still missing:
- Reading out the mode from the pipe, especially the dotclock
computation is quite some fun.
- Reading out the parameters for the stolen memory framebuffer and
wrapping it up.
- Reading out the pch pll connections - luckily the disable code
simply bails out if the crtc doesn't have a pch pll attached (even
for configurations that would need one).
This patch here turned up tons of smelly stuff around resume: We
restore tons of register in seemingly random way (well, not quite, but
we're not too careful either), which leaves the hw in a rather
ill-defined state: E.g. the port registers are sometimes
unconditionally restore (lvds, crt), leaving us with an active
encoder/connector but no active pipe connected to it. Luckily the hw
state sanitizer detects this madness and fixes things up a bit.
v2: When checking whether an encoder with active connectors has a crtc
wire up to it, check for both the crtc _and_ it's active state.
v3:
- Extract intel_sanitize_encoder.
- Manually disable active encoders without an active pipe.
v4: Correclty fix up the pipe<->plane mapping on machines where we
switch pipes/planes. Noticed by Chris Wilson, who also provided the
fixup.
v5: Spelling fix in a comment, noticed by Paulo Zanoni
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Because that's what we're essentially calling. This is the first step
in untangling the crtc_helper induced dpms handling mess we have - at
the crtc level we only have 2 states and the magic is just in
selecting which one (and atm there isn't even much magic, but on
recent platforms where not even the crt output has more than 2 states
we could do better).
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Like with the equivalent change for gen6+ rps state, this helps in
clarifying the code (and in fixing a few places that have fallen through
the cracks in the locking review).
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Using the extracted INSTDONE reading, and our new register definitions,
update our hangcheck detection and error collection to use it. This
primarily means changing == to memcmp, and changing = to memcpy.
Hopefully this will give more info on error dump, and provide more
accurate hangcheck detection (both are actually TBD).
Also, remove the reading of instdone1 from the ring error collection
function, and just crap everything in capture_error_state (that could be
split into a separate patch if it wasn't so trivial).
v2: Now assuming i915_get_extra_instdone does the memset we can clean up the
code a bit (Jani)
v3: use ARRAY_SIZE as requested earlier by Jani (didn't change sizeof)
Updated commit msg
Cc: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
As we wish to create specialised object constructions in the near
future that share the same basic GEM object struct, export the default
initializer.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Avoid stalling and waiting for the GPU by checking to see if there is
sufficient inactive space in the aperture for us to bind the buffer
prior to writing through the GTT. If there is inadequate space we will
have to stall waiting for the GPU, and incur overheads moving objects
about. Instead, only incur the clflush overhead on the target object by
writing through shmem.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
ERR_INT can generate interrupts. However since most of the conditions seem
quite fatal the patch opts to simply report it in error state instead of
adding more complexity to the interrupt handler for little gain (the
bits are sticky anyway).
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Tested-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: Antti Koskipaa <antti.koskipaa@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
When dealing with a working set larger than the GATT, or even the
mappable aperture when touching through the GTT, we end up with evicting
objects only to rebind them at a new offset again later. Moving an
object into and out of the GTT requires clflushing the pages, thus
causing a double-clflush penalty for rebinding.
To avoid having to clflush on rebinding, we can track the pages as they
are evicted from the GTT and only relinquish those pages on memory
pressure.
As usual, if it were not for the handling of out-of-memory condition and
having to manually shrink our own bo caches, it would be a net reduction
of code. Alas.
Note: The patch also contains a few changes to the last-hope
evict_everything logic in i916_gem_execbuffer.c - we no longer try to
only evict the purgeable stuff in a first try (since that's superflous
and only helps in OOM corner-cases, not fragmented-gtt trashing
situations).
Also, the extraction of the get_pages retry loop from bind_to_gtt (and
other callsites) to get_pages should imo have been a separate patch.
v2: Ditch the newly added put_pages (for unbound objects only) in
i915_gem_reset. A quick irc discussion hasn't revealed any important
reason for this, so if we need this, I'd like to have a git blame'able
explanation for it.
v3: Undo the s/drm_malloc_ab/kmalloc/ in get_pages that Chris noticed.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: Split out code movements and rant a bit in the commit message
with a few Notes. Done v2]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
... and move a few others only used by i915_dma.c into the dri1
dungeon.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
It's only ever a pointer to the global mchdev_lock, and we don't use
it at all.
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This way it's easier so see what belongs together, and what is used
by the ilk ips code. Also add some comments that explain the locking.
Note that (cur|min|max)_delay need to be duplicated, because
they're also used by the ips code.
v2: Missed one place that the dev_priv->ips change caught ...
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Handy for lazy people like me, or when people forget to add the output
of lspci -nn.
v2: Chris Wilson noticed that we have this duplicated already in the
i915_capabilites debugfs file. But there \n as separator looks better,
which would be a bit verbose in dmesg. Abuse the preprocessor to
extract this all.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
We believe to have squashed all issues around the gen6+ rps interrupt
generation and why the gpu sometimes got stuck. With that cleared up,
there's no user left for the sanitize_pm infrastructure, so let's just
rip it out.
Note that 'intel_reg_write 0xa014 0x13070000' is the w/a if we find
ourselves stuck again.
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
By selecting the cache level (essentially whether or not the CPU snoops
any updates to the bo, and on more recent machines whether it resides
inside the CPU's last-level-cache) a userspace driver is able to then
manage all of its memory within buffer objects, if it so desires. This
enables the userspace driver to accelerate uploads and more importantly
downloads from the GPU and to able to mix CPU and GPU rendering/activity
efficiently.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: Added code comment about where we plan to stuff platform
specific cacheing control bits in the ioctl struct.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Several functions of the GPU have the restriction that differing memory
domains cannot be placed next to each other (as the GPU may prefetch
beyond the end of one domain and hang as it crosses into the other
domain). We use the facility of the drm_mm to mark ranges with a
particular color that corresponds to the cache attributes of those pages
in order to prevent allocating adjacent blocks of differing memory
types.
v2: Rebase ontop of drm_mm coloring v2.
v3: Fix rebinding existing gtt_space and add a verification routine.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Originally I had a macro specifically for DPF support, and Daniel, with
good reason asked me to change it to this. It's not the way I would have
gone (and indeed I didn't), but for now there is no distinction as all
platforms with L3 also have DPF.
Note: The good reasons are that dpf is a l3$ feature (at least on
currrent hw), hence I don't expect one to go without the other.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
[danvet: added note]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
As suggested by Daniel, rip out the independent timers for device and
crtc busyness and integrate the manual powermanagement of the display
engine into the GEM core and its request tracking. The benefits are that
the code is a lot smaller, fewer moving parts and should fit more neatly
into the overall activity tracking of the driver.
v2: Complete overhaul and removal of the racy timers and workers.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
By moving the function to intel_ringbuffer and currying the appropriate
parameter, hopefully we make the callsites easier to read and
understand.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This is now handled by a global flag to ensure we emit a flush before
the next serialisation point (if we failed to queue one previously).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
As we guarantee to emit a flush before emitting the breadcrumb or
the next batchbuffer, there is no further need for the flushing list.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
As we always flush the GPU cache prior to emitting the breadcrumb, we no
longer have to worry about the deferred flush causing the
pending_gpu_write to be delayed. So we can instead utilize the known
last_write_seqno to hopefully minimise the wait times.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Request preallocation was added to i915_add_request() in order to
support the overlay. However, not all users care and can quite happily
ignore the failure to allocate the request as they will simply repeat
the request in the future.
By pushing the allocation down into i915_add_request(), we can then
remove some rather ugly error handling in the callers.
v2: Nullify request->file_priv otherwise we chase a garbage pointer
when retiring requests.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The interface's immediate purpose is to do synchronous timestamp queries
as required by GL_TIMESTAMP. The GPU has a register for reading the
timestamp but because that would normally require root access through
libpciaccess, the IOCTL can provide this service instead.
Currently the implementation whitelists only the render ring timestamp
register, because that is the only thing we need to expose at this time.
v2: make size implicit based on the register offset
Add a generation check
Reviewed-by: Eric Anholt <eric@anholt.net>
Cc: Jacek Lawrynowicz <jacek.lawrynowicz@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
[danvet: fixup the ioctl numerb:]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Having had to dive into the bspec to understand what each stage of the
workaround meant, and how that the ring broadcasting IDLE corresponded
with the GT powering down the ring (i.e. rc6) add comments to aide
the next reader.
And since the register "is used to control all aspects of PSMI and power
saving functions" that makes it quite interesting to inspect with
regards to RC6 hangs, so add it to the error-state.
v2: Rediscover the piece of magic, set the RNCID to 0 before waiting for
the ring to wake up.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
We already have this pattern at quite a few places, and moving part of
the modeset helper stuff into the driver will add more.
v2: Don't clobber the crtc struct name with the macro parameter ...
v3: Convert two more places noticed by Paulo Zanoni.
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
So don't return -EAGAIN, even in the case of a gpu hang. Remap it to
-EIO instead. Note that this isn't really an issue with
interruptability, but more that we have quite a few codepaths (mostly
around kms stuff) that simply can't handle any errors and hence not
even -EAGAIN. Instead of adding proper failure paths so that we could
restart these ioctls we've opted for the cheap way out of sleeping
non-interruptibly. Which works everywhere but when the gpu dies,
which this patch fixes.
So essentially interruptible == false means 'wait for the gpu or die
trying'.'
This patch is a bit ugly because intel_ring_begin is all non-interruptible
and hence only returns -EIO. But as the comment in there says,
auditing all the callsites would be a pain.
To avoid duplicating code, reuse i915_gem_check_wedge in __wait_seqno
and intel_wait_ring_buffer. Also use the opportunity to clarify the
different cases in i915_gem_check_wedge a bit with comments.
v2: Don't access dev_priv->mm.interruptible from check_wedge - we
might not hold dev->struct_mutex, making this racy. Instead pass
interruptible in as a parameter. I've noticed this because I've hit a
BUG_ON(!mutex_is_locked) at the top of check_wedge. This has been
added in
commit b4aca0106c
Author: Ben Widawsky <ben@bwidawsk.net>
Date: Wed Apr 25 20:50:12 2012 -0700
drm/i915: extract some common olr+wedge code
although that commit is missing any justification for this. I guess
it's just copy&paste, because the same commit add the same BUG_ON
check to check_olr, where it indeed makes sense.
But in check_wedge everything we access is protected by other means,
so this is superflous. And because it now gets in the way (we add a
new caller in __wait_seqno, which can be called without
dev->struct_mutext) let's just remove it.
v3: Group all the i915_gem_check_wedge refactoring into this patch, so
that this patch here is all about not returning -EAGAIN to callsites
that can't handle syscall restarting.
v4: Add clarification what interuptible == fales means in our code,
requested by Ben Widawsky.
v5: Fix EAGAIN mispell noticed by Chris Wilson.
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Previously we had has_pch_split to tell us whether we had a PCH or not
and we also had dev_priv->pch_type to tell us which kind of PCH it
was, but it could only be used if we were 100% sure we did have a PCH.
Now that PCH_NONE was added to dev_priv->pch_type we don't need
has_pch_split anymore: we can just check for pch_type != PCH_NONE.
The HAS_PCH_{IBX,CPT,LPT} macros use dev_priv->pch_type, so they can
only be called after intel_detect_pch. The HAS_PCH_SPLIT macro looks
at dev_priv->info->has_pch_split, which is available earlier.
Since the goal is to implement HAS_PCH_SPLIT using dev_priv->pch_type
instead of dev_priv->info->has_pch_split, we need to make sure that
intel_detect_pch is called before any calls to HAS_PCH_SPLIT are made.
So we moved the intel_detect_pch call to an earlier stage.
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
And rely on the fact that it's 0 to assume that machines without a PCH
will have PCH_NONE as dev_priv->pch_type.
Just today I finally realized that HAS_PCH_IBX is true for machines
without a PCH. IMHO this is totally counter-intuitive and I don't
think it's a good idea to assume that we're going to check for
HAS_PCH_IBX only after we check for HAS_PCH_SPLIT.
I believe that in the future we'll have more PCH types and checks
like:
if (HAS_PCH_IBX(dev) || HAS_PCH_CPT(dev))
will become more and more common. There's a good chance that we may
break non-PCH machines by adding these checks in code that runs on all
machines. I also believe that the HAS_PCH_SPLIT check will become less
common as we add more and more different PCH types. We'll probably
start replacing checks like:
if (HAS_PCH_SPLIT(dev))
foo();
else
bar();
with:
if (HAS_PCH_NEW(dev))
baz();
else if (HAS_PCH_OLD(dev) || HAS_PCH_IBX(dev))
foo();
else
bar();
and this may break gen 2/3/4.
As far as we have investigated, this patch will affect the behavior of
intel_hdmi_dpms and intel_dp_link_down on gen 4. In both functions the
code inside the HAS_PCH_IBX check is for IBX-specific workarounds, so
we should be safe. If we start bisecting gen 2/3/4 bugs to this commit
we should consider replacing the HAS_PCH_IBX checks with something
else.
V2: Improve commit message, list possible side effects and solution.
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
While creating the new enable/disable_gt_powersave functions in
commit 8090c6b9da
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date: Sun Jun 24 16:42:32 2012 +0200
drm/i915: wrap up gt powersave enabling functions
I've botched up the handling of ironlake_disable_rc6. Fix this up by
calling it at the right place. Note though that ironlake_disable_rc6
does a bit more than just disabling rc6 - it also tears down all the
allocated context objects.
Hence we need to move intel_teardown_rc6 out and directly call it from
intel_modeset_cleanup.
Also properly mark ironlake_enable_rc6 as static and kill the un-used
declaration in i915_drv.h.
Note: In review a question popped out why disable_rc6 also tears down
the backing object and why we should move that out - it's simply for
consistency with gen6+ rps code, which does it that way.
Cc: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Tidy up the routines for interacting with the GT (in particular the
forcewake dance) which are scattered throughout the code in a single
structure.
v2: use wait_for_atomic for polling.
v3: *really* use wait_for_atomic for polling.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.18 (GNU/Linux)
iQEcBAABAgAGBQJP53AxAAoJEHm+PkMAQRiGs2QH/RaqkXz96fwjhDcyiKpDqA3c
kGuS5mz5cOhnqKSmR88HFm6pwuhLux/qSJzeAmoQy1MC8a0ACx7AnANW0lfN3/qe
/HGYz8h60yCL/fhn8/bUYtdt9xsoDqoDcq/ooFl9mcsJGWbC6WeMSZU5dAUYqviE
qFrp5zjY07FG53CRGT0hFpezQNwNL+VLH30CF9LD+fJLPVEYum2zBNGXWM42rcw5
fxzGL/6SO8YqA/Upic1ht6HAd6s5LOrlST7qvnyXUMvRXN5z/Y92ueYJZefkS1Om
ohuLIKM2bv9/dJS67H8N2baSKGCzBdfSe5/5WaHdLYW9MiVju0wRl6HPJtAMrkk=
=H8t8
-----END PGP SIGNATURE-----
Merge tag 'v3.5-rc4' into drm-intel-next-queued
I want to merge the "no more fake agp on gen6+" patches into
drm-intel-next (well, the last pieces). But a patch in 3.5-rc4 also
adds a new use of dev->agp. Hence the backmarge to sort this out, for
otherwise drm-intel-next merged into Linus' tree would conflict in the
relevant code, things would compile but nicely OOPS at driver load :(
Conflicts in this merge are just simple cases of "both branches
changed/added lines at the same place". The only tricky part is to
keep the order correct wrt the unwind code in case of errors in
intel_ringbuffer.c (and the MI_DISPLAY_FLIP #defines in i915_reg.h
together, obviously).
Conflicts:
drivers/gpu/drm/i915/i915_reg.h
drivers/gpu/drm/i915/intel_ringbuffer.c
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
It doesn't hurt and it at least prevents us from OOPSing left and
right at quite a few places. This also allows us to simplify the code
a bit by folding the only line of context_open into the callsite.
We obviuosly also need to run the cleanup code unconditionally, too.
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Add the interfaces to allow user space to create and destroy contexts.
Contexts are destroyed automatically if the file descriptor for the dri
device is closed.
Following convention as usual here causes checkpatch warnings.
v2: with is_initialized, no longer need to init at create
drop the context switch on create (daniel)
v3: Use interruptible lock (Chris)
return -ENODEV in !GEM case (Chris)
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Implement the context switch code as well as the interfaces to do the
context switch. This patch also doesn't match 1:1 with the RFC patches.
The main difference is that from Daniel's responses the last context
object is now stored instead of the last context. This aids in allows us
to free the context data structure, and context object independently.
There is room for optimization: this code will pin the context object
until the next context is active. The optimal way to do it is to
actually pin the object, move it to the active list, do the context
switch, and then unpin it. This allows the eviction code to actually
evict the context object if needed.
The context switch code is missing workarounds, they will be implemented
in future patches.
v2: actually do obj->dirty=1 in switch (daniel)
Modified comment around above
Remove flags to context switch (daniel)
Move mi_set_context code to i915_gem_context.c (daniel)
Remove seqno , use lazy request instead (daniel)
v3: use i915_gem_request_next_seqno instead of
outstanding_lazy_request (Daniel)
remove id's from trace events (Daniel)
Put the context BO in the instruction domain (Daniel)
Don't unref the BO is context switch fails (Chris)
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Invent an abstraction for a hw context which is passed around through
the core functions. The main bit a hw context holds is the buffer object
which backs the context. The rest of the members are just helper
functions. Specifically the ring member, which could likely go away if
we decide to never implement whatever other hw context support exists.
Of note here is the introduction of the 64k alignment constraint for the
BO. If contexts become heavily used, we should consider tweaking this
down to 4k. Until the contexts are merged and tested a bit though, I
think 64k is a nice start (based on docs).
Since we don't yet switch contexts, there is really not much complexity
here. Creation/destruction works pretty much as one would expect. An idr
is used to generate the context id numbers which are unique per file
descriptor.
v2: add DRM_DEBUG_DRIVERS to distinguish ENOMEM failures (ben)
convert a BUG_ON to WARN_ON, default destruction is still fatal (ben)
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Very basic code for context setup/destruction in the driver.
Adds the file i915_gem_context.c This file implements HW context
support. On gen5+ a HW context consists of an opaque GPU object which is
referenced at times of context saves and restores. With RC6 enabled,
the context is also referenced as the GPU enters and exists from RC6
(GPU has it's own internal power context, except on gen5). Though
something like a context does exist for the media ring, the code only
supports contexts for the render ring.
In software, there is a distinction between contexts created by the
user, and the default HW context. The default HW context is used by GPU
clients that do not request setup of their own hardware context. The
default context's state is never restored to help prevent programming
errors. This would happen if a client ran and piggy-backed off another
clients GPU state. The default context only exists to give the GPU some
offset to load as the current to invoke a save of the context we
actually care about. In fact, the code could likely be constructed,
albeit in a more complicated fashion, to never use the default context,
though that limits the driver's ability to swap out, and/or destroy
other contexts.
All other contexts are created as a request by the GPU client. These
contexts store GPU state, and thus allow GPU clients to not re-emit
state (and potentially query certain state) at any time. The kernel
driver makes certain that the appropriate commands are inserted.
There are 4 entry points into the contexts, init, fini, open, close.
The names are self-explanatory except that init can be called during
reset, and also during pm thaw/resume. As we expect our context to be
preserved across these events, we do not reinitialize in this case.
As Adam Jackson pointed out, The cutoff of 1MB where a HW context is
considered too big is arbitrary. The reason for this is even though
context sizes are increasing with every generation, they have yet to
eclipse even 32k. If we somehow read back way more than that, it
probably means BIOS has done something strange, or we're running on a
platform that wasn't designed for this.
v2: rename load/unload to init/fini (daniel)
remove ILK support for get_size() (indirectly daniel)
add HAS_HW_CONTEXTS macro to clarify supported platforms (daniel)
added comments (Ben)
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
For that to work we need to export the base address of the gtt
mmio window from intel-gtt. Also replace all other uses of
dev->agp by values we already have at hand.
Reviewed-by: Jani Nikula <jani.nikula@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
* 'drm-intel-fixes' of git://people.freedesktop.org/~danvet/drm-intel:
drm/i915: pch_irq_handler -> {ibx, cpt}_irq_handler
char/agp: add another Ironlake host bridge
drm/i915: fix up ivb plane 3 pageflips
drm/i915: hold forcewake around ring hw init
drm/i915: Mark the ringbuffers as being in the GTT domain
drm/i915/crt: Do not rely upon the HPD presence pin
drm/i915: Reset last_retired_head when resetting ring
Empirical evidence suggests that we need to: On at least one ivb
machine when running the hangman i-g-t test, the rings don't properly
initialize properly - the RING_START registers seems to be stuck at
all zeros.
Holding forcewake around this register init sequences makes chip reset
reliable again. Note that this is not the first such issue:
commit f01db988ef
Author: Sean Paul <seanpaul@chromium.org>
Date: Fri Mar 16 12:43:22 2012 -0400
drm/i915: Add wait_for in init_ring_common
added delay loops to make RING_START and RING_CTL initialization
reliable on the blt ring at boot-up. So I guess it won't hurt if we do
this unconditionally for all force_wake needing gpus.
To avoid copy&pasting of the HAS_FORCE_WAKE check I've added a new
intel_info bit for that.
v2: Fixup missing commas in static struct and properly handling the
error case in init_ring_common, both noticed by Jani Nikula.
Cc: stable@vger.kernel.org
Reported-and-tested-by: Yang Guang <guang.a.yang@intel.com>
Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=50522
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
We need the latest dma-buf code from Dave Airlie so that we can pimp
the backing storage handling code in drm/i915 with Chris Wilson's
unbound tracking and stolen mem backed gem object code.
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
If any l3 rows have been previously remapped, we must remap them after
GPU reset/resume too.
v2: Just return (no warn) on remapping init if not IVB (Jesse)
Move the check of schizo userspace to i915_gem_l3_remap (Jesse)
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
On IVB hardware we are given an interrupt whenever a L3 parity error
occurs in the L3 cache. The L3 cache is used by internal GPU clients
only. This is a very rare occurrence (in fact to test this I need to
use specially instrumented silicon).
When a row in the L3 cache detects a parity error the HW generates an
interrupt. The interrupt is masked in GTIMR until we get a chance to
read some registers and alert userspace via a uevent. With this
information userspace can use a sysfs interface (follow-up patch) to
remap those rows.
Way above my level of understanding, but if a given row fails, it is
statistically more likely to fail again than a row which has not failed.
Therefore it is desirable for an operating system to maintain a lifelong
list of failing rows and always remap any bad rows on driver load.
Hardware limits the number of rows that are remappable per bank/subbank,
and should more than that many rows detect parity errors, software
should maintain a list of the most frequent errors, and remap those
rows.
V2: Drop WARN_ON(IS_GEN6) (Jesse)
DRM_DEBUG row/bank/subbank on errror (Jesse)
Comment updates (Jesse)
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Wait request is poorly named IMO. After working with these functions for
some time, I feel it's much clearer to name the functions more
appropriately.
Of course we must update the callers to use the new name as well.
This leaves room within our namespace for a *real* wait request function
at some point.
Note to maintainer: this patch is optional.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This helps implement GL_ARB_sync but stops short of allowing full blown
sync objects. Finally we can use the new timed seqno waiting function
to allow userspace to wait on a buffer object with a timeout. This
implements that interface.
The IOCTL will take as input a buffer object handle, and a timeout in
nanoseconds (flags is currently optional but will likely be used for
permutations of flush operations). Users may specify 0 nanoseconds to
instantly check.
The wait ioctl with a timeout of 0 reimplements the busy ioctl. With any
non-zero timeout parameter the wait ioctl will wait for the given number
of nanoseconds on an object becoming unbusy. Since the wait itself does
so holding struct_mutex the object may become re-busied before this
completes. A similar but shorter race condition exists in the busy
ioctl.
v2: ETIME/ERESTARTSYS instead of changing to EBUSY, and EGAIN (Chris)
Flush the object from the gpu write domain (Chris + Daniel)
Fix leaked refcount in good case (Chris)
Naturally align ioctl struct (Chris)
v3: Drop lock after getting seqno to avoid ugly dance (Chris)
v4: check for 0 timeout after olr check to allow polling (Chris)
v5: Updated the comment. (Chris)
v6: Return -ETIME instead of -EBUSY when timeout_ns is 0 (Daniel)
Fix the commit message comment to be less ugly (Ben)
Add a warning to check the return timespec (Ben)
v7: Use DRM_AUTH for the ioctl. (Eugeni)
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This adds handle->fd and fd->handle support to i915, this is to allow
for offloading of rendering in one direction and outputs in the other.
v2 from Daniel Vetter:
- fixup conflicts with the prepare/finish gtt prep work.
- implement ppgtt binding support.
Note that we have squat i-g-t testcoverage for any of the lifetime and
access rules dma_buf/prime support brings along. And there are quite a
few intricate situations here.
Also note that the integration with the existing code is a bit
hackish, especially around get_gtt_pages and put_gtt_pages. It imo
would be easier with the prep code from Chris Wilson's unbound series,
but that is for 3.6.
Also note that I didn't bother to put the new prepare/finish gtt hooks
to good use by moving the dma_buf_map/unmap_attachment calls in there
(like we've originally planned for).
Last but not least this patch is only compile-tested, but I've changed
very little compared to Dave Airlie's version. So there's a decent
chance v2 on drm-next works as well as v1 on 3.4-rc.
v3: Right when I've hit sent I've noticed that I've screwed up one
obj->sg_list (for dmar support) and obj->sg_table (for prime support)
disdinction. We should be able to merge these 2 paths, but that's
material for another patch.
v4: fix the error reporting bugs pointed out by ickle.
v5: fix another error, and stop non-gtt mmaps on shared objects
stop pread/pwrite on imported objects, add fake kmap
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
In many places we wish to iterate over the rings associated with the
GPU, so refactor them to use a common macro.
Along the way, there are a few code removals that should be side-effect
free and some rearrangement which should only have a cosmetic impact,
such as error-state.
Note that this slightly changes the semantics in the hangcheck code:
We now always cycle through all enabled rings instead of
short-circuiting the logic.
v2: Pull in a couple of suggestions from Ben and Daniel for
intel_ring_initialized() and not removing the warning (just moving them
to a new home, closer to the error).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
[danvet: Added note to commit message about the small behaviour
change, suggested by Ben Widawsky.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The line time can be programmed according to the number of horizontal
pixels vs effective pixel rate ratio.
v2: improve comment as per Chris Wilson suggestion
v3: incorporate latest changes in specs.
v4: move into wm update routine, also mention that the same routine can
program IPS watermarks. We do not have their enablement code yet, nor
handle the required clock settings at the moment, so this patch won't
program those values for now.
Signed-off-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Only half of them even cared, and it's always the same one.
Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
- reduce the irq disabled section, even for a debugfs file this was
way too long.
- always disable irqs when taking the lock.
v2: Thou shalt not mistake locking for reference counting, so:
- reference count the error_state to protect from concurent freeeing.
This will be only really used in the next patch.
Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
gpu reset is a very important piece of our infrastructure.
Unfortunately we only really it test by actually hanging the gpu,
which often has bad side-effects for the entire system. And the gpu
hang handling code is one of the rather complicated pieces of code we
have, consisting of
- hang detection
- error capture
- actual gpu reset
- reset of all the gem bookkeeping
- reinitialition of the entire gpu
This patch adds a debugfs to selectively stopping rings by ceasing to
update the hw tail pointer, which will result in the gpu no longer
updating it's head pointer and eventually to the hangcheck firing.
This way we can exercise the gpu hang code under controlled conditions
without a dying gpu taking down the entire systems.
Patch motivated by me forgetting to properly reinitialize ppgtt after
a gpu reset.
Usage:
echo $((1 << $ringnum)) > i915_ring_stop # stops one ring
echo 0xffffffff > i915_ring_stop # stops all, future-proof version
then run whatever testload is desired. i915_ring_stop automatically
resets after a gpu hang is detected to avoid hanging the gpu to fast
and declaring it wedged.
v2: Incorporate feedback from Chris Wilson.
v3: Add the missing cleanup.
v4: Fix up inconsistent size of ring_stop_read vs _write, noticed by
Eugeni Dodonov.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Every time we use the device after a period of idleness, check that the
power management setup is still sane. This is to workaround a bug
whereby it seems that we begin suppressing power management interrupts,
preventing SandyBridge+ from going into turbo mode.
This patch does have a side-effect. It removes the mark-busy for just
moving the cursor - we don't want to increase the render clock just for
the sprite, though we may want to bump the display frequency. I'd argue
that we do not, and certainly don't want to take the struct_mutex here
due to the large latencies that introduces.
References: https://bugs.freedesktop.org/show_bug.cgi?id=44006
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
To get the fun stuff out of the way, the legacy hws is allocated by
userspace when the gpu needs a gfx hws. And there's no reference-counting
going on, so userspace can simply screw everyone over.
At least it's not as horrible as i810, where the ringbuffer is allocated
by userspace ...
We can't fix this disaster, but we can at least tidy up the code a
bit to make things clearer:
- Drop the drm ioremap indirection.
- Add a new new read_legacy_status_page to paper over the differences
between the legacy gfx hws and the physical hws shared with the
new ringbuffer code.
- Add a pointer in dev_priv->dri1 for the cpu addresses - that one is
an iomem remapping as opposed to all other hw status pages. This is
just prep work to make sparse happy.
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Wohoo!
Now we only need to move all the gem/kms stuff that accidentally
landed in i915_dma.c out of it, and this will be our legacy dri1
grave-yard.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
... and hide it in i915_dma.c.
This way all the legacy stuff dealing with READ_BREADCRUMB and
LP_RING and friends is in i915_dma.c.
v2: Rebase on top of Chris Wilson's rework irq handling code.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Let's just get this out of the way.
v2: Rebase against ENODEV changes.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Assigned in setparam, used never.
I didn't bother to dig through the archives to figure out what
this was supposed to do.
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
... and shove allow_batchbuffer in there. More dragons will
follow suit.
There's the curious case that we allow this for KMS ...
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
i915_dma.c contains most of the old dri1 horror-show, so move
the remaining bits there, too. The code has been removed and
the only thing left are some stubs to ensure that userspace
doesn't try to use this stuff. vblank_pipe_set only returns 0
without any side-effects, so we can even stub it out with
the canonical drm_noop.
v2: Rebase against ENODEV changes.
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
vblank_pipe was intended to be used for tracking DRI1 state. However,
the vblank_pipe reported to DRI1 is fixed to umask both pipes, and the
dev_priv->vblank_pipe unused and superfluous.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The waiting_seqno is not terribly useful, and as such we can remove it
so that we'll be able to extract lockless code.
v2: Keep the information for error_state (Chris)
Check if ring is initialized in hangcheck (Chris)
Capture the waiting ring (Chris)
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
[danvet: add some bikeshed to clarify a comment.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This extra bit of interrupt enabling code doesn't belong in the wait
seqno function. If anything we should pull it out to a helper so the
throttle code can also use it. The history is a bit vague, but I am
going to attempt to just dump it, unless someone can argue otherwise.
Removing this allows for a shared lock free wait seqno function. To keep
tabs on this issue though, the IER value is stored on error capture
(recommended by Chris Wilson)
v2: fixed typo EIR->IER (Ben)
Fix some white space (Ben)
Move IER capture to globally instead of per ring (Ben)
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
[danvet: ier is a 16 bit reg on gen2!]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This originates from a hack by me to quickly fix a bug in an earlier
patch where we needed control over whether or not waiting on a seqno
actually did any retire list processing. Since the two operations aren't
clearly related, we should pull the parameter out of the wait function,
and make the caller responsible for retiring if the action is desired.
The only function call site which did not get an explicit retire_request call
(on purpose) is i915_gem_inactive_shrink(). That code was already calling
retire_request a second time.
v2: don't modify any behavior excepit i915_gem_inactive_shrink(Daniel)
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
On later gen3, you are able to select the meaning of the FlipPending
status bit in IIR and change it to FlipDone. This was sometimes done by
the BIOS leading to confusion on just how pageflipping worked on gen3.
Simplify the implementation by using the legacy meaning for all gen3
machines.
Note: this makes all gen3 machines equally broken...
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
And remove the cargo-culted copy from the valleyview irq handler.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Always true these days. It has been added originally to work
around some issues with the agp layer in 2.6.29:
commit ac5c4e7618
Author: Dave Airlie <airlied@redhat.com>
Date: Fri Dec 19 15:38:34 2008 +1000
drm/i915: GEM on PAE has problems - disable it for now.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
We slightly modify the initialisation sequence to move the
initialisation of the memory managers earlier and in particular before
probing outputs and detecting any existing output configuration. This is
essential if we wish to track preallocated objects and preserve them
whilst initialising GEM.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The use of the mm_list by deferred-free breaks the following patches to
extend the range of objects tracked. We can simplify things if we just
make the unbind during free uninterrutible.
Note that unbinding should never fail, because we hold an additional
reference on every active object. Only the ilk vt-d workaround breaks
this, but already takes care of not failing by waiting for the gpu to
quiescent non-interruptible. But the existence of the deferred free
list casted some doubts on this theory, hence WARN if the unbind fails
and only then retry non-interruptible.
We can kill this additional code after a release in case the theory is
indeed right and no one has hit that WARN.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Simplify object tracking by removing the inactive but pinned list. The
only place where this was used is for counting the available memory,
which is just as easy performed by checking all objects on the rare
occasions it is required (application startup). For ease of debugging,
we keep the reporting of pinned objects through the error-state and
debugfs.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This was only used by one external caller who would just be as happy
with evict-everything, so perform the replacement and make the function
private.
In the process we note that unbinding the inactive list should not fail,
and make it a warning instead.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
PCH PLLs aren't required for outputs on the CPU, so we shouldn't just
treat them as part of the pipe.
So split the code out and manage PCH PLLs separately, allocating them
when needed or trying to re-use existing PCH PLL setups when the timings
match.
v2: add num_pch_pll field to dev_priv (Daniel)
don't NULL the pch_pll pointer in disable or DPMS will fail (Jesse)
put register offsets in pll struct (Chris)
v3: Decouple enable/disable of PLLs from get/put.
v4: Track temporary PLL disabling during modeset
v5: Tidy PLL initialisation by only checking for num_pch_pll == 0 (Eugeni)
v6: Avoid mishandling allocation failure by embedding the small array of
PLLs into the device struct
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=44309
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org> (up to v2)
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> (v3+)
Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Tested-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Rename obj->tiling_changed to obj->fence_dirty so that it is clear that
it flags when the parameters for an active fence (including the
no-fence) register are changed.
Also, do not set this flag when the object does not have a fence
register allocated currently and the gpu does not depend upon the
unfence. This case works exactly like when a tiled object lost its
fence and hence does not need additional handling for the tiling
change in the code.
v2: Use fence_dirty to better express what the flag tracks and add a few
more details to the comments to serve as a reminder of how the GPU also
uses the unfenced register slot.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: Add some bikeshed to the commit message about the stricter
use of fence_dirty.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Almost all of the errors related __iomem problems.
Most of the changes here are trivial, however there is plenty of chance
for yank/paste errors.
Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
As we now never pipeline a fence update, obj->last_fenced_ring is always
the same as the obj->ring whenever obj->last_fenced_seqno is active, so
remove it.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
As we now no longer track a pipelined fence change, we never use
ring->setup_seqno and can kill it.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
We never succeeded in getting pipelined fencing to work (unresolved
spurious GPU hangs), so begin the process of dismantling and removal
the broken code.
Step 1 is the removal of the pipeline parameter to get_fence().
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This should contain all the changes which require no thought to make
sparse happy.
Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
After a gpu reset we need to re-init some of the hw state we only
initialize when modeset is enabled, like rc6, hw contexts or render/GT
core clock gating and workaround register settings.
Note that this patch has a small change in the resume code:
- rc6 on gen6+ is only restored for the modeset case (for more
consistency with other callsites). This is no problem because recent
kernels refuse to load drm/i915 without kms on gen6+
- rc6/emon on ilk is only restored for the modeset case. This is no
problem because rc6 is disabled by default on ilk, and ums on ilk
has never really been a supported option outside of horrible rhel
backports.
v2: Chris Wilson noticed that we not only fail to restore the clock
gating settings after gpu reset.
v3: Move the call to modeset_init_hw in _reset out of the
struct_mutext protected area - other callers don't hold it, too.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Merge rc6 information into the power group for our device. Until now the
i915 driver has not had any sysfs entries (aside from the connector
stuff enabled by drm core). Since it seems like we're likely to have
more in the future I created a new file for sysfs stubs, as well as the
rc6 sysfs functions which don't really belong elsewhere (perhaps
i915_suspend, but most of the stuff is in intel_display,c).
displays rc6 modes enabled (as a hex mask):
cat /sys/class/drm/card0/power/rc6_enable
displays #ms GPU has been in rc6 since boot:
cat /sys/class/drm/card0/power/rc6_residency_ms
displays #ms GPU has been in deep rc6 since boot:
cat /sys/class/drm/card0/power/rc6p_residency_ms
displays #ms GPU has been in deepest rc6 since boot:
cat /sys/class/drm/card0/power/rc6pp_residency_ms
Important note: I've seen on SNB that even when RC6 is *not* enabled the
rc6 register seems to have a random value in it. I can only guess at the
reason reason for this. Those writing tools that utilize this value need
to be careful and probably want to scrutinize the value very carefully.
v2: use common rc6 residency units to milliseconds for the other RC6 types
v3: don't create sysfs files for GEN <= 5
add a rc6_enable to show a mask of enabled rc6 types
use unmerge instead of remove for sysfs group
squash intel_enable_rc6() extraction into this patch
v4: rename sysfs files (Chris)
CC: Chris Wilson <chris@chris-wilson.co.uk>
CC: Daniel Vetter <daniel.vetter@ffwll.ch>f
CC: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: squash in the 64bit division fix by Chris Wilson.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
In theory this will have performance and power improvements. Performance
because we don't need to stall when the scanout BO is busy, and power
because we don't have to stall when the BO is busy (and the ring can
even go to sleep if the HW supports it).
v2:
squash 2 patches into 1 (me)
un-inline the enable_semaphores function (Daniel)
remove comment about SNB hangs from i915_gem_object_sync (Chris)
rename intel_enable_semaphores to i915_semaphore_is_enabled (me)
removed page flip comment; "no why" (Chris)
To address other comments from Daniel (irc):
update the comment to say 'vt-d is crap, don't enable semaphores'
- I think you misinterpreted Chris' comment, it already exists.
checking out whether we can pageflip on the render ring on ivb (didn't
work on early silicon)
- We don't want to enable workarounds for early silicon unless we have
to.
- I can't find any references in the docs about this.
optionally use it if the fb is already busy on the render ring
- This should be how the code already worked, unless I am
misunderstanding your meaning.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
By simplifying the rules to calling get_fence when writing to the
through the GTT in a tiled manner, and calling put_fence before writing
to the object through the GTT in a linear manner, the code becomes
clearer and there is less chance of making a mistake.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
[danvet: fixed up conflict with ppgtt code and spelling in a new
comment.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Daniel Vetter wrote
First pull request for 3.5-next, slightly large than usual because new
things kept coming in since the last pull for 3.4.
Highlights:
- first batch of hw enablement for vlv (Jesse et al) and hsw (Eugeni). pci
ids are not yet added, and there's still quite a few patches to merge
(mostly modesetting). To make QA easier I've decided to merge this stuff
in pieces.
- loads of cleanups and prep patches spurred by the above. Especially vlv
is a real frankenstein chip, but also hsw is stretching our driver's
code design. Expect more to come in this area for 3.5.
- more gmbus fixes, cleanups and improvements by Daniel Kurtz. Again,
there are more patches needed (and some already queued up), but I wanted
to split this a bit for better testing.
- pwrite/pread rework and retuning. This series has been in the works for
a few months already and a lot of i-g-t tests have been created for it.
Now it's finally ready to be merged. Note that one patch in this series
touches include/pagemap.h, that patch is acked-by akpm.
- reduce mappable pressure and relocation throughput improvements from
Chris.
- mmap offset exhaustion mitigation by Chris Wilson.
- a start at figuring out which codepaths in our messy dri1/ums+gem/kms
driver we actually need to support by bailing out of unsupported case.
The driver now refuses to load without kms on gen6+ and disallows a few
ioctls that userspace never used in certain cases. More of this will
definitely come.
- More decoupling of global gtt and ppgtt.
- Improved dual-link lvds detection by Takashi Iwai.
- Shut up the compiler + plus fix the fallout (Ben)
- Inverted panel brightness handling (mostly Acer manages to break things
in this way).
- Small fixlets and adjustements and some minor things to help debugging.
Regression-wise QA reported quite a few issues on ivb, but all of them
turned out to be hw stability issues which are already fixed in
drm-intel-fixes (QA runs the nightly regression tests on -next alone,
without -fixes automatically merged in). There's still one issue open on
snb, it looks like occlusion query writes are not quite as cache coherent
as we've expected. With some of the pwrite adjustements we can now
reliably hit this. Kernel workaround for it is in the works."
* 'drm-intel-next' of git://people.freedesktop.org/~danvet/drm-intel: (101 commits)
drm/i915: VCS is not the last ring
drm/i915: Add a dual link lvds quirk for MacBook Pro 8,2
drm/i915: make quirks more verbose
drm/i915: dump the DMA fetch addr register on pre-gen6
drm/i915/sdvo: Include YRPB as an additional TV output type
drm/i915: disallow gem init ioctl on ilk
drm/i915: refuse to load on gen6+ without kms
drm/i915: extract gt interrupt handler
drm/i915: use render gen to switch ring irq functions
drm/i915: rip out old HWSTAM missed irq WA for vlv
drm/i915: open code gen6+ ring irqs
drm/i915: ring irq cleanups
drm/i915: add SFUSE_STRAP registers for digital port detection
drm/i915: add WM_LINETIME registers
drm/i915: add WRPLL clocks
drm/i915: add LCPLL control registers
drm/i915: add SSC offsets for SBI access
drm/i915: add port clock selection support for HSW
drm/i915: add S PLL control
drm/i915: add PIXCLK_GATE register
...
Conflicts:
drivers/char/agp/intel-agp.h
drivers/char/agp/intel-gtt.c
drivers/gpu/drm/i915/i915_debugfs.c
There are 5 DDI ports on Haswell. Port A is always enabled, and is the one
connected to eDP, and Port E is the one that can be connected to the PCH
using FDI protocol. Ports B, C, D and E can be used for digital outputs.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This adds product definitions for desktop, mobile and server boards.
v2: split into a separate patch, add .has_pch_split feature.
Signed-off-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The macro is becoming too complex and with VLV upon us it can lead to
confusion. So transforming this into a feature check instead.
Signed-off-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
[danvet: fixed conflict with is_valleyview addition.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Totally unexpected that this regressed. Luckily it sounds like we just
need to have dmar disable on the igfx, not the entire system. At least
that's what a few days of testing between Tony Vroon and me indicates.
Reported-by: Tony Vroon <tony@linx.net>
Cc: Tony Vroon <tony@linx.net>
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=43024
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This allows to select which rc6 modes are to be used via kernel parameter,
via a bitmask parameter. E.g.:
- to enable rc6, i915_enable_rc6=1
- to enable rc6 and deep rc6, i915_enable_rc6=3
- to enable rc6 and deepest rc6, use i915_enable_rc6=5
- to enable rc6, deep and deepest rc6, use i915_enable_rc6=7
Please keep in mind that the deepest RC6 state really should NOT be used
by default, as it could potentially worsen the issues with deep RC6. So do
enable it only when you know what you are doing. However, having it around
could help solving possible future rc6-related issues and their debugging
on user machines.
Note that this changes behavior - previously, value of 1 would enable both
RC6 and deep RC6. Now it should only enable RC6 and deep/deepest RC6
stages must be enabled manually.
v2: address Chris Wilson comments and clean up the code.
References: https://bugs.freedesktop.org/show_bug.cgi?id=42579
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Ben Widawsky <benjamin.widawsky@intel.com>
Signed-off-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
ValleyView handles force wake differently than previous chipsets, so add
a couple of new functions for it. But leave it disabled by default
until we test it (need a chip with the Punit enabled first).
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
ValleyView puts some display related registers like the PLL controls and
dividers behind the DPIO bus. Add simple indirect register access
routines to get to those registers.
v2: move new wait_for macro to intel_drv.h (Ben)
fix DPIO_PKT double write (Ben)
add debugfs file
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
For use by the rest of the ValleyView code.
v2: fix desktop variant to not set is_mobile (Ben)
Acked-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This memory is always allocated, and it is always a fixed size, so just
allocate it along with the rest of the driver state.
Signed-off-by: Daniel Kurtz <djkurtz@chromium.org>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
There is no GMBUS "disabled" port 0, nor "reserved" port 7.
For the other 6 ports there is a fixed 1:1 mapping between pin pairs and
gmbus ports, which means every real gmbus port has a gpio pin.
Given these realizations, clean up gmbus initialization.
Tested on Sandybridge (gen 6, PCH == CougarPoint) hardware.
Signed-off-by: Daniel Kurtz <djkurtz@chromium.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Instead of letting other modules directly access the ->gmbus array,
introduce intel_gmbus_get_adapter() for looking up an i2c_adapter
for a given gmbus port identifier. This will enable later refactoring
of the gmbus port list.
Note: Before requesting an adapter for a given gmbus port number, the
driver must first check its validity using i2c_intel_gmbus_is_port_valid().
If this check fails, a call to intel_gmbus_get_adapter() will WARN_ON and
return NULL. This is relevant for parts of the driver that read a port
from VBIOS, which might be improperly initialized and contain an invalid
port. In these cases, the driver must fall back to using a safer default
port.
Signed-off-by: Daniel Kurtz <djkurtz@chromium.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
No longer needed.
Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
We try to avoid writing the relocations through the uncached GTT, if the
buffer is currently in the CPU write domain and so will be flushed out to
main memory afterwards anyway. Also on SandyBridge we can safely write
to the pages in cacheable memory, so long as the buffer is LLC mapped.
In either of these cases, we therefore do not need to force the
reallocation of the buffer into the mappable region of the GTT, reducing
the aperture pressure.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
... because this is what it actually doesn now that we have the global
gtt vs. ppgtt split.
Also move it to the other global gtt functions in i915_gem_gtt.c
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Again, Valleyview modes these around, so make the mmio base more
explicit to consolidate the base address computations to one
HAS_PCH_SPLIT check.
v2: Fix up the PCH_SPLIT braino ... it actually works that way round.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
It's only used by the main read/write functions, so we can keep it with
them.
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Acked-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Add a new module optoin lvds_channel to specify the LVDS channel mode
explicitly instead of probing the LVDS register value set by BIOS.
This will be helpful when VBT is broken or incompatible with the
current code.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=42842
Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Currently i915 driver checks [PCH_]LVDS register bits to decide
whether to set up the dual-link or the single-link mode. This relies
implicitly on that BIOS initializes the register properly at boot.
However, BIOS doesn't initialize it always. When the machine is
booted with the closed lid, BIOS skips the LVDS reg initialization.
This ends up in blank output on a machine with a dual-link LVDS when
you open the lid after the boot.
This patch adds a workaround for that problem by checking the initial
LVDS register value in VBT.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=37742
Tested-By: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@gmail.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
And track the existence of such a binding similar to the aliasing
ppgtt case. Speeds up binding/unbinding in the common case where we
only need a ppgtt binding (which is accessed in a cpu coherent fashion
by the gpu) and no gloabl gtt binding (which needs uc writes for the
ptes).
This patch just puts the required tracking in place.
v2: Check that global gtt mappings exist in the error_state capture
code (with Chris Wilson's llc reloc patches batchbuffers are no longer
relocated as mappable in all situations, so this matters). Suggested
by Chris Wilson.
v3: Adapted to Chris' latest llc-reloc patches.
v4: Fix a bug in the i915 error state capture code noticed by Chris
Wilson.
Reviewed-and-tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Note that there's a functional change buried in this patch wrt the ilk
dmar workaround: We now only idle the gpu while tearing down the dmar
mappings, not while clearing the gtt. Keeping the current semantics
would have made for some really ugly code and afaik the issue is only
with the dmar unmapping that needs a fully idle gpu.
Reviewed-and-tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
A machine may need to invert the panel backlight brightness value. This
patch adds the infrastructure for a quirk to do so.
Signed-off-by: Carsten Emde <C.Emde@osadl.org>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This way we can simplify the setup and teardown a bit.
Because we don't actually allocate anything anymore for the force_bit
case, we can now convert that into a boolean.
Also and the functionality supported by the bit-banging together with
what gmbus can do, so that this doesn't randomly change any more.
v2: Chris Wilson noticed that I've mixed up && and & ...
v3: Clarify an if block as suggested by Eugeni Dodonov.
Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
... and directly call the newly exported i2c bit-banging functions.
The code is still pretty convoluted because we only set up the gpio
i2c stuff when actually falling back, resulting in more complexity
than necessary. This will be fixed up in the next patch.
v2: Use exported i2c_bit_algo vtable instead of exported functions.
Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
When we set up the gpio fallback, we always have a 1:1 relationship
with an intel_gmbus. Exploit that to store all gpio related data in
there, too. This is a preparation step to merge the tw i2c adapters
controlling the same bus into one.
Just mundane code-munging in this patch.
Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This way we can free up the bus->adaptor.algo_data pointer and make it
available for use with the bitbanging fallback algo.
Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
gcc seems to get uber-anal recently about these things.
Clarification from Dan Carpenter:
"Sorry, I should have said that it's not a gcc warning, it's a smatch
thing. But also it's not uber-anal. It's the exact level of anality
which is required to make the == -1 test work. You can compare
unsigned int and longs to -1 and it works but for smaller types it
doesn't."
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
So that we can tally the request against the command sequence in the
ringbuffer, or merely jump to the interesting locations.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Being able to tally the list of outstanding requests with the sequence
of commands in the ringbuffer is often useful evidence with respect to
driver corruption.
Note that since this is the umpteenth per-ring data structure to be added
to the error state, I've coallesced the nearby loops (the ringbuffer and
batchbuffer) into a single structure along with the list of requests. A
later task would be to refactor the ring register state into the same
structure.
v2: Fix pretty printing of requests so that they are parsed correctly by
intel_error_decode and use the 0x%08x format for seqno for consistency
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>