Make the code a bit easier to read by providing symbolic names
for the reply_type (ACK vs. NAK). Also clean up some brace stuff
while at it.
v2: s/DP_REPLY/DP_SIDEBAND_REPLY/ (DK)
Fix some checkpatch issues
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Dhinakaran Pandiyan <dhinakaran.pandiyan@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190122200301.18633-1-ville.syrjala@linux.intel.com
It only talks about crtc, brings up intel as an example and I think is
more misleading than useful really. Plus we have lots of discussion
about how your standard kms driver should be initialized/cleaned up,
so maybe better to document this when we have a better idea.
v2: Fix typo in commit message (Nicholas).
Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190130163006.28945-2-daniel.vetter@ffwll.ch
Yes it's inconsitent with vrr_capable, but this is the actual uapi as
exercise by igt.
Fixes: ab7a664f7a ("drm: Document variable refresh properties")
Cc: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Cc: Harry Wentland <harry.wentland@amd.com>
Cc: Pekka Paalanen <pekka.paalanen@collabora.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190130163006.28945-3-daniel.vetter@ffwll.ch
In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.
This patch fixes the following warnings:
drivers/gpu/drm/savage/savage_state.c:301:8: warning: this statement may fall through [-Wimplicit-fallthrough=]
drivers/gpu/drm/savage/savage_state.c:438:8: warning: this statement may fall through [-Wimplicit-fallthrough=]
drivers/gpu/drm/savage/savage_state.c:559:8: warning: this statement may fall through [-Wimplicit-fallthrough=]
drivers/gpu/drm/savage/savage_state.c:697:8: warning: this statement may fall through [-Wimplicit-fallthrough=]
Warning level 3 was used: -Wimplicit-fallthrough=3
This patch is part of the ongoing efforts to enabling
-Wimplicit-fallthrough.
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20190129202005.GA25789@embeddedor
In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.
This patch fixes the following warnings:
drivers/gpu/drm/via/via_dmablit.c:179:3: warning: this statement may fall through [-Wimplicit-fallthrough=]
drivers/gpu/drm/via/via_dmablit.c:185:3: warning: this statement may fall through [-Wimplicit-fallthrough=]
drivers/gpu/drm/via/via_dmablit.c:187:3: warning: this statement may fall through [-Wimplicit-fallthrough=]
drivers/gpu/drm/via/via_dmablit.c:195:3: warning: this statement may fall through [-Wimplicit-fallthrough=]
Warning level 3 was used: -Wimplicit-fallthrough=3
This patch is part of the ongoing efforts to enabling
-Wimplicit-fallthrough.
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20190129201742.GA25660@embeddedor
The > should be >= to avoid an off by one bug.
Fixes: c46c24bb6b ("drm/komeda: Add komeda_framebuffer")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: James Qian Wang (Arm Technology China) <james.qian.wang@arm.com>
Signed-off-by: Liviu Dudau <liviu.dudau@arm.com>
The spec doesn't use a definite article in front of SAGV. The
rules regarding articles and initialisms are super fuzzy, but
at least to my ears it sounds much more natural to not have
the article. Perhaps because I tend to pronounce it as
"sag-vee" instead of spelling out the letters one at a time.
Actually I might still prefer to leave out the article if I
did spell them out.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20181221171436.8218-8-ville.syrjala@linux.intel.com
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
On icl+ bspec tells us to calculate a separate minimum ddb
allocation from the blocks watermark. Both have to be checked
against the actual ddb allocation, but since we do things the
other way around we'll just calculat the minimum acceptable
ddb allocation by taking the maximum of the two values.
We'll also replace the memcmp() with a full trawl over the
the watermarks so that it'll ignore the min_ddb_alloc
because we can't directly read that out from the hw. I suppose
we could reconstruct it from the other values, but I was
too lazy to do that now.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20181221171436.8218-6-ville.syrjala@linux.intel.com
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Bspec says we have to reject the watermark if it's >= the ddb
allocation. Fix the code to reject the == case as it should.
For transition watermarks we can just use >=, for the rest
we'll do +1 when calculating the minimum ddb allocation size.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20181221171436.8218-5-ville.syrjala@linux.intel.com
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
The spec used to say "8bpp" which someone took to mean 8 bytes per
pixel when in fact it was supposed to be 8 bits per pixel. The
spec has been updated to make it more clear now. Fix the code
to match.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20181221171436.8218-4-ville.syrjala@linux.intel.com
Reviewed-by: Stanislav Lisovskiy <stanislav.lisovskiy@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
I thought we could remove all the early latency==0 checks
and rely on skl_wm_method{1,2}() checking for it. But
skl_compute_plane_wm() applies a bunch of workarounds to bump
up the latency before calling those guys so clearly it won't
end up doing the right thing. Also not sure if the calculations
based on the method1/2 results are safe agaisnt overflows so
it might not work all that well in any case. Let's put the
early check back.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20181221171436.8218-3-ville.syrjala@linux.intel.com
Reviewed-by: Stanislav Lisovskiy <stanislav.lisovskiy@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
On glk+ the level 0 lines watermark actually matters. Do not ignore it.
And while at it let's change things so that we always program a
consistnet 0 to the register when the lines watermarks is ignored
by the hardware.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20181221171436.8218-2-ville.syrjala@linux.intel.com
Reviewed-by: Stanislav Lisovskiy <stanislav.lisovskiy@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
We should not pass DPLL_ID_ICL_DPLL0 or DPLL_ID_ICL_DPLL1 to this
function because the path is only taken for non-combophy ports. Let the
warning trigger if improper value is given.
While at it, rename the function to match the register name we are
trying to program.
v2: fix typo in comment
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190125222444.19926-4-lucas.demarchi@intel.com
Even if we don't have the correct clock and get a warning, we should not
skip the return.
v2: improve commit message (from Joonas)
Fixes: 1fa11ee2d9 ("drm/i915/icl: start adding the TBT pll")
Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
Cc: <stable@vger.kernel.org> # v4.19+
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Mika Kahola <mika.kahola@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190125222444.19926-3-lucas.demarchi@intel.com
Fix the TODO leftover in the code by changing the argument in MG_PLL
macros. The MG_PLL ids used to access the register values can be
converted from tc_port rather than port.
All these registers can use the TC port to calculate the right offsets
because they are only available for TC ports. The range (PORT_C onwards)
may not be stable and change from platform to platform. So by using the
TC id directly we avoid having to check for the platform in the "leaf
functions" and thus passing dev_priv around.
The helper functions were also renamed to use "tc" as prefix to make
them more generic.
v2: Improve commit message and fix checkpatch warning (from Paulo)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190125222444.19926-2-lucas.demarchi@intel.com
Missed breadcrumb detection is defunct due to the tight coupling with
dma_fence signaling and the myriad ways we may signal fences from
everywhere but from an interrupt, i.e. we frequently signal a fence
before we even see its interrupt. This means that even if we miss an
interrupt for a fence, it still is signaled before our breadcrumb
hangcheck fires, so simplify the breadcrumb hangchecking by moving it
into the GPU hangcheck and forgo fake interrupts.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190129205230.19056-3-chris@chris-wilson.co.uk
A few years ago, see commit 688e6c7258 ("drm/i915: Slaughter the
thundering i915_wait_request herd"), the issue of handling multiple
clients waiting in parallel was brought to our attention. The
requirement was that every client should be woken immediately upon its
request being signaled, without incurring any cpu overhead.
To handle certain fragility of our hw meant that we could not do a
simple check inside the irq handler (some generations required almost
unbounded delays before we could be sure of seqno coherency) and so
request completion checking required delegation.
Before commit 688e6c7258, the solution was simple. Every client
waiting on a request would be woken on every interrupt and each would do
a heavyweight check to see if their request was complete. Commit
688e6c7258 introduced an rbtree so that only the earliest waiter on
the global timeline would woken, and would wake the next and so on.
(Along with various complications to handle requests being reordered
along the global timeline, and also a requirement for kthread to provide
a delegate for fence signaling that had no process context.)
The global rbtree depends on knowing the execution timeline (and global
seqno). Without knowing that order, we must instead check all contexts
queued to the HW to see which may have advanced. We trim that list by
only checking queued contexts that are being waited on, but still we
keep a list of all active contexts and their active signalers that we
inspect from inside the irq handler. By moving the waiters onto the fence
signal list, we can combine the client wakeup with the dma_fence
signaling (a dramatic reduction in complexity, but does require the HW
being coherent, the seqno must be visible from the cpu before the
interrupt is raised - we keep a timer backup just in case).
Having previously fixed all the issues with irq-seqno serialisation (by
inserting delays onto the GPU after each request instead of random delays
on the CPU after each interrupt), we can rely on the seqno state to
perfom direct wakeups from the interrupt handler. This allows us to
preserve our single context switch behaviour of the current routine,
with the only downside that we lose the RT priority sorting of wakeups.
In general, direct wakeup latency of multiple clients is about the same
(about 10% better in most cases) with a reduction in total CPU time spent
in the waiter (about 20-50% depending on gen). Average herd behaviour is
improved, but at the cost of not delegating wakeups on task_prio.
v2: Capture fence signaling state for error state and add comments to
warm even the most cold of hearts.
v3: Check if the request is still active before busywaiting
v4: Reduce the amount of pointer misdirection with list_for_each_safe
and using a local i915_request variable inside the loops
v5: Add a missing pluralisation to a purely informative selftest message.
References: 688e6c7258 ("drm/i915: Slaughter the thundering i915_wait_request herd")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190129205230.19056-2-chris@chris-wilson.co.uk
The global seqno is defunct and so we have no meaningful indicator of
forward progress for an engine. You need to listen to the request
signaling tracepoints instead.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190129205230.19056-1-chris@chris-wilson.co.uk
Let's switch the pipe into interlaced mode and switch off
the TV encoder vertical filter if the pipe vdisplay
matches the TV YSIZE exactly.
While I didn't measure it I presume this might reduce
the power consumption a little bit, and the pixel rate
is halved as the pipe will now fetching in interlaced
mode rather than in progressive mode (effectively the
same difference as between IF-ID vs. PF-ID pfit modes
on more modern hardware) so a bit easier on the memory
bandwidth.
Cc: Imre Deak <imre.deak@intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190129141913.5515-2-ville.syrjala@linux.intel.com
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
intel_tv_mode_to_mode() assumes the pipe will be in progressive
fetch mode, and thus when programming the pipe into interlaced
mode we have to halve the calculated dotclock to get the correct
field duration.
This becomes more important when we start to program the pipe
into interlaced mode on i965gm as we depend on the timestamps
to get accurate frame counter values. Withot halving the clock
our guesstimated frame counter would tick at twice the expected
speed.
Cc: Imre Deak <imre.deak@intel.com>
Fixes: 690157f0a9 ("drm/i915/tv: Fix >1024 modes on gen3")
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190129141913.5515-1-ville.syrjala@linux.intel.com
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
drm_color_lut_check() doens't modify the passed in blob so
let's make it const.
Also s/uint32_t/u32/ while at it.
v2: Reduce line wraps (Sam)
Cc: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190129170609.5718-1-ville.syrjala@linux.intel.com
Reviewed-by: Sam Ravnborg <sam@ravnborg.org>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reduce the repeated node and hive information during XGMI initialization
Signed-off-by: shaoyunl <shaoyun.liu@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
kptr is not used any more.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Chunming Zhou <david1.zhou@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
sriov need to restrict max_pfn below AMDGPU_GMC_HOLE.
access the hole results in a range fault interrupt IIRC.
Signed-off-by: Wentao Lou <Wentao.Lou@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
After removing unnecessary VM size calculations,
vm_manager.max_pfn would reach 0x10,0000,0000
max_pfn << AMDGPU_GPU_PAGE_SHIFT exceeding AMDGPU_GMC_HOLE_START
would cause GPU reset.
Signed-off-by: wentalou <Wentao.Lou@amd.com>
Reviewed-by: Monk Liu <monk.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
In order to avoid preempting ourselves, we currently refuse to schedule
the tasklet if we reschedule an inflight context. However, this glosses
over a few issues such as what happens after a CS completion event and
we then preempt the newly executing context with itself, or if something
else causes a tasklet_schedule triggering the same evaluation to
preempt the active context with itself.
However, when we avoid preempting ELSP[0], we still retain the preemption
value as it may match a second preemption request within the same time period
that we need to resolve after the next CS event. However, since we only
store the maximum preemption priority seen, it may not match the
subsequent event and so we should double check whether or not we
actually do need to trigger a preempt-to-idle by comparing the top
priorities from each queue. Later, this gives us a hook for finer
control over deciding whether the preempt-to-idle is justified.
The sequence of events where we end up preempting for no avail is:
1. Queue requests/contexts A, B
2. Priority boost A; no preemption as it is executing, but keep hint
3. After CS switch, B is less than hint, force preempt-to-idle
4. Resubmit B after idling
v2: We can simplify a bunch of tests based on the knowledge that PI will
ensure that earlier requests along the same context will have the highest
priority.
v3: Demonstrate the stale preemption hint with a selftest
References: a2bf92e8cc ("drm/i915/execlists: Avoid kicking priority on the current context")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190129185452.20989-4-chris@chris-wilson.co.uk
After noticing that we trigger preemption events for currently executing
requests, as well as requests that complete before the preemption and
attempting to suppress those preemption events, it is wise to not
consider the queue_priority to be authoritative. As we only track the
maximum priority seen between dequeue passes, if the maximum priority
request is no longer available for dequeuing (it completed or is even
executing on another engine), we have no knowledge of the previous
queue_priority as it would require us to keep a full history of enqueued
requests -- but we already have that history in the priolists!
Rename the queue_priority to queue_priority_hint so that we do not
confuse it as being exactly the maximum priority in the queue, but merely
an indication that we have seen a new maximum priority value and as such
we should check whether it should preempt the currently running request.
v2: s/preempt_priority_hint/queue_priority_hint/ as preempt implies it
being only used for the singular task of preemption and not the wider
question of waking up due to a change in the queue.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190129185452.20989-3-chris@chris-wilson.co.uk
To allow requests to forgo a common execution timeline, one question we
need to be able to answer is "is this request running?". To track
whether a request has started on HW, we can emit a breadcrumb at the
beginning of the request and check its timeline's HWSP to see if the
breadcrumb has advanced past the start of this request. (This is in
contrast to the global timeline where we need only ask if we are on the
global timeline and if the timeline has advanced past the end of the
previous request.)
There is still confusion from a preempted request, which has already
started but relinquished the HW to a high priority request. For the
common case, this discrepancy should be negligible. However, for
identification of hung requests, knowing which one was running at the
time of the hang will be much more important.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190129185452.20989-2-chris@chris-wilson.co.uk
In bringup on simulated HW even rudimentary tests are slow, and so many
may fail that we want to be able to filter out the noise to focus on the
specific problem. Even just the tests groups provided for igt is not
specific enough, and we would like to isolate one particular subtest
(and probably subsubtests!). For simplicity, allow the user to provide a
command line parameter such as
i915.st_filter=i915_timeline_mock_selftests/igt_sync
to restrict ourselves to only running on subtest. The exact name to use
is given during a normal run, highlighted as an error if it failed,
debug otherwise. The test group is optional, and then all subtests are
compared for an exact match with the filter (most subtests have unique
names). The filter can be negated, e.g. i915.st_filter=!igt_sync and
then all tests but those that match will be run. More than one match can
be supplied separated by a comma, e.g.
i915.st_filter=igt_vma_create,igt_vma_pin1
to only run those specified, or
i915.st_filter=!igt_vma_create,!igt_vma_pin1
to run all but those named. Mixing a blacklist and whitelist will only
execute those subtests matching the whitelist so long as they are
previously excluded in the blacklist.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190129185452.20989-1-chris@chris-wilson.co.uk
This is only used by drm_irq_install(), which is an optional helper.
For legacy pci devices this is required (due to interrupt sharing without
msi/msi-x), and just making this the default exactly matches the behaviour
of all existing drivers using the drm_irq_install() helpers. In case that
ever becomes wrong drivers can roll their own irq handling, as many
drivers already do (for other reasons like needing a threaded interrupt
handler, or having an entire pile of different interrupt sources).
v2: Rebase
v3: Improve commit message (Emil)
Cc: Sam Ravnborg <sam@ravnborg.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190129104248.26607-3-daniel.vetter@ffwll.ch
And move the documenation we alreay have into kerneldoc, plus a bit of
polish while at it.
v2:
- Ditch FIXME from commit message, I've resolved that already before
sending out the first version.
- Put the legacy DRIVER_ flags at the end (Sam).
Cc: Sam Ravnborg <sam@ravnborg.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190129104248.26607-2-daniel.vetter@ffwll.ch
Both macros evaluate to 0. At the same time flag is already set to
zero since the struct is kzalloc'd in framebuffer_alloc().
As called by drm_fb_helper_alloc_fbi() in the DRM drivers.
v2: Rebase and improve commit message per Emil's suggestion.
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Cc: Inki Dae <inki.dae@samsung.com>
Cc: Joonyoung Shim <jy0922.shim@samsung.com>
Cc: Seung-Woo Kim <sw0312.kim@samsung.com>
Cc: Kyungmin Park <kyungmin.park@samsung.com>
Cc: Kukjin Kim <kgene@kernel.org>
Cc: Krzysztof Kozlowski <krzk@kernel.org>
Cc: Patrik Jakobsson <patrik.r.jakobsson@gmail.com>
Cc: Ben Skeggs <bskeggs@redhat.com>
Cc: Sandy Huang <hjc@rock-chips.com>
Cc: "Heiko Stübner" <heiko@sntech.de>
Cc: Thierry Reding <thierry.reding@gmail.com>
Cc: Jonathan Hunter <jonathanh@nvidia.com>
Cc: Hans de Goede <hdegoede@redhat.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Cc: Alexander Kapshuk <alexander.kapshuk@gmail.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-samsung-soc@vger.kernel.org
Cc: nouveau@lists.freedesktop.org
Cc: linux-rockchip@lists.infradead.org
Cc: linux-tegra@vger.kernel.org
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Thierry Reding <treding@nvidia.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190124165831.16427-27-daniel.vetter@ffwll.ch
We really want to have fastboot enabled by default to avoid an ugly
modeset during boot.
Rather then enabling it everywhere, lets start with enabling it on
Skylake and newer.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190124130114.3967-1-maarten.lankhorst@linux.intel.com
Now that we pin timelines around use, we have a clearly defined lifetime
and convenient points at which we can track only the active timelines.
This allows us to reduce the list iteration to only consider those
active timelines and not all.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190128181812.22804-6-chris@chris-wilson.co.uk
Now that we have allocated ourselves a cacheline to store a breadcrumb,
we can emit a write from the GPU into the timeline's HWSP of the
per-context seqno as we complete each request. This drops the mirroring
of the per-engine HWSP and allows each context to operate independently.
We do not need to unwind the per-context timeline, and so requests are
always consistent with the timeline breadcrumb, greatly simplifying the
completion checks as we no longer need to be concerned about the
global_seqno changing mid check.
One complication though is that we have to be wary that the request may
outlive the HWSP and so avoid touching the potentially danging pointer
after we have retired the fence. We also have to guard our access of the
HWSP with RCU, the release of the obj->mm.pages should already be RCU-safe.
At this point, we are emitting both per-context and global seqno and
still using the single per-engine execution timeline for resolving
interrupts.
v2: s/fake_complete/mark_complete/
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190128181812.22804-5-chris@chris-wilson.co.uk
If we restrict ourselves to only using a cacheline for each timeline's
HWSP (we could go smaller, but want to avoid needless polluting
cachelines on different engines between different contexts), then we can
suballocate a single 4k page into 64 different timeline HWSP. By
treating each fresh allocation as a slab of 64 entries, we can keep it
around for the next 64 allocation attempts until we need to refresh the
slab cache.
John Harrison noted the issue of fragmentation leading to the same worst
case performance of one page per timeline as before, which can be
mitigated by adopting a freelist.
v2: Keep all partially allocated HWSP on a freelist
This is still without migration, so it is possible for the system to end
up with each timeline in its own page, but we ensure that no new
allocation would needless allocate a fresh page!
v3: Throw a selftest at the allocator to try and catch invalid cacheline
reuse.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190128181812.22804-4-chris@chris-wilson.co.uk