Allow encoders to customize their hotplug processing by moving the
intel_hpd_irq_event() code into an encoder hotplug vfunc. Currently
only SDVO needs this to re-enable hotplug signalling in the SDVO
chip. We'll use this same hook for DP/HDMI link management later.
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Lyude Paul <lyude@redhat.com>
Signed-off-by: Lyude Paul <lyude@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180117192149.17760-1-ville.syrjala@linux.intel.com
No functional change since WA is already applied.
But since it has different names on different databases,
let's document it here to avoid future confusion.
Cc: Radhakrishna Sripada <radhakrishna.sripada@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Radhakrishna Sripada <radhakrishna.sripada@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180306012812.19779-1-rodrigo.vivi@intel.com
No functional change. WA is already properly applied.
but in different databases it has different names.
Let's document all of them to avoid future confusion.
Cc: Rafael Antognolli <rafael.antognolli@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180306012000.18928-1-rodrigo.vivi@intel.com
In fact, apply the Cannonlake resolution check for all >= Gen-10 platforms
to be safe.
v3: Update GLK too. (Ville)
Longer variable names.
if-else in place of ternary operator.
v2: Use local variables for resolution limits and print them (Ville)
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Elio Martinez Monroy <elio.martinez.monroy@intel.com>
Signed-off-by: Dhinakaran Pandiyan <dhinakaran.pandiyan@intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180306203355.29292-1-dhinakaran.pandiyan@intel.com
The gfx/compute profiling mode switch is only for internally
test. Not a complete solution and unexpectly upstream.
so revert it.
Reviewed-by: Evan Quan <evan.quan@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Rex Zhu <Rex.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Compute workload tends to be "bursty", Only tune the behavior of
nature dpm don't work well for most of such workloads. From test
results, Fix sclk in highest two levels can get better performance.
so add min sclk setting into the default cumpute workload policy on
smu7.
user still can change sclk range through sysfs pp_dpm_sclk
for better perf/watt.
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Rex Zhu <Rex.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
It show what parameters can be configured to tune
the behavior of natural dpm for perf/watt on smu7.
user can select the mode per workload, but even the default per
workload settings are not bulletproof.
user can configure custom settings per different use case
for better perf or better perf/watt.
cat pp_power_profile_mode
NUM MODE_NAME SCLK_UP_HYST SCLK_DOWN_HYST SCLK_ACTIVE_LEVEL MCLK_UP_HYST MCLK_DOWN_HYST MCLK_ACTIVE_LEVEL
0 3D_FULL_SCREEN: 0 100 30 0 100 10
1 POWER_SAVING: 10 0 30 - - -
2 VIDEO: - - - 10 16 31
3 VR: 0 11 50 0 100 10
4 COMPUTE: 0 5 30 - - -
5 CUSTOM: 0 0 0 0 0 0
* CURRENT: 0 100 30 0 100 10
Under manual dpm level,
user can echo "0/1/2/3/4">pp_power_profile_mode
to select 3D_FULL_SCREEN/POWER_SAVING/VIDEO/VR/COMPUTE
mode.
echo "5 * * * * * * * *">pp_power_profile_mode
to set custom settings.
"5 * * * * * * * *" mean "CUSTOM enable_sclk SCLK_UP_HYST
SCLK_DOWN_HYST SCLK_ACTIVE_LEVEL enable_mclk MCLK_UP_HYST
MCLK_DOWN_HYST MCLK_ACTIVE_LEVEL"
if the parameter enable_sclk/enable_mclk is true,
driver will update the following parameters to dpm table.
if false, ignore the following parameters.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Rex Zhu <Rex.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
use SW method to update DPM settings by updating SRAM
directly on CI.
Reviewed-by: Alex Deucher <alexdeucher@amd.com>
Signed-off-by: Rex Zhu <Rex.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
use SW method to update DPM settings by updating SRAM
directly on Tonga.
Reviewed-by: Alex Deucher <alexdeucher@amd.com>
Signed-off-by: Rex Zhu <Rex.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
use SW method to update DPM settings by updating SRAM
directly on Fiji.
Reviewed-by: Alex Deucher <alexdeucher@amd.com>
Signed-off-by: Rex Zhu <Rex.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Previously, we would spin waiting for all waiters to wake up and notice
their request had completed before we would reset the seqno upon
wraparound. However, we can mark their waits as complete and wake them
up directly using the existing machinery for handling the flushing of
missed wakeups when idling.
Suggested-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180306130143.13312-2-chris@chris-wilson.co.uk
Since commit fd10e2ce99 ("drm/i915/breadcrumbs: Ignore unsubmitted
signalers"), we cancel the signaler when retiring the request and so
upon wraparound, where we wait for all requests to be retired, we no
longer need to spin waiting for the signaling thread to release its
references to the in-flight requests, and so we can assert that the
signaler is idle.
References: fd10e2ce99 ("drm/i915/breadcrumbs: Ignore unsubmitted signalers")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180306130143.13312-1-chris@chris-wilson.co.uk
Reject requests to add properties/enums with an overly long name.
Previously we would have just silently truncated the string and exposed
it userspace.
v2: drm_property_create() returns a pointer
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180302140300.31110-1-ville.syrjala@linux.intel.com
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
When parking the engines and their breadcrumbs, if we have waiters left
then they missed their wakeup. Verify that each waiter's seqno did
complete.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180222092545.17216-2-chris@chris-wilson.co.uk
The goal here is to try and reduce the latency of signaling additional
requests following the wakeup from interrupt by reducing the list of
to-be-signaled requests from an rbtree to a sorted linked list. The
original choice of using an rbtree was to facilitate random insertions
of request into the signaler while maintaining a sorted list. However,
if we assume that most new requests are added when they are submitted,
we see those new requests in execution order making a insertion sort
fast, and the reduction in overhead of each signaler iteration
significant.
Since commit 56299fb7d9 ("drm/i915: Signal first fence from irq handler
if complete"), we signal most fences directly from notify_ring() in the
interrupt handler greatly reducing the amount of work that actually
needs to be done by the signaler kthread. All the thread is then
required to do is operate as the bottom-half, cleaning up after the
interrupt handler and preparing the next waiter. This includes signaling
all later completed fences in a saturated system, but on a mostly idle
system we only have to rebuild the wait rbtree in time for the next
interrupt. With this de-emphasis of the signaler's role, we want to
rejig it's datastructures to reduce the amount of work we require to
both setup the signal tree and maintain it on every interrupt.
References: 56299fb7d9 ("drm/i915: Signal first fence from irq handler if complete")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180222092545.17216-1-chris@chris-wilson.co.uk
The Amlogic Meson GX SoCs, embedded the v2.01a controller, has been also
identified needing this workaround.
This patch adds the corresponding version to enable a single iteration for
this specific version.
Fixes: be41fc55f1 ("drm: bridge: dw-hdmi: Handle overflow workaround based on device version")
Acked-by: Archit Taneja <architt@codeaurora.org>
[narmstrong: s/identifies/identified and rebased against Jernej's change]
Signed-off-by: Neil Armstrong <narmstrong@baylibre.com>
Link: https://patchwork.freedesktop.org/patch/msgid/1519386277-25902-1-git-send-email-narmstrong@baylibre.com
error->device_info.has_guc, which we check in capture_uc_state, is set
in capture_gen_state, so the latter needs to be performed first.
v2: rebased
Reported-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Fixes: 7d41ef3479 (drm/i915: Add Guc/HuC firmware details to error state)
Cc: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20180305222122.3547-3-daniele.ceraolospurio@intel.com
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
some of the static functions used from capture() have the "i915_"
prefix while other don't; most of them take i915 as a parameter, but one
of them derives it internally from error->i915. Let's be consistent by
avoiding prefix for static functions and by getting i915 from
error->i915. While at it, s/dev_priv/i915 in functions that don't
perform register reads.
v2: take i915 from error->i915 (Michal), s/dev_priv/i915,
update commit message
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Reviewed-by: Michel Thierry <michel.thierry@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20180305222122.3547-2-daniele.ceraolospurio@intel.com
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
The Parfait (version 2.1.0) static code analysis tool found the
following NULL pointer dereference problem.
- drivers/gpu/drm/drm_drv.c
Any calls to drm_minor_get_slot() could result in the return of a NULL
pointer when an invalid DRM device type is encountered. The
return of NULL was removed with BUG() from drm_minor_get_slot().
Signed-off-by: Joe Moriarty <joe.moriarty@oracle.com>
Reviewed-by: Steven Sistare <steven.sistare@oracle.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20180220191157.100960-3-joe.moriarty@oracle.com
The Parfait (version 2.1.0) static code analysis tool found the
following NULL pointer derefernce problem.
- drivers/gpu/drm/drm_vblank.c
Null pointer checks were added to return values from calls to
drm_crtc_from_index(). There is a possibility, however minute, that
crtc->index may not be found when trying to find the struct crtc
from it's assigned index given in drm_crtc_init_with_planes().
3 return checks for NULL where added with a call to
WARN_ON(!crtc).
Signed-off-by: Joe Moriarty <joe.moriarty@oracle.com>
Reviewed-by: Steven Sistare <steven.sistare@oracle.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20180220191157.100960-2-joe.moriarty@oracle.com
In XenGT, ioreq copy is used to trap mmio write and ppgtt write. Both
of them are memory write, ioreq handler couldn't distinguish them. So
ioreq handler probe the ppgtt write handler, if it is succuess, this
ioreq is ppgtt write, otherwise it is mmio write.
So ppgtt write handler should return an error at the failure of finding
page track, it is fatal to implement ioreq handler in XenGT.
Signed-off-by: Xiong Zhang <xiong.y.zhang@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
page_track_handler take lock at the beginning, the lock should be released
at the failure of finding page track. Otherwise deadlock will happen.
Fixes: e502a2af4c ("drm/i915/gvt: Provide generic page_track infrastructure for write-protected page")
Signed-off-by: Xiong Zhang <xiong.y.zhang@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Add a new debugfs entry kvmgt_nr_cache_entries under vgpu which shows
the number of entry in dma cache.
$ cat /sys/kernel/debug/gvt/vgpu1/kvmgt_nr_cache_entries
10101
v3: fix compiling error for some configuration. (Xiong Zhang <xiong.y.zhang@intel.com>)
v2: keep debugfs layout flat.
Signed-off-by: Changbin Du <changbin.du@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
The implementation of current kvmgt implicitly setup dma mapping at MPT
API gfn_to_mfn. First this design against the API's original purpose.
Second, there is no unmap hit in this design. The result is that the
dma mapping keep growing larger and larger. For mutl-vm case, they will
consume IOMMU IOVA low 4GB address space quickly and so tons of rbtree
entries crated in the IOMMU IOVA allocator. Finally, single IOVA
allocation can take as long as ~70ms. Such latency is intolerable.
To address both above issues, this patch introduced two new MPT API:
o dma_map_guest_page - setup dma map for guest page
o dma_unmap_guest_page - cancel dma map for guest page
The kvmgt implements these 2 API. And to reduce dma setup overhead for
duplicated pages (eg. scratch pages), two caches are used: one is for
mapping gfn to struct gvt_dma, another is for mapping dma addr to
struct gvt_dma.
With these 2 new API, the gtt now is able to cancel dma mapping when page
table is invalidated. The dma mapping is not in a gradual increase now.
v2: follow the old logic for VFIO_IOMMU_NOTIFY_DMA_UNMAP at this point.
Cc: Hang Yuan <hang.yuan@intel.com>
Cc: Xiong Zhang <xiong.y.zhang@intel.com>
Signed-off-by: Changbin Du <changbin.du@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Fix below error with minor code refactor.
CHECK drivers/gpu/drm/i915//gvt/handlers.c
drivers/gpu/drm/i915//gvt/handlers.c:203 sanitize_fence_mmio_access() error: 'vgpu' dereferencing possible ERR_PTR()
Reviewed-by: Zhi Wang <zhi.a.wang@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Fix check error at
CHECK drivers/gpu/drm/i915//gvt/kvmgt.c
drivers/gpu/drm/i915//gvt/kvmgt.c:455 intel_vgpu_create() error: we previously assumed 'vgpu' could be null (see line 454)
For failed vgpu create, just show error return in failure message.
Reviewed-by: Zhi Wang <zhi.a.wang@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
There is one issue relates to Coarse Power Gating(CPG) on KBL NUC in GVT-g,
vgpu can't get the correct default context by updating the registers before
inhibit context submission. It always get back the hardware default value
unless the inhibit context submission happened before the 1st time
forcewake put. With this wrong default context, vgpu will run with
incorrect state and meet unknown issues.
The solution is initialize these mmios by adding lri command in ring buffer
of the inhibit context, then gpu hardware has no chance to go down RC6 when
lri commands are right being executed, and then vgpu can get correct
default context for further use.
v3:
- fix code fault, use 'for' to loop through mmio render list(Zhenyu)
v4:
- save the count of engine mmio need to be restored for inhibit context and
refine some comments. (Kevin)
v5:
- code rebase
Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Signed-off-by: Weinan Li <weinan.z.li@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
No functional change, just for easy to use.
v4:
- refine comment (Kevin)
Signed-off-by: Weinan Li <weinan.z.li@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
No functional change. This defination will also be used in future patchesi.
v4:
- refine patch description (Kevin)
Signed-off-by: Weinan Li <weinan.z.li@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
We don't know how many page tables will be shadowed. It varies
considerably corresponding to guest load. Radix tree is a better
choice for us. Since Page Frame Number is used as key so most of
the bits are common.
Here is some performance data (duration in us) of looking up a
element:
Before: (aka. ppgtt_find_shadow_page)
0.308 0.292 0.246 0.432 0.143 ... 0.311 0.225 0.382 0.199 0.325
After: (aka. intel_vgpu_find_spt_by_mfn)
0.106 0.106 0.107 0.106 0.105 0.107 ... 0.107 0.109 0.105 0.108
This time I didn't get the early data of hash table. The data is
measured when desktop is shown.
As last change, the overall benchmark almost is not changed, but
we get better scalability.
Signed-off-by: Changbin Du <changbin.du@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
This patch provide generic page_track infrastructure for write-protected
guest page. The old page_track logic gets rewrote and now stays in a new
standalone page_track.c. This page track infrastructure can be both used
by vGUC and GTT shadowing.
The important change is that it uses radix tree instead of hash table.
We don't have a predictable number of pages that will be tracked.
Here is some performance data (duration in us) of looking up a element:
Before: (aka. intel_vgpu_find_tracked_page)
0.091 0.089 0.090 ... 0.093 0.091 0.087 ... 0.292 0.285 0.292 0.291
After: (aka. intel_vgpu_find_page_track)
0.104 0.105 0.100 0.102 0.102 0.100 ... 0.101 0.101 0.105 0.105
The hash table has good performance at beginning, but turns bad with
more pages being tracked even no 3D applications are running. As
expected, radix tree has stable duration and very quick.
The overall benchmark (tested with Heaven Benchmark) marginally improved
since this is not the bottleneck. What we benefit more from this change
is scalability.
Signed-off-by: Changbin Du <changbin.du@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Don't extend page_track to mpt layer. Keep MPT simple and clean.
Meanwhile remove gtt.n_tracked_guest_page which doesn't make much
sense.
v2: clean up gtt.n_tracked_guest_page.
Signed-off-by: Changbin Du <changbin.du@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
The kvmgt's implementation of mpt api {set,unset}_wp_page is not real
write-protection - the data get written before invoke this two api.
As discussed, change the mpt api to match the real behavior.
Signed-off-by: Changbin Du <changbin.du@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
The target structure of some functions is struct intel_vgpu_ppgtt_spt and
their names are xxx_shadow_page. It should be xxx_shadow_page_table. Let's
use short name 'spt' instead to reduce the length. As well as the hash
table name.
Signed-off-by: Changbin Du <changbin.du@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
This is a another big one and the GVT shadow page management code is
heavily refined.
The new code only use struct intel_vgpu_ppgtt_spt to represent a vgpu
shadow page table - w/ or wo/ a guest page associated with. A pure shadow
page (no guest page associated) will be used to shadow splited 2M huge
gtt. In this case, the spt.guest_page.gfn should be a zero.
To search a existed shadow page table, we have two new interfaces:
- intel_vgpu_find_spt_by_gfn(), find a spt by guest gfn. It must not
be a pure spt.
- intel_vgpu_find_spt_by_mfn, Find the spt using shadow page mfn in
shadowed PTE.
The oos_page management is remained as what is was.
v2: Split some changes into small standalone patches.
Signed-off-by: Changbin Du <changbin.du@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Make the shadow PTE population code clear. Later we will add huge gtt
support based on this.
v2:
- rebase to latest code.
Signed-off-by: Changbin Du <changbin.du@intel.com>
Reviewed-by: Zhi Wang <zhi.wang@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
GTT entry has similar format with the CPU PTE. We'd prefer named macro
instead of hardcode.
Signed-off-by: Changbin Du <changbin.du@intel.com>
Reviewed-by: Zhi Wang <zhi.a.wang@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Factor out these two interfaces so we can kill some duplicated code in
scheduler.c.
v2:
- rename to intel_vgpu_{get,put}_ppgtt_mm
- refine handle_g2v_notification
Signed-off-by: Changbin Du <changbin.du@intel.com>
Reviewed-by: Zhi Wang <zhi.a.wang@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Accurate names help to avoid confusing so improve readability.
Signed-off-by: Changbin Du <changbin.du@intel.com>
Reviewed-by: Zhi Wang <zhi.a.wang@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>