Adding the HDCP2.2 capability of HDCP src and sink info into debugfs
entry "i915_hdcp_sink_capability"
This helps the userspace tests to skip the HDCP2.2 test on non HDCP2.2
sinks.
v2:
Rebased.
Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20190507162745.25600-3-ramalingam.c@intel.com
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJc04M6AAoJEAx081l5xIa+SJgP/0uIgIOM53vPpydgmr+2IEHF
jbDqrd+mipgNriRVHjDsWdUHCUNtyhB7YEBCMrj3mY0rRFI7FlQQf4lOwYGoHiKP
4JZg4kwC37997lFXl1uabGj3DmJLtxKL2/D15zCH/uLe+2EDzWznP6NVdFT3WK0P
YKZQCWT19PWSsLoBRPutWxkmop4AYvkqE0a6vXUlJlFYZK3Bbytx6/179uWKfiX5
ZkKEEtx1XiDAvcp5gBb6PISurycrBY0e/bkPBnK3ES5vawMbTU5IrmWOrQ4D8yOd
z9qOVZawZ6+b2XBDgBWjQ9bM7I5R7Il1q/LglYEaFI9+wHUnlUdDSm6ft5/5BiCZ
fqgkh5Bj2iEsajbSsacoljMOpxpYPqj63mqc+7fAGXF34V+B+9U1bpt8kCbMKowf
7Abb7IuiCR6vLDapjP6VqTMvdQ4O466OEAN83ULGFTdmMqYYH4AxaIwc+xcAk/aP
RNq7/RHhh4FRynRAj9fCkGlF3ArnM88gLINwWuEQq4SClWGcvdw7eaHpwWo77c4g
iccCnTLqSIg5pDVu07AQzzBlW6KulWxh5o72x+Xx+EXWdYUDHQ1SlNs11bSNUBV1
5MkrzY2GuD+NFEjsXJEDIPOr40mQOyJCXnxq8nXPsz/hD9kHeJPvWn3J3eVKyb5B
Z6/knNqM0BDn3SaYR/rD
=YFiQ
-----END PGP SIGNATURE-----
Merge tag 'drm-next-2019-05-09' of git://anongit.freedesktop.org/drm/drm
Pull drm updates from Dave Airlie:
"This has two exciting community drivers for ARM Mali accelerators.
Since ARM has never been open source friendly on the GPU side of the
house, the community has had to create open source drivers for the
Mali GPUs. Lima covers the older t4xx and panfrost the newer 6xx/7xx
series. Well done to all involved and hopefully this will help ARM
head in the right direction.
There is also now the ability if you don't have any of the legacy
drivers enabled (pre-KMS) to remove all the pre-KMS support code from
the core drm, this saves 10% or so in codesize on my machine.
i915 also enable Icelake/Elkhart Lake Gen11 GPUs by default, vboxvideo
moves out of staging.
There are also some rcar-du patches which crossover with media tree
but all should be acked by Mauro.
Summary:
uapi changes:
- Colorspace connector property
- fourcc - new YUV formts
- timeline sync objects initially merged
- expose FB_DAMAGE_CLIPS to atomic userspace
new drivers:
- vboxvideo: moved out of staging
- aspeed: ASPEED SoC BMC chip display support
- lima: ARM Mali4xx GPU acceleration driver support
- panfrost: ARM Mali6xx/7xx Midgard/Bitfrost acceleration driver support
core:
- component helper docs
- unplugging fixes
- devm device init
- MIPI/DSI rate control
- shmem backed gem objects
- connector, display_info, edid_quirks cleanups
- dma_buf fence chain support
- 64-bit dma-fence seqno comparison fixes
- move initial fb config code to core
- gem fence array helpers for Lima
- ability to remove legacy support code if no drivers requires it (removes 10% of drm.ko size)
- lease fixes
ttm:
- unified DRM_FILE_PAGE_OFFSET handling
- Account for kernel allocations in kernel zone only
panel:
- OSD070T1718-19TS panel support
- panel-tpo-td028ttec1 backlight support
- Ronbo RB070D30 MIPI/DSI
- Feiyang FY07024DI26A30-D MIPI-DSI panel
- Rocktech jh057n00900 MIPI-DSI panel
i915:
- Comet Lake (Gen9) PCI IDs
- Updated Icelake PCI IDs
- Elkhartlake (Gen11) support
- DP MST property addtions
- plane and watermark fixes
- Icelake port sync and VEBOX disable fixes
- struct_mutex usage reduction
- Icelake gamma fix
- GuC reset fixes
- make mmap more asynchronous
- sound display power well race fixes
- DDI/MIPI-DSI clocks for Icelake
- Icelake RPS frequency changing support
- Icelake workarounds
amdgpu:
- Use HMM for userptr
- vega20 experimental smu11 support
- RAS support for vega20
- BACO support for vega12 + fixes for vega20
- reworked IH interrupt handling
- amdkfd RAS support
- Freesync improvements
- initial timeline sync object support
- DC Z ordering fixes
- NV12 planes support
- colorspace properties for planes=
- eDP opts if eDP already initialized
nouveau:
- misc fixes
etnaviv:
- misc fixes
msm:
- GPU zap shader support expansion
- robustness ABI addition
exynos:
- Logging cleanups
tegra:
- Shared reset fix
- CPU cache maintenance fix
cirrus:
- driver rewritten using simple helpers
meson:
- G12A support
vmwgfx:
- Resource dirtying management improvements
- Userspace logging improvements
virtio:
- PRIME fixes
rockchip:
- rk3066 hdmi support
sun4i:
- DSI burst mode support
vc4:
- load tracker to detect underflow
v3d:
- v3d v4.2 support
malidp:
- initial Mali D71 support in komeda driver
tfp410:
- omap related improvement
omapdrm:
- drm bridge/panel support
- drop some omap specific panels
rcar-du:
- Display writeback support"
* tag 'drm-next-2019-05-09' of git://anongit.freedesktop.org/drm/drm: (1507 commits)
drm/msm/a6xx: No zap shader is not an error
drm/cma-helper: Fix drm_gem_cma_free_object()
drm: Fix timestamp docs for variable refresh properties.
drm/komeda: Mark the local functions as static
drm/komeda: Fixed warning: Function parameter or member not described
drm/komeda: Expose bus_width to Komeda-CORE
drm/komeda: Add sysfs attribute: core_id and config_id
drm: add non-desktop quirk for Valve HMDs
drm/panfrost: Show stored feature registers
drm/panfrost: Don't scream about deferred probe
drm/panfrost: Disable PM on probe failure
drm/panfrost: Set DMA masks earlier
drm/panfrost: Add sanity checks to submit IOCTL
drm/etnaviv: initialize idle mask before querying the HW db
drm: introduce a capability flag for syncobj timeline support
drm: report consistent errors when checking syncobj capibility
drm/nouveau/nouveau: forward error generated while resuming objects tree
drm/nouveau/fb/ramgk104: fix spelling mistake "sucessfully" -> "successfully"
drm/nouveau/i2c: Disable i2c bus access after ->fini()
drm/nouveau: Remove duplicate ACPI_VIDEO_NOTIFY_PROBE definition
...
- More panfrost fixes that went directly in -misc-next-fixes (various)
- Fix searchpaths during build (Masahiro)
- msm patch to fix the driver for chips without zap shader (Rob)
- Fix freeing imported buffers in drm_gem_cma_free_object() (Noralf)
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Rob Clark <robdclark@chromium.org>
Cc: Noralf Trønnes <noralf@tronnes.org>
-----BEGIN PGP SIGNATURE-----
iQEzBAABCgAdFiEEfxcpfMSgdnQMs+QqlvcN/ahKBwoFAlzTQGwACgkQlvcN/ahK
BwplmQf+KUKGSEgeIDS3u3rVOcU5QhoOTL+hWP90hNcNlKdU3W1P6fuCTSr9526C
s1KLRC6usEWPlk3cWtxt2WHGl6du0nvVBfQ8bYHQJOM14/kH078hkFiSF/MAzwxX
v5ToJcP7J6XRkj9M2uuA3TF5luDdfLh7qX1PCOIGzvH7Fs677QW/Umsm53CDDYM7
UHgQChSHAz/q4TMmvHtRr72VhPbqfsFJ8Z7gvfjn7ZpO3DJyo1S0hLq6h3wlkAKz
ShgnOFQ1oCTPWUE4PgzkHZmImU2+TDhIR/KpcM7ohomSGiqQSCnybNXBDbKWNhFt
FzUVJoOP98WxUor6b7a8j67ZQPtVjw==
=FGzZ
-----END PGP SIGNATURE-----
Merge tag 'drm-misc-next-fixes-2019-05-08' of git://anongit.freedesktop.org/drm/drm-misc into drm-next
- A handful of fixes from -next that just missed feature freeze
- More panfrost fixes that went directly in -misc-next-fixes (various)
- Fix searchpaths during build (Masahiro)
- msm patch to fix the driver for chips without zap shader (Rob)
- Fix freeing imported buffers in drm_gem_cma_free_object() (Noralf)
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Rob Clark <robdclark@chromium.org>
Cc: Noralf Trønnes <noralf@tronnes.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Sean Paul <sean@poorly.run>
Link: https://patchwork.freedesktop.org/patch/msgid/20190508205153.GA91135@art_vandelay
Currently there is an underlying assumption that i915_request_unsubmit()
is synchronous wrt the GPU -- that is the request is no longer in flight
as we remove it. In the near future that may change, and this may upset
our signaling as we can process an interrupt for that request while it
is no longer in flight.
CPU0 CPU1
intel_engine_breadcrumbs_irq
(queue request completion)
i915_request_cancel_signaling
... ...
i915_request_enable_signaling
dma_fence_signal
Hence in the time it took us to drop the lock to signal the request, a
preemption event may have occurred and re-queued the request. In the
process, that request would have seen I915_FENCE_FLAG_SIGNAL clear and
so reused the rq->signal_link that was in use on CPU0, leading to bad
pointer chasing in intel_engine_breadcrumbs_irq.
A related issue was that if someone started listening for a signal on a
completed but no longer in-flight request, we missed the opportunity to
immediately signal that request.
Furthermore, as intel_contexts may be immediately released during
request retirement, in order to be entirely sure that
intel_engine_breadcrumbs_irq may no longer dereference the intel_context
(ce->signals and ce->signal_link), we must wait for irq spinlock.
In order to prevent the race, we use a bit in the fence.flags to signal
the transfer onto the signal list inside intel_engine_breadcrumbs_irq.
For simplicity, we use the DMA_FENCE_FLAG_SIGNALED_BIT as it then
quickly signals to any outside observer that the fence is indeed signaled.
v2: Sketch out potential dma-fence API for manual signaling
v3: And the test_and_set_bit()
Fixes: 52c0fdb25c ("drm/i915: Replace global breadcrumbs with per-context interrupt tracking")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190508112452.18942-1-chris@chris-wilson.co.uk
After realising we need to sample RING_START to detect context switches
from preemption events that do not allow for the seqno to advance, we
can also realise that the seqno itself is just a distance along the ring
and so can be replaced by sampling RING_HEAD.
v2: Bonus comment for the mystery separate CS_STALL before MI_USER_INTERRUPT
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190508080704.24223-1-chris@chris-wilson.co.uk
If the HW fails to ack a change in forcewake status, the machine is as
good as dead -- it may recover, but in reality it missed the mmio
updates and is now in a very inconsistent state. If it happens, we can't
trust the CI results (or at least the fails may be genuine but due to
the HW being dead and not the actual test!) so reboot the machine (CI
checks for a kernel taint in between each test and reboots if the
machine is tainted).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190508115245.27790-1-chris@chris-wilson.co.uk
Pull vfs 'struct file' related updates from Al Viro:
"A bit more of 'this fget() would be better off as fdget()'
whack-a-mole + a couple of ->f_count-related cleanups"
* 'work.file' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
media: switch to fdget()
drm_syncobj: switch to fdget()
amdgpu: switch to fdget()
don't open-code file_count()
fs: drop unused fput_atomic definition
There is a bug in hdmi_deep_color_possible() - we compare pipe_bpp
<= 8*3 which returns true every time for hdmi_deep_color_possible 12 bit
deep color mode test in intel_hdmi_compute_config().(Even when the
requested color mode is 10 bit through max bpc property)
Comparing pipe_bpp with bpc * 3 takes care of this condition where
requested max bpc is 10 bit, so hdmi_deep_color_possible with 12 bit
returns false when requested max bpc is 10.(Ville)
v2:Add suggested by Ville Syrjälä
Suggested-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Aditya Swarup <aditya.swarup@intel.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Manasi Navare <manasi.d.navare@intel.com>
Cc: Clinton Taylor <Clinton.A.Taylor@intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190507181856.16091-1-aditya.swarup@intel.com
For us KBP is 100% identical to SPT. Kill the redundant enum
value. Also bspec doesn't talk about KBP either, so this might
avoid some confusion when cross checking the code against the
spec.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190506152627.20283-1-ville.syrjala@linux.intel.com
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
If we couple the scheduler more tightly with the execlists policy, we
can apply the preemption policy to the question of whether we need to
kick the tasklet at all for this priority bump.
v2: Rephrase it as a core i915 policy and not an execlists foible.
v3: Pull the kick together.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190507122544.12698-1-chris@chris-wilson.co.uk
If the user is racing a call to debugfs/i915_drop_caches with ongoing
submission from another thread/process, we may never end up idling the
GPU and be uninterruptibly spinning in debugfs/i915_drop_caches trying
to catch an idle moment.
Just flush the work once, that should be enough to park the system under
correct conditions. Outside of those we either have a driver bug or the
user is racing themselves. Sadly, because the user may be provoking the
unwanted situation we can't put a warn here to attract attention to a
probable bug.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190507121108.18377-4-chris@chris-wilson.co.uk
Replace the racy continuation check within retire_work with a definite
kill-switch on idling. The race was being exposed by gem_concurrent_blit
where the retire_worker would be terminated too early leaving us
spinning in debugfs/i915_drop_caches with nothing flushing the
retirement queue.
Although that the igt is trying to idle from one child while submitting
from another may be a contributing factor as to why it runs so slowly...
v2: Use the non-sync version of cancel_delayed_work(), we only need to
stop it from being scheduled as we independently check whether now is
the right time to be parking.
Testcase: igt/gem_concurrent_blit
Fixes: 79ffac8599 ("drm/i915: Invert the GEM wakeref hierarchy")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190507121108.18377-3-chris@chris-wilson.co.uk
The original intent for the delay before running the idle_work was to
provide a hysteresis to avoid ping-ponging the device runtime-pm. Since
then we have also pulled in some memory management and general device
management for parking. But with the inversion of the wakeref handling,
GEM is no longer responsible for the wakeref and by the time we call the
idle_work, the device is asleep. It seems appropriate now to drop the
delay and just run the worker immediately to flush the cached GEM state
before sleeping.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190507121108.18377-2-chris@chris-wilson.co.uk
To complete the idle worker, we must complete 2 passes of wait-for-idle.
At the end of the first pass, we queue a switch-to-kernel-context and
may only idle after waiting for its completion. Speed up the flush_work
by doing the wait explicitly, which then allows us to remove the
unbounded loop trying to complete the flush_work in the next patch.
References: 79ffac8599 ("drm/i915: Invert the GEM wakeref hierarchy")
Testcase: igt/gem_ppgtt/flind-and-close-vma-leak
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190507121108.18377-1-chris@chris-wilson.co.uk
v2: Fix commit msg to reflect why issue occurs(Jani)
Set GCP_COLOR_INDICATION only when we set 10/12 bit deep color.
Changing settings from 10/12 bit deep color to 8 bit(& vice versa)
doesn't work correctly using xrandr max bpc property. When we
connect a monitor which supports deep color, the highest deep color
setting is selected; which sets GCP_COLOR_INDICATION. When we change
the setting to 8 bit color, we still set GCP_COLOR_INDICATION which
doesn't allow the switch back to 8 bit color.
v3,4: Add comments & drop changes in intel_hdmi_compute_config(Ville)
Since HSW+, GCP_COLOR_INDICATION is not required for 8bpc.
Drop the changes in intel_hdmi_compute_config as desired_bpp
is needed to change values for pipe_bpp based on bw_constrained flag.
v5: Fix missing logical && in condition for setting GCP_COLOR_INDICATION.
v6: Fix comment formatting (Ville)
v7: Add reviewed by Ville
v8: Set GCP_COLOR_INDICATION based on spec:
For Gen 7.5 or later platforms, indicate color depth only for deep
color modes. Bspec: 8135,7751,50524
Pre DDI platforms, indicate color depth if deep color is supported
by sink. Bspec: 7854
Exception: CHERRYVIEW behaves like Pre DDI platforms.
Bspec: 15975
Check pipe_bpp is less than bpp * 3 in hdmi_deep_color_possible,
to not set 12 bit deep color for every modeset. This fixes the issue
where 12 bit color was selected even when user selected 10 bit.(Ville)
v9: Maintain a consistent behavior for all platforms and support
GCP_COLOR_INDICATION only when we are in deep color mode. Remove
hdmi_sink_is_deep_color() - no longer needed as checking pipe_bpp > 24
takes care of the deep color mode scenario.
Separate patch for fixing switch from 12 bit to 10 bit deep color
mode.
Co-developed-by: Aditya Swarup <aditya.swarup@intel.com>
Signed-off-by: Clinton Taylor <Clinton.A.Taylor@intel.com>
Signed-off-by: Aditya Swarup <aditya.swarup@intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Manasi Navare <manasi.d.navare@intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190429230811.9983-1-aditya.swarup@intel.com
Due to the asynchronous tasklet and recursive GT wakeref, it may happen
that we submit to the engine (underneath it's own wakeref) prior to the
central wakeref being marked as taken. Switch to checking the local wakeref
for greater consistency.
Fixes: 79ffac8599 ("drm/i915: Invert the GEM wakeref hierarchy")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190503115225.30831-3-chris@chris-wilson.co.uk
The counter goes to zero at the start of the parking cycle, but the
wakeref itself is held until the end. Likewise, the counter becomes one
at the end of the unparking, but the wakeref is taken first. If we check
the wakeref instead of the counter, we include the unpark/unparking time
as intel_wakeref_is_active(), and do not spuriously declare inactive if
we fail to park (i.e. the parking and wakeref drop is postponed).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190503115225.30831-2-chris@chris-wilson.co.uk
Inside the signal handler, we expect the requests to be ordered by their
breadcrumb such that no later request may be complete if we find an
earlier incomplete. Add an assert to check that the next breadcrumb
should not be logically before the current.
v2: Move the overhanging line into its own function and reuse it after
doing the insertion.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190503152214.26517-1-chris@chris-wilson.co.uk
Acquiring the signaler's timeline takes an active reference to their
HWSP that we would like to avoid if possible, so take it after
performing all of our allocations required to set up the fencing. The
acquisition also provides the final check that the target has not
already signaled allowing us to avoid the semaphore at the last moment.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190503140239.32668-1-chris@chris-wilson.co.uk
Asking the GPU to busywait on a memory address, perhaps not unexpectedly
in hindsight for a shared system, leads to bus contention that affects
CPU programs trying to concurrently access memory. This can manifest as
a drop in transcode throughput on highly over-saturated workloads.
The only clue offered by perf, is that the bus-cycles (perf stat -e
bus-cycles) jumped by 50% when enabling semaphores. This corresponds
with extra CPU active cycles being attributed to intel_idle's mwait.
This patch introduces a heuristic to try and detect when more than one
client is submitting to the GPU pushing it into an oversaturated state.
As we already keep track of when the semaphores are signaled, we can
inspect their state on submitting the busywait batch and if we planned
to use a semaphore but were too late, conclude that the GPU is
overloaded and not try to use semaphores in future requests. In
practice, this means we optimistically try to use semaphores for the
first frame of a transcode job split over multiple engines, and fail if
there are multiple clients active and continue not to use semaphores for
the subsequent frames in the sequence. Periodically, we try to
optimistically switch semaphores back on whenever the client waits to
catch up with the transcode results.
With 1 client, on Broxton J3455, with the relative fps normalized by %cpu:
x no semaphores
+ drm-tip
* patched
+------------------------------------------------------------------------+
| * |
| *+ |
| **+ |
| **+ x |
| x * +**+ x |
| x x * * +***x xx |
| x x * * *+***x *x |
| x x* + * * *****x *x x |
| + x xx+x* + *** * ********* x * |
| + x xx+x* * *** +** ********* xx * |
| * + ++++* + x*x****+*+* ***+*************+x* * |
|*+ +** *+ + +* + *++****** *xxx**********x***+*****************+*++ *|
| |__________A_____M_____| |
| |_______________A____M_________| |
| |____________A___M________| |
+------------------------------------------------------------------------+
N Min Max Median Avg Stddev
x 120 2.60475 3.50941 3.31123 3.2143953 0.21117399
+ 120 2.3826 3.57077 3.25101 3.1414161 0.28146407
Difference at 95.0% confidence
-0.0729792 +/- 0.0629585
-2.27039% +/- 1.95864%
(Student's t, pooled s = 0.248814)
* 120 2.35536 3.66713 3.2849 3.2059917 0.24618565
No difference proven at 95.0% confidence
With 10 clients over-saturating the pipeline:
x no semaphores
+ drm-tip
* patched
+------------------------------------------------------------------------+
| ++ ** |
| ++ ** |
| ++ ** |
| ++ ** |
| ++ xx *** |
| ++ xx *** |
| ++ xxx*** |
| ++ xxx*** |
| +++ xxx*** |
| +++ xx**** |
| +++ xx**** |
| +++ xx**** |
| +++ xx**** |
| ++++ xx**** |
| +++++ xx**** |
| +++++ x x****** |
| ++++++ xxx******* |
| ++++++ xxx******* |
| ++++++ xxx******* |
| ++++++ xx******** |
| ++++++ xxxx******** |
| ++++++ xxxx******** |
| ++++++++ xxxxx********* |
|+ + + + ++++++++ xxx*xx**********x* *|
| |__A__| |
| |__AM__| |
| |__A_| |
+------------------------------------------------------------------------+
N Min Max Median Avg Stddev
x 120 2.47855 2.8972 2.72376 2.7193402 0.074604933
+ 120 1.17367 1.77459 1.71977 1.6966782 0.085850697
Difference at 95.0% confidence
-1.02266 +/- 0.0203502
-37.607% +/- 0.748352%
(Student's t, pooled s = 0.0804246)
* 120 2.57868 3.00821 2.80142 2.7923878 0.058646477
Difference at 95.0% confidence
0.0730476 +/- 0.0169791
2.68622% +/- 0.624383%
(Student's t, pooled s = 0.0671018)
Indicating that we've recovered the regression from enabling semaphores
on this saturated setup, with a hint towards an overall improvement.
Very similar, but of smaller magnitude, results are observed on both
Skylake(gt2) and Kabylake(gt4). This may be due to the reduced impact of
bus-cycles, where we see a 50% hit on Broxton, it is only 10% on the big
core, in this particular test.
One observation to make here is that for a greedy client trying to
maximise its own throughput, using semaphores is the right choice. It is
only the holistic system-wide view that semaphores of one client
impacts another and reduces the overall throughput where we would choose
to disable semaphores.
The most noticeable negactive impact this has is on the no-op
microbenchmarks, which are also very notable for having no cpu bus load.
In particular, this increases the runtime and energy consumption of
gem_exec_whisper.
Fixes: e886196469 ("drm/i915: Use HW semaphores for inter-engine synchronisation on gen8+")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
Cc: Dmitry Ermilov <dmitry.ermilov@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190504070707.30902-1-chris@chris-wilson.co.uk
(cherry picked from commit ca6e56f654)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Currently we submit the semaphore busywait as soon as the signaler is
submitted to HW. However, we may submit the signaler as the tail of a
batch of requests, and even not as the first context in the HW list,
i.e. the busywait may start spinning far in advance of the signaler even
starting.
If we wait until the request before the signaler is completed before
submitting the busywait, we prevent the busywait from starting too
early, if the signaler is not first in submission port.
To handle the case where the signaler is at the start of the second (or
later) submission port, we will need to delay the execution callback
until we know the context is promoted to port0. A challenge for later.
Fixes: e886196469 ("drm/i915: Use HW semaphores for inter-engine synchronisation on gen8+")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190501114541.10077-9-chris@chris-wilson.co.uk
(cherry picked from commit 0d90ccb702)
[Joonas: edited Fixes: tag into single line.]
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Remove mmiowb() from the kernel memory barrier API and instead, for
architectures that need it, hide the barrier inside spin_unlock() when
MMIO has been performed inside the critical section.
-----BEGIN PGP SIGNATURE-----
iQEzBAABCgAdFiEEPxTL6PPUbjXGY88ct6xw3ITBYzQFAlzMFaUACgkQt6xw3ITB
YzRICQgAiv7wF/yIbBhDOmCNCAKDO59chvFQWxXWdGk/aAB56kwKAMXJgLOvlMG/
VRuuLyParTFQETC3jaxKgnO/1hb+PZLDt2Q2KqixtjIzBypKUPWvK2sf6THhSRF1
GK0DBVUd1rCrWrR815+SPb8el4xXtdBzvAVB+Fx35PXVNpdRdqCkK+EQ6UnXGokm
rXXHbnfsnquBDtmb4CR4r2beH+aNElXbdt0Kj8VcE5J7f7jTdW3z6Q9WFRvdKmK7
yrsxXXB2w/EsWXOwFp0SLTV5+fgeGgTvv8uLjDw+SG6t0E0PebxjNAflT7dPrbYL
WecjKC9WqBxrGY+4ew6YJP70ijLBCw==
=aC8m
-----END PGP SIGNATURE-----
Merge tag 'arm64-mmiowb' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull mmiowb removal from Will Deacon:
"Remove Mysterious Macro Intended to Obscure Weird Behaviours (mmiowb())
Remove mmiowb() from the kernel memory barrier API and instead, for
architectures that need it, hide the barrier inside spin_unlock() when
MMIO has been performed inside the critical section.
The only relatively recent changes have been addressing review
comments on the documentation, which is in a much better shape thanks
to the efforts of Ben and Ingo.
I was initially planning to split this into two pull requests so that
you could run the coccinelle script yourself, however it's been plain
sailing in linux-next so I've just included the whole lot here to keep
things simple"
* tag 'arm64-mmiowb' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (23 commits)
docs/memory-barriers.txt: Update I/O section to be clearer about CPU vs thread
docs/memory-barriers.txt: Fix style, spacing and grammar in I/O section
arch: Remove dummy mmiowb() definitions from arch code
net/ethernet/silan/sc92031: Remove stale comment about mmiowb()
i40iw: Redefine i40iw_mmiowb() to do nothing
scsi/qla1280: Remove stale comment about mmiowb()
drivers: Remove explicit invocations of mmiowb()
drivers: Remove useless trailing comments from mmiowb() invocations
Documentation: Kill all references to mmiowb()
riscv/mmiowb: Hook up mmwiob() implementation to asm-generic code
powerpc/mmiowb: Hook up mmwiob() implementation to asm-generic code
ia64/mmiowb: Add unconditional mmiowb() to arch_spin_unlock()
mips/mmiowb: Add unconditional mmiowb() to arch_spin_unlock()
sh/mmiowb: Add unconditional mmiowb() to arch_spin_unlock()
m68k/io: Remove useless definition of mmiowb()
nds32/io: Remove useless definition of mmiowb()
x86/io: Remove useless definition of mmiowb()
arm64/io: Remove useless definition of mmiowb()
ARM/io: Remove useless definition of mmiowb()
mmiowb: Hook up mmiowb helpers to spinlocks and generic I/O accessors
...
Pull stack trace updates from Ingo Molnar:
"So Thomas looked at the stacktrace code recently and noticed a few
weirdnesses, and we all know how such stories of crummy kernel code
meeting German engineering perfection end: a 45-patch series to clean
it all up! :-)
Here's the changes in Thomas's words:
'Struct stack_trace is a sinkhole for input and output parameters
which is largely pointless for most usage sites. In fact if embedded
into other data structures it creates indirections and extra storage
overhead for no benefit.
Looking at all usage sites makes it clear that they just require an
interface which is based on a storage array. That array is either on
stack, global or embedded into some other data structure.
Some of the stack depot usage sites are outright wrong, but
fortunately the wrongness just causes more stack being used for
nothing and does not have functional impact.
Another oddity is the inconsistent termination of the stack trace
with ULONG_MAX. It's pointless as the number of entries is what
determines the length of the stored trace. In fact quite some call
sites remove the ULONG_MAX marker afterwards with or without nasty
comments about it. Not all architectures do that and those which do,
do it inconsistenly either conditional on nr_entries == 0 or
unconditionally.
The following series cleans that up by:
1) Removing the ULONG_MAX termination in the architecture code
2) Removing the ULONG_MAX fixups at the call sites
3) Providing plain storage array based interfaces for stacktrace
and stackdepot.
4) Cleaning up the mess at the callsites including some related
cleanups.
5) Removing the struct stack_trace based interfaces
This is not changing the struct stack_trace interfaces at the
architecture level, but it removes the exposure to the generic
code'"
* 'core-stacktrace-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (45 commits)
x86/stacktrace: Use common infrastructure
stacktrace: Provide common infrastructure
lib/stackdepot: Remove obsolete functions
stacktrace: Remove obsolete functions
livepatch: Simplify stack trace retrieval
tracing: Remove the last struct stack_trace usage
tracing: Simplify stack trace retrieval
tracing: Make ftrace_trace_userstack() static and conditional
tracing: Use percpu stack trace buffer more intelligently
tracing: Simplify stacktrace retrieval in histograms
lockdep: Simplify stack trace handling
lockdep: Remove save argument from check_prev_add()
lockdep: Remove unused trace argument from print_circular_bug()
drm: Simplify stacktrace handling
dm persistent data: Simplify stack trace handling
dm bufio: Simplify stack trace retrieval
btrfs: ref-verify: Simplify stack trace retrieval
dma/debug: Simplify stracktrace retrieval
fault-inject: Simplify stacktrace retrieval
mm/page_owner: Simplify stack trace handling
...
Pull objtool updates from Ingo Molnar:
"This is a series from Peter Zijlstra that adds x86 build-time uaccess
validation of SMAP to objtool, which will detect and warn about the
following uaccess API usage bugs and weirdnesses:
- call to %s() with UACCESS enabled
- return with UACCESS enabled
- return with UACCESS disabled from a UACCESS-safe function
- recursive UACCESS enable
- redundant UACCESS disable
- UACCESS-safe disables UACCESS
As it turns out not leaking uaccess permissions outside the intended
uaccess functionality is hard when the interfaces are complex and when
such bugs are mostly dormant.
As a bonus we now also check the DF flag. We had at least one
high-profile bug in that area in the early days of Linux, and the
checking is fairly simple. The checks performed and warnings emitted
are:
- call to %s() with DF set
- return with DF set
- return with modified stack frame
- recursive STD
- redundant CLD
It's all x86-only for now, but later on this can also be used for PAN
on ARM and objtool is fairly cross-platform in principle.
While all warnings emitted by this new checking facility that got
reported to us were fixed, there might be GCC version dependent
warnings that were not reported yet - which we'll address, should they
trigger.
The warnings are non-fatal build warnings"
* 'core-objtool-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (27 commits)
mm/uaccess: Use 'unsigned long' to placate UBSAN warnings on older GCC versions
x86/uaccess: Dont leak the AC flag into __put_user() argument evaluation
sched/x86_64: Don't save flags on context switch
objtool: Add Direction Flag validation
objtool: Add UACCESS validation
objtool: Fix sibling call detection
objtool: Rewrite alt->skip_orig
objtool: Add --backtrace support
objtool: Rewrite add_ignores()
objtool: Handle function aliases
objtool: Set insn->func for alternatives
x86/uaccess, kcov: Disable stack protector
x86/uaccess, ftrace: Fix ftrace_likely_update() vs. SMAP
x86/uaccess, ubsan: Fix UBSAN vs. SMAP
x86/uaccess, kasan: Fix KASAN vs SMAP
x86/smap: Ditch __stringify()
x86/uaccess: Introduce user_access_{save,restore}()
x86/uaccess, signal: Fix AC=1 bloat
x86/uaccess: Always inline user_access_begin()
x86/uaccess, xen: Suppress SMAP warnings
...
hsw_enable_pc8()/hsw_disable_pc8() are more less equivalent to
the display core init/unit functions of later platforms. Relocate
the hsw/bdw code into intel_runtime_pm.c so that it sits next to
its cousins.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190503193143.28240-2-ville.syrjala@linux.intel.com
Reviewed-by: Imre Deak <imre.deak@intel.com>
Move the w/a to disable IPC on SKL closer to the actual code
that implements IPS. Otherwise I just end up confused as to
what is excluding SKL from considerations.
IMO this makes more sense anyway since the hw does have the
feature, we're just not supposed to use it.
And this also makes us actually disable IPC in case eg. the
BIOS enabled it when it shouldn't have.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190503173807.10834-3-ville.syrjala@linux.intel.com
Depends on GEN family and I915_PARAM_HAS_CONTEXT_ISOLATION, Mesa driver
will decide whether constant buffer 0 address is relative or absolute,
and load GPU initial state by lri to context mmio INSTPM (GEN8)
or 0x20D8 (>=GEN9).
Mesa Commit fa8a764b62
("i965: Use absolute addressing for constant buffer 0 on Kernel 4.16+.")
INSTPM is already added to gen8_engine_mmio_list, but 0x20D8 is missed
in gen9_engine_mmio_list. From GVT point of view, different guest could
have different context so should switch those mmio accordingly.
v2: Update fixes commit ID.
Fixes: 1786571393 ("drm/i915/gvt: vGPU context switch")
Reviewed-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Signed-off-by: Colin Xu <colin.xu@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
(cherry picked from commit 1e8b15a198)
Asking the GPU to busywait on a memory address, perhaps not unexpectedly
in hindsight for a shared system, leads to bus contention that affects
CPU programs trying to concurrently access memory. This can manifest as
a drop in transcode throughput on highly over-saturated workloads.
The only clue offered by perf, is that the bus-cycles (perf stat -e
bus-cycles) jumped by 50% when enabling semaphores. This corresponds
with extra CPU active cycles being attributed to intel_idle's mwait.
This patch introduces a heuristic to try and detect when more than one
client is submitting to the GPU pushing it into an oversaturated state.
As we already keep track of when the semaphores are signaled, we can
inspect their state on submitting the busywait batch and if we planned
to use a semaphore but were too late, conclude that the GPU is
overloaded and not try to use semaphores in future requests. In
practice, this means we optimistically try to use semaphores for the
first frame of a transcode job split over multiple engines, and fail if
there are multiple clients active and continue not to use semaphores for
the subsequent frames in the sequence. Periodically, we try to
optimistically switch semaphores back on whenever the client waits to
catch up with the transcode results.
With 1 client, on Broxton J3455, with the relative fps normalized by %cpu:
x no semaphores
+ drm-tip
* patched
+------------------------------------------------------------------------+
| * |
| *+ |
| **+ |
| **+ x |
| x * +**+ x |
| x x * * +***x xx |
| x x * * *+***x *x |
| x x* + * * *****x *x x |
| + x xx+x* + *** * ********* x * |
| + x xx+x* * *** +** ********* xx * |
| * + ++++* + x*x****+*+* ***+*************+x* * |
|*+ +** *+ + +* + *++****** *xxx**********x***+*****************+*++ *|
| |__________A_____M_____| |
| |_______________A____M_________| |
| |____________A___M________| |
+------------------------------------------------------------------------+
N Min Max Median Avg Stddev
x 120 2.60475 3.50941 3.31123 3.2143953 0.21117399
+ 120 2.3826 3.57077 3.25101 3.1414161 0.28146407
Difference at 95.0% confidence
-0.0729792 +/- 0.0629585
-2.27039% +/- 1.95864%
(Student's t, pooled s = 0.248814)
* 120 2.35536 3.66713 3.2849 3.2059917 0.24618565
No difference proven at 95.0% confidence
With 10 clients over-saturating the pipeline:
x no semaphores
+ drm-tip
* patched
+------------------------------------------------------------------------+
| ++ ** |
| ++ ** |
| ++ ** |
| ++ ** |
| ++ xx *** |
| ++ xx *** |
| ++ xxx*** |
| ++ xxx*** |
| +++ xxx*** |
| +++ xx**** |
| +++ xx**** |
| +++ xx**** |
| +++ xx**** |
| ++++ xx**** |
| +++++ xx**** |
| +++++ x x****** |
| ++++++ xxx******* |
| ++++++ xxx******* |
| ++++++ xxx******* |
| ++++++ xx******** |
| ++++++ xxxx******** |
| ++++++ xxxx******** |
| ++++++++ xxxxx********* |
|+ + + + ++++++++ xxx*xx**********x* *|
| |__A__| |
| |__AM__| |
| |__A_| |
+------------------------------------------------------------------------+
N Min Max Median Avg Stddev
x 120 2.47855 2.8972 2.72376 2.7193402 0.074604933
+ 120 1.17367 1.77459 1.71977 1.6966782 0.085850697
Difference at 95.0% confidence
-1.02266 +/- 0.0203502
-37.607% +/- 0.748352%
(Student's t, pooled s = 0.0804246)
* 120 2.57868 3.00821 2.80142 2.7923878 0.058646477
Difference at 95.0% confidence
0.0730476 +/- 0.0169791
2.68622% +/- 0.624383%
(Student's t, pooled s = 0.0671018)
Indicating that we've recovered the regression from enabling semaphores
on this saturated setup, with a hint towards an overall improvement.
Very similar, but of smaller magnitude, results are observed on both
Skylake(gt2) and Kabylake(gt4). This may be due to the reduced impact of
bus-cycles, where we see a 50% hit on Broxton, it is only 10% on the big
core, in this particular test.
One observation to make here is that for a greedy client trying to
maximise its own throughput, using semaphores is the right choice. It is
only the holistic system-wide view that semaphores of one client
impacts another and reduces the overall throughput where we would choose
to disable semaphores.
The most noticeable negactive impact this has is on the no-op
microbenchmarks, which are also very notable for having no cpu bus load.
In particular, this increases the runtime and energy consumption of
gem_exec_whisper.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
Cc: Dmitry Ermilov <dmitry.ermilov@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190504070707.30902-1-chris@chris-wilson.co.uk
Turns out the cursor is compatible with the pipe "HDR mode". It's
only the actual SDR planes that get entirely bypassed during
blending. So let's ignore the cursor when checking if we have
any planes active that aren't HDR compatible. This fixes the
regressions in the kms_cursor_crc and kms_plane_cursor tests.
Cc: Uma Shankar <uma.shankar@intel.com>
Cc: Shashank Sharma <shashank.sharma@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110579
Fixes: 09b25812db ("drm/i915: Enable pipe HDR mode on ICL if only HDR planes are used")
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190502200607.14504-2-ville.syrjala@linux.intel.com
Reviewed-by: Uma Shankar <uma.shankar@intel.com>
I fumbled the PIPEMISC write into the wrong place. It only gets
called for fastsets, but since value needs to be updated based on
the set of active planes it needs to be done for all plane updates.
Move it to the correct spot.
The symptoms include SDR planes never showing up if a previous
modeset/fastset left the pipe in HDR mode. This was immediately
obvious when running the kms_plane pixel format tests. Unfortunately
the test didn't realize it was scanning out pure black all the time
and declared success anyway.
Cc: Uma Shankar <uma.shankar@intel.com>
Cc: Shashank Sharma <shashank.sharma@intel.com>
Fixes: 09b25812db ("drm/i915: Enable pipe HDR mode on ICL if only HDR planes are used")
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190502200607.14504-1-ville.syrjala@linux.intel.com
Reviewed-by: Uma Shankar <uma.shankar@intel.com>
Currently we submit the semaphore busywait as soon as the signaler is
submitted to HW. However, we may submit the signaler as the tail of a
batch of requests, and even not as the first context in the HW list,
i.e. the busywait may start spinning far in advance of the signaler even
starting.
If we wait until the request before the signaler is completed before
submitting the busywait, we prevent the busywait from starting too
early, if the signaler is not first in submission port.
To handle the case where the signaler is at the start of the second (or
later) submission port, we will need to delay the execution callback
until we know the context is promoted to port0. A challenge for later.
Fixes: e886196469 ("drm/i915: Use HW semaphores for inter-engine synchroni
sation on gen8+")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190501114541.10077-9-chris@chris-wilson.co.uk
Given sufficient preemption, we may see a busy system that doesn't
advance seqno while performing work across multiple contexts, and given
sufficient pathology not even notice a change in ACTHD. What does change
between the preempting contexts is their RING, so take note of that and
treat a change in the ring address as being an indication of forward
progress.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190501114541.10077-1-chris@chris-wilson.co.uk
Drop the check in GEM parking that the engines were already parked. The
intention here was that before we dropped the GT wakeref, we were sure
that no more interrupts could be raised -- however, we have already
dropped the wakeref by this point and the warning is no longer valid.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190502150024.16636-2-chris@chris-wilson.co.uk
Tidy up the cleanup sequence by always ensure that the tasklet is
flushed on parking (before we cleanup). The parking provides a
convenient point to ensure that the backend is truly idle.
v2: Do the full check for idleness before parking, to be sure we flush
any residual interrupt.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190503080942.30151-1-chris@chris-wilson.co.uk
We are not allowed to rpm_get() inside the runtime-suspend callback, so
split the intel_uc_suspend() into the core that assumes the caller holds
the wakeref (intel_uc_runtime_suspend), and one that acquires the wakeref
as necessary (intel_uc_suspend).
Reported-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Fixes: 79ffac8599 ("drm/i915: Invert the GEM wakeref hierarchy")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190502203009.15727-1-chris@chris-wilson.co.uk
It used to be handy that we only had a couple of headers, but over time
i915_drv.h has become unwieldy. Extract declarations to a separate
header file corresponding to the implementation module, clarifying the
modularity of the driver.
Ensure the new header is self-contained, and do so with minimal further
includes, using forward declarations as needed. Include the new header
only where needed, and sort the modified include directives while at it
and as needed.
While at it, rename intel_i2c.c to intel_gmbus.c and the functions to
intel_gmbus_*.
No functional changes.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/5834b8fbbfd4ac2e3d0159e69c87f6926066f537.1556809195.git.jani.nikula@intel.com
It used to be handy that we only had a couple of headers, but over time
i915_drv.h has become unwieldy. Extract declarations to a separate
header file corresponding to the implementation module, clarifying the
modularity of the driver.
Ensure the new header is self-contained, and do so with minimal further
includes, using forward declarations as needed. Include the new header
only where needed, and sort the modified include directives while at it
and as needed.
No functional changes.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/2843b028d65e118dc40316aa84bf620a93f6c67b.1556809195.git.jani.nikula@intel.com
It used to be handy that we only had a couple of headers, but over time
i915_drv.h has become unwieldy. Extract declarations to a separate
header file corresponding to the implementation module, clarifying the
modularity of the driver.
Ensure the new header is self-contained, and do so with minimal further
includes, using forward declarations as needed. Include the new header
only where needed, and sort the modified include directives while at it
and as needed.
No functional changes.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/9bc1317a67df0b9d019eca5b36f474b76a1cad26.1556809195.git.jani.nikula@intel.com
It used to be handy that we only had a couple of headers, but over time
i915_drv.h has become unwieldy. Extract declarations to a separate
header file corresponding to the implementation module, clarifying the
modularity of the driver.
Ensure the new header is self-contained, and do so with minimal further
includes, using forward declarations as needed. Include the new header
only where needed, and sort the modified include directives while at it
and as needed.
No functional changes.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/9101a58b9f10bcf11332175e17b6e6e45f4ebd17.1556809195.git.jani.nikula@intel.com
It used to be handy that we only had a couple of headers, but over time
i915_drv.h has become unwieldy. Extract declarations to a separate
header file corresponding to the implementation module, clarifying the
modularity of the driver.
Ensure the new header is self-contained, and do so with minimal further
includes, using forward declarations as needed. Include the new header
only where needed, and sort the modified include directives while at it
and as needed.
No functional changes.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/876a1671a84c6839bcafdf276cf9c4e1da6c631c.1556809195.git.jani.nikula@intel.com
Looks like VBT contains again the wrong information about a port's TypeC
legacy vs. DP-alt/TBT-alt type. There is no further issues after we
notice this and fix it up, so tune down the WARN to be a a DRM_ERROR.
This also avoids CI tainting the kernel and stopping the test run.
v2:
- Update also code coment accordingly. (Jani)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110578
Cc: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190502101754.29219-1-imre.deak@intel.com
This step of the BSpec combo PHY port enabling is missing, so add it
now.
v2:
- Rebased on the new fixed v2 version of the helper.
v3:
- Use intel_ instead of icl_ prefix. (Jani)
Reported-by: Ville Syrjala <ville.syrjala@linux.intel.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Madhav Chauhan <madhav.chauhan@intel.com>
Cc: Ville Syrjala <ville.syrjala@linux.intel.com>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190425185253.3197-2-imre.deak@intel.com
Factor out the combo PHY lane power configuration code to a separate
helper; it will be also needed by the next patch adding the same
configuration for DDI ports.
Add support for DDI ports and lane reversal as preparation for the next
patch.
The PWR_DOWN_LN_1 value is unspecified in the BSpec register description
so remove it.
v2:
- Fix up the wrong assumption that the encodings are the same for DDI
and DSI ports. (Jani)
v3:
- Use intel_ instead of icl_ prefix. (Jani)
- Add required headers to intel_combo_phy.h after the upstream header
refactoring.
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Madhav Chauhan <madhav.chauhan@intel.com>
Cc: Ville Syrjala <ville.syrjala@linux.intel.com>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Jani Nikula <jani.nikula@intel.com> (v2)
Link: https://patchwork.freedesktop.org/patch/msgid/20190425185253.3197-1-imre.deak@intel.com
s/pipe/transcoder/ when dealing with hsw+ audio registers. This
won't actually make any real difference since there is no audio
on the EDP transcoder. But this should avoid a bit of confusion
when cross checking against the spec.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190430142901.7302-2-ville.syrjala@linux.intel.com
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
We've already committed to enabling audio when intel_audio_codec_enable()
is called. We can't back out even if the ELD has turned sour in the
meantime. So just spew some debug log and plow ahead. Otherwise the
state checker gets unhappy when audio isn't enabled when it is
expected to be.
I suppose we really ought to precompute the ELD as well, but
let's just toss in a FIXME for the future.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103841
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190430142901.7302-1-ville.syrjala@linux.intel.com
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Currently due to regression CI machine displays show corrupt picture.
Problem is when CDCLK is as low as 79200, picture gets unstable, while
DSI and DE pll values were confirmed to be correct. Limiting to 158400
as agreed with Ville.
We could not come up with any better solution yet, as PLL divider values
both for MIPI(DSI PLL) and CDCLK(DE PLL) are correct, however seems that
due to some boundary conditions, when clocking is too low we get wrong
timings for DSI display. Similar workaround exists for VLV though, so
just took similar condition into use. At least that way GLK platform
will start to be usable again, with current drm-tip.
v2: Fixed commit subject as suggested.
v3: Added generic bugs(crc failures, screen not init
for GLK DSI which might be affected).
v4: Added references tag for bugs affected.
Signed-off-by: Stanislav Lisovskiy <stanislav.lisovskiy@intel.com>
Acked-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
References: https://bugs.freedesktop.org/show_bug.cgi?id=109267
References: https://bugs.freedesktop.org/show_bug.cgi?id=103184
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190430125119.7478-1-stanislav.lisovskiy@intel.com
The workqueue code complains viciously if we try to queue more work onto
the queue while attampting to drain it. As we asynchronously free
objects and defer their enqueuing with RCU, it is quite tricky to
quiesce the system before attempting to drain the workqueue. Yet drain
we must to ensure that the worker is idle before unloading the module.
Give the freed object drain 3 whole passes with multiple rcu_barrier()
to give the defer freeing of several levels each protected by RCU and
needing a grace period before its parent can be freed, ultimately
resulting in a GEM object being freed after another RCU period.
A consequence is that it will make module unload even slower.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110550
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190501135753.8711-1-chris@chris-wilson.co.uk
Make the engine responsible for cleaning itself up!
This removes the i915->gt.cleanup vfunc that has been annoying the
casual reader and myself for the last several years, and helps keep a
future patch to add more cleanup tidy.
v2: Assert that engine->destroy is set after the backend starts
allocating its own state.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190501103204.18632-1-chris@chris-wilson.co.uk
The pipe has a special HDR mode with higher precision when only
HDR planes are active. Let's use it.
Curiously this fixes the kms_color gamma/degamma tests when
using a HDR plane, which is always the case unless one hacks
the test to use an SDR plane. If one does hack the test to use
an SDR plane it does pass already.
I have no actual explanation how the output after the gamma
LUT can be different between the two modes. The way the tests
are written should mean that the output should be identical
between the solid color vs. the gradient. But clearly that
somehow doesn't hold true for the HDR planes in non-HDR pipe
mode. Anyways, as long as we stick to one type of plane the
test should produce sensible results now.
v2: s/HDR_MODE/HDR_MODE_PRECISION/ (Shashank)
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190412183009.8237-2-ville.syrjala@linux.intel.com
Reviewed-by: Uma Shankar <uma.shankar@intel.com>
Tested-by: Uma Shankar <uma.shankar@intel.com>
Reviewed-by: Shashank Sharma <shashank.sharma@intel.com>
It used to be handy that we only had a couple of headers, but over time
i915_drv.h has become unwieldy. Extract declarations to a separate
header file corresponding to the implementation module, clarifying the
modularity of the driver.
Ensure the new header is self-contained, and do so with minimal further
includes, using forward declarations as needed. Include the new header
only where needed, and sort the modified include directives while at it
and as needed.
No functional changes.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/6aea17072684dec0b04b6831c0c0e5a134edf87e.1556540890.git.jani.nikula@intel.com
It used to be handy that we only had a couple of headers, but over time
intel_drv.h has become unwieldy. Extract declarations to a separate
header file corresponding to the implementation module, clarifying the
modularity of the driver.
Ensure the new header is self-contained, and do so with minimal further
includes, using forward declarations as needed. Include the new header
only where needed, and sort the modified include directives while at it
and as needed.
No functional changes.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/87904259868782c1ad664d852b27a50c1597cfaa.1556540890.git.jani.nikula@intel.com
It used to be handy that we only had a couple of headers, but over time
intel_drv.h has become unwieldy. Extract declarations to a separate
header file corresponding to the implementation module, clarifying the
modularity of the driver.
Ensure the new header is self-contained, and do so with minimal further
includes, using forward declarations as needed. Include the new header
only where needed, and sort the modified include directives while at it
and as needed.
No functional changes.
v2: fix sparse warnings on undeclared global functions
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190429125331.32499-1-jani.nikula@intel.com
It used to be handy that we only had a couple of headers, but over time
intel_drv.h has become unwieldy. Extract declarations to a separate
header file corresponding to the implementation module, clarifying the
modularity of the driver.
Ensure the new header is self-contained, and do so with minimal further
includes, using forward declarations as needed. Include the new header
only where needed, and sort the modified include directives while at it
and as needed.
No functional changes.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/f35fc2ba76d7dd5886d304ad690a6f9078a56ecd.1556540890.git.jani.nikula@intel.com
It used to be handy that we only had a couple of headers, but over time
intel_drv.h has become unwieldy. Extract declarations to a separate
header file corresponding to the implementation module, clarifying the
modularity of the driver.
Ensure the new header is self-contained, and do so with minimal further
includes, using forward declarations as needed. Include the new header
only where needed, and sort the modified include directives while at it
and as needed.
No functional changes.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/b2fd1b2b968aa0ce010d17e2811bc275cf9ca251.1556540890.git.jani.nikula@intel.com
It used to be handy that we only had a couple of headers, but over time
intel_drv.h has become unwieldy. Extract declarations to a separate
header file corresponding to the implementation module, clarifying the
modularity of the driver.
Ensure the new header is self-contained, and do so with minimal further
includes, using forward declarations as needed. Include the new header
only where needed, and sort the modified include directives while at it
and as needed.
No functional changes.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/76d2719b462004ec6f6f5c302ee5d3876357c599.1556540890.git.jani.nikula@intel.com
It used to be handy that we only had a couple of headers, but over time
intel_drv.h has become unwieldy. Extract declarations to a separate
header file corresponding to the implementation module, clarifying the
modularity of the driver.
Ensure the new header is self-contained, and do so with minimal further
includes, using forward declarations as needed. Include the new header
only where needed, and sort the modified include directives while at it
and as needed.
No functional changes.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/2e4fb1e67ed38870df3040bb0a1b1a58fd90cc86.1556540890.git.jani.nikula@intel.com
It used to be handy that we only had a couple of headers, but over time
intel_drv.h has become unwieldy. Extract declarations to a separate
header file corresponding to the implementation module, clarifying the
modularity of the driver.
Ensure the new header is self-contained, and do so with minimal further
includes, using forward declarations as needed. Include the new header
only where needed, and sort the modified include directives while at it
and as needed.
No functional changes.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/1e2fb90dcce2063b1c464dc64aa8fa6005b62bc6.1556540890.git.jani.nikula@intel.com
It used to be handy that we only had a couple of headers, but over time
i915_drv.h has become unwieldy. Extract declarations to a separate
header file corresponding to the implementation module, clarifying the
modularity of the driver.
Ensure the header remains self-contained, and do so with minimal further
includes, using forward declarations as needed.
No functional changes.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/cf9b17d56489e15d82356575037432ad04712475.1556540890.git.jani.nikula@intel.com
It used to be handy that we only had a couple of headers, but over time
intel_drv.h has become unwieldy. Extract declarations to a separate
header file corresponding to the implementation module, clarifying the
modularity of the driver.
Ensure the new header is self-contained, and do so with minimal further
includes, using forward declarations as needed. Include the new header
only where needed, and sort the modified include directives while at it
and as needed.
No functional changes.
v2: fix sparse warnings on undeclared global functions
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190429125011.10876-1-jani.nikula@intel.com
It used to be handy that we only had a couple of headers, but over time
intel_drv.h has become unwieldy. Extract declarations to a separate
header file corresponding to the implementation module, clarifying the
modularity of the driver.
Ensure the new header is self-contained, and do so with minimal further
includes, using forward declarations as needed. Include the new header
only where needed, and sort the modified include directives while at it
and as needed.
No functional changes.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/64e46278dc8dccc9c548ef453cb2ceece5367bb2.1556540890.git.jani.nikula@intel.com
It used to be handy that we only had a couple of headers, but over time
intel_drv.h has become unwieldy. Extract declarations to a separate
header file corresponding to the implementation module, clarifying the
modularity of the driver.
Ensure the new header is self-contained, and do so with minimal further
includes, using forward declarations as needed. Include the new header
only where needed, and sort the modified include directives while at it
and as needed.
No functional changes.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/2e5a386cbdcd361399e94c55d47a12352a5216c7.1556540890.git.jani.nikula@intel.com
It used to be handy that we only had a couple of headers, but over time
intel_drv.h has become unwieldy. Extract declarations to a separate
header file corresponding to the implementation module, clarifying the
modularity of the driver.
Ensure the new header is self-contained, and do so with minimal further
includes, using forward declarations as needed. Include the new header
only where needed, and sort the modified include directives while at it
and as needed.
No functional changes.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/eb23be64d04957b2cf82b79fd69cc57ed84043a4.1556540889.git.jani.nikula@intel.com
It used to be handy that we only had a couple of headers, but over time
intel_drv.h has become unwieldy. Extract declarations to a separate
header file corresponding to the implementation module, clarifying the
modularity of the driver.
Ensure the new header is self-contained, and do so with minimal further
includes, using forward declarations as needed. Include the new header
only where needed, and sort the modified include directives while at it
and as needed.
No functional changes.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/0507c5523d1f07a48e6679a04db75246ce8ba766.1556540889.git.jani.nikula@intel.com
Separate the two comments: one is a workaround and the other is a sanity
check. We could just compare != 1, but let's treat them differently due
to having different meaning.
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190404230426.15837-4-lucas.demarchi@intel.com
WaEnableStateCacheRedirectToCS context workaround configures the L3 cache
to benefit 3d workloads but media has different requirements.
Remove the workaround and whitelist the register to allow any userspace
configure the behaviour to their liking.
v2:
* Remove the workaround apart from adding the whitelist.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: kevin.ma@intel.com
Cc: xiaogang.li@intel.com
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Anuj Phogat <anuj.phogat@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190418100634.984-1-tvrtko.ursulin@linux.intel.com
Fixes: f63c7b4880 ("drm/i915/icl: WaEnableStateCacheRedirectToCS")
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
[tursulin: Anuj reported no GPU hangs or performance regressions with old
Mesa on patched kernel.]
(cherry picked from commit 0fc2273b9a)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
WaEnableStateCacheRedirectToCS context workaround configures the L3 cache
to benefit 3d workloads but media has different requirements.
Remove the workaround and whitelist the register to allow any userspace
configure the behaviour to their liking.
v2:
* Remove the workaround apart from adding the whitelist.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: kevin.ma@intel.com
Cc: xiaogang.li@intel.com
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Anuj Phogat <anuj.phogat@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190418100634.984-1-tvrtko.ursulin@linux.intel.com
Fixes: f63c7b4880 ("drm/i915/icl: WaEnableStateCacheRedirectToCS")
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
[tursulin: Anuj reported no GPU hangs or performance regressions with old
Mesa on patched kernel.]
Replace the indirection through struct stack_trace by using the storage
array based interfaces.
The original code in all printing functions is really wrong. It allocates a
storage array on stack which is unused because depot_fetch_stack() does not
store anything in it. It overwrites the entries pointer in the stack_trace
struct so it points to the depot storage.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Daniel Vetter <daniel@ffwll.ch>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: intel-gfx@lists.freedesktop.org
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: dri-devel@lists.freedesktop.org
Cc: David Airlie <airlied@linux.ie>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Alexander Potapenko <glider@google.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: linux-mm@kvack.org
Cc: David Rientjes <rientjes@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: kasan-dev@googlegroups.com
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: iommu@lists.linux-foundation.org
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Cc: David Sterba <dsterba@suse.com>
Cc: Chris Mason <clm@fb.com>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: linux-btrfs@vger.kernel.org
Cc: dm-devel@redhat.com
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: Alasdair Kergon <agk@redhat.com>
Cc: Tom Zanussi <tom.zanussi@linux.intel.com>
Cc: Miroslav Benes <mbenes@suse.cz>
Cc: linux-arch@vger.kernel.org
Link: https://lkml.kernel.org/r/20190425094802.622094226@linutronix.de
If the context has not been used yet, it needs no barrier, and in the
process fix up the selftest in mock_contexts.
Testcase: igt/gem_ctx_clone/vm
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190429090735.326-1-chris@chris-wilson.co.uk
When I refactored the code into its own function I accidentally
misplaced the <<16 shifts for some of the registers causing us
to lose the blue channel entirely.
We should really find a way to test this...
Cc: Uma Shankar <uma.shankar@intel.com>
Fixes: d2c19b06d6 ("drm/i915: Clean up ilk/icl pipe/output CSC programming")
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190425192419.24931-1-ville.syrjala@linux.intel.com
Reviewed-by: Swati Sharma <swati2.sharma@intel.com>
(cherry picked from commit d428ca17ea)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
This reverts commit f74a6d9a2c.
BXT needs to access 0x141000-0x1417ff register to obtain the dram info.
But after the snapshot range of I915_MCHBAR is refined in f74a6d9a2c,
it only initializes the range of 0x144000-0x147fff for VGPU and then
causes that the guest GPU can't get the initialized value for dram
detection on BXT.
Signed-off-by: Zhao Yakui <yakui.zhao@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Having transitioned GEM over to using intel_context as its primary means
of tracking the GEM context and engine combined and using
i915_request_create(), we can move the older i915_request_alloc()
helper function into selftests/ where the remaining users are confined.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190426163336.15906-9-chris@chris-wilson.co.uk
We no longer need to track the active intel_contexts within each engine,
allowing us to drop a tricky mutex_lock from inside unpin (which may
occur inside fs_reclaim).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190426163336.15906-8-chris@chris-wilson.co.uk
We switched to a tree of per-engine HW context to accommodate the
introduction of virtual engines. However, we plan to also support
multiple instances of the same engine within the GEM context, defeating
our use of the engine as a key to looking up the HW context. Just
allocate a logical per-engine instance and always use an index into the
ctx->engines[]. Later on, this ctx->engines[] may be replaced by a user
specified map.
v2: Add for_each_gem_engine() helper to iterator within the engines lock
v3: intel_context_create_request() helper
v4: s/unsigned long/unsigned int/ 4 billion engines is quite enough.
v5: Push iterator locking to caller
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190426163336.15906-7-chris@chris-wilson.co.uk
In the next patch, we require the engine vfuncs setup prior to
initialising the pinned kernel contexts, so split the vfunc setup from
the engine initialisation and call it earlier.
v2: s/setup_xcs/setup_common/ for intel_ring_submission_setup()
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190426163336.15906-6-chris@chris-wilson.co.uk
Move the intel_context_instance() to the caller so that we can decouple
ourselves from one context instance per engine.
v2: Rename pin_lock() to lock_pinned(), hopefully that is clearer.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190426163336.15906-5-chris@chris-wilson.co.uk
Our eventual goal is to rid request construction of struct_mutex, with
the short term step of lifting the struct_mutex requirements into the
higher levels (i.e. the caller must ensure that the context is already
pinned into the GTT). In this patch, we pin GVT's shadow context upon
allocation and so keep them pinned into the GGTT for as long as the
virtual machine is alive, and so we can use the simpler request
construction path safe in the knowledge that the hard work is already
done.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Acked-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190426163336.15906-1-chris@chris-wilson.co.uk
I like my functions simple, so split up the low level bits from
cherryview_load_luts() into separate functions. Also rename the
whole thing to chv_load_luts() to match the new world order.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190408121815.30142-1-ville.syrjala@linux.intel.com
Reviewed-by: Swati Sharma <swati2.sharma@intel.com>
When I refactored the code into its own function I accidentally
misplaced the <<16 shifts for some of the registers causing us
to lose the blue channel entirely.
We should really find a way to test this...
Cc: Uma Shankar <uma.shankar@intel.com>
Fixes: d2c19b06d6 ("drm/i915: Clean up ilk/icl pipe/output CSC programming")
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190425192419.24931-1-ville.syrjala@linux.intel.com
Reviewed-by: Swati Sharma <swati2.sharma@intel.com>
Currently, the Kbuild core manipulates header search paths in a crazy
way [1].
To fix this mess, I want all Makefiles to add explicit $(srctree)/ to
the search paths in the srctree. Some Makefiles are already written in
that way, but not all. The goal of this work is to make the notation
consistent, and finally get rid of the gross hacks.
Having whitespaces after -I does not matter since commit 48f6e3cf5b
("kbuild: do not drop -I without parameter").
[1]: https://patchwork.kernel.org/patch/9632347/
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Reviewed-by: Sam Ravnborg <sam@ravnborg.org>
Reviewed-by: James Qian Wang (Arm Technology China) <james.qian.wang@arm.com>
Acked-by: Liviu Dudau <liviu.dudau@arm.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/1553859161-2628-1-git-send-email-yamada.masahiro@socionext.com
Broadwater and the rest of gen4 do support being able to saving and
reloading context specific registers between contexts, providing isolation
of the basic GPU state (as programmable by userspace). This allows
userspace to assume that the GPU retains their state from one batch to the
next, minimising the amount of state it needs to reload and manually save
across batches.
v2: CONSTANT_BUFFER woes
Running through piglit turned up an interesting issue, a GPU hang inside
the context load. The context image includes the CONSTANT_BUFFER command
that loads an address into a on-gpu buffer, and the context load was
executing that immediately. However, since it was reading from the GTT
there is no guarantee that the GTT retains the same configuration as
when the context was saved, resulting in stray reads and a GPU hang.
Having tried issuing a CONSTANT_BUFFER (to disable the command) from the
ring before saving the context to no avail, we resort to patching out
the instruction inside the context image before loading.
This does impose that gen4 always reissues CONSTANT_BUFFER commands on
each batch, but due to the use of a shared GTT that was and will remain
a requirement.
v3: ECOSKPD to the rescue
Ville found the magic bit in the ECOSKPD to disable saving and restoring
the CONSTANT_BUFFER from the context image, thereby completely avoiding
the GPU hangs from chasing invalid pointers. This appears to be the
default behaviour for gen5, and so we just need to tweak gen4 to match.
v4: Fix spelling of ECOSKPD and discover it already exists
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20190419172720.5462-1-chris@chris-wilson.co.uk
Ironlake does support being able to saving and reloading context specific
registers between contexts, providing isolation of the basic GPU state
(as programmable by userspace). This allows userspace to assume that the
GPU retains their state from one batch to the next, minimising the
amount of state it needs to reload, or manually save and restore.
v2: Fix off-by-one in reading CXT_SIZE, and add a comment that the
CXT_SIZE and context-layout do not match in bspec, but the difference is
irrelevant as we overallocate the full page anyway (Ville).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190419111749.3910-2-chris@chris-wilson.co.uk
Despite what I think the prm recommends, commit f2253bd985
("drm/i915/ringbuffer: EMIT_INVALIDATE after switch context") turned out
to be a huge mistake when enabling Ironlake contexts as the GPU would
hang on either a MI_FLUSH or PIPE_CONTROL immediately following the
MI_SET_CONTEXT of an active mesa context (more vanilla contexts, e.g.
simple rendercopies with igt, do not suffer).
Ville found the following clue,
"[DevCTG+]: For the invalidate operation of the pipe control, the
following pointers are affected. The
invalidate operation affects the restore of these packets. If the pipe
control invalidate operation is completed
before the context save, the indirect pointers will not be restored from
memory.
1. Pipeline State Pointer
2. Media State Pointer
3. Constant Buffer Packet"
which suggests by us emitting the INVALIDATE prior to the MI_SET_CONTEXT,
we prevent the context-restore from chasing the dangling pointers within
the image, and explains why this likely prevents the GPU hang.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190419111749.3910-1-chris@chris-wilson.co.uk
These routines are identical except in the nature of the value parameter.
For writes it is a pure in-param, but for a read, we need an out-param.
Since they differ in a single line, merge the two routines into one.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Imre Deak <imre.deak@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190426081725.31217-7-chris@chris-wilson.co.uk
Since intel_sideband_read and intel_sideband_write differ by only a
couple of lines (depending on whether we feed the value in or out),
merge the two into a single common accessor.
v2: Restore vlv_flisdsi_read() lost during rebasing.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190426081725.31217-6-chris@chris-wilson.co.uk
We now have two locks for sideband access. The general one covering
sideband access across all generation, sb_lock, and a specific one
covering sideband access via the punit on vlv/chv. After lifting the
sb_lock around the punit into the callers, the pcu_lock is now redudant
and can be separated from its other use to regulate RPS (essentially
giving RPS a lock all of its own).
v2: Extract a couple of minor bug fixes.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Sagar Arun Kamble <sagar.a.kamble@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190426081725.31217-4-chris@chris-wilson.co.uk
Lift the sideband acquisition for vlv_punit_read and vlv_punit_write
into their callers, so that we can lock the sideband once for a sequence
of operations, rather than perform the heavyweight acquisition on each
request.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190426081725.31217-3-chris@chris-wilson.co.uk
As we now employ a very heavy pm_qos around the punit access, we want to
minimise the number of synchronous requests by performing one for the
whole punit sequence rather than around individual accesses. The
sideband lock is used for this, so push the pm_qos into the sideband
lock acquisition and release, moving it from the lowlevel punit rw
routine to the callers. In the first step, we move the punit magic into
the common sideband lock so that we can acquire a bunch of ports
simultaneously, and if need be extend the workaround protection later.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190426081725.31217-2-chris@chris-wilson.co.uk
While we talk to the punit over its sideband, we need to prevent the cpu
from sleeping in order to prevent a potential machine hang.
Note that by itself, it appears that pm_qos_update_request (via
intel_idle) doesn't provide a sufficient barrier to ensure that all core
are indeed awake (out of Cstate) and that the package is awake. To do so,
we need to supplement the pm_qos with a manual ping on_each_cpu.
v2: Restrict the heavy-weight wakeup to just the ISOF_PORT_PUNIT, there
is insufficient evidence to implicate a wider problem atm. Similarly,
restrict the w/a to Valleyview, as Cherryview doesn't have an angry cadre
of users.
The working theory, courtesy of Ville and Hans, is the issue lies within
the power delivery and so is likely to be unit and board specific and
occurs when both the unit/fw require extra power at the same time as the
cpu package is changing its own power state.
References: https://bugzilla.kernel.org/show_bug.cgi?id=109051
References: https://bugs.freedesktop.org/show_bug.cgi?id=102657
References: https://bugzilla.kernel.org/show_bug.cgi?id=195255
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Hans de Goede <hdegoede@redhat.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190426081725.31217-1-chris@chris-wilson.co.uk
to avoid enabling FEC without DSC.
-----BEGIN PGP SIGNATURE-----
iQEcBAABAgAGBQJcwNk3AAoJEPpiX2QO6xPKffsH+QEqA8FzEiwObykzVZ8zqhe1
iJn4BtqdLtACUOpcz9AfduVyos0ScClb1Ulgzo5A6Vi+811DGfK9Kc7k6PEjb/q8
HPXWOh1sFsz8tcr3tE24b+I1ijsPZGPAvmauYLv0tNS92QJOyGzM1u1O4Wt6HmjB
04h0BAs1AvO/yh6nTG31+Uca/uT0BbFWEJxyokrMDGAZqv8Ou0z3bFj7hozjNwh4
Q3mRQu+2CGjIkJbh/XCNuRJ0l/WgkKKfznThUoOxHr3ysDfwYzMTDzlYzPkVrSaN
NRe+J7IIBuCWMAHOOjhwJX5qeQGM0lsWzQ+6EsjrOquxOg/7tmoTFiq+DT/Yqik=
=1xKs
-----END PGP SIGNATURE-----
Merge tag 'drm-intel-fixes-2019-04-24' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes
A fix for display lanes calculation for BXT and a protection
to avoid enabling FEC without DSC.
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190424215359.GA26100@intel.com
i915 has its own hw readout and doesn't use the reset helpers directly.
Still it has 2 places where it initialises the crtc_state. Fix those
by calling __drm_atomic_helper_crtc_reset().
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Not-nacked-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Acked-by: Jani Nikula <jani.nikula@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190301125627.7285-8-maarten.lankhorst@linux.intel.com
According to gtt_type_table[] function get_next_pt_type() may returns
GTT_TYPE_INVALID in some cases. To prevent driver to try to create memory
page with invalid data type, additional check is added.
Signed-off-by: Aleksei Gimbitskii <aleksei.gimbitskii@intel.com>
Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Cc: Zhi Wang <zhi.a.wang@intel.com>
Cc: Colin Xu <colin.xu@intel.com>
Reviewed-by: Colin Xu <colin.xu@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
For printing the intel_vgpu->id, a buffer with fixed length is allocated
on the stack. But if vgpu->id is greater than 6 characters, the buffer
overflow will happen. Even the string of the amount of max vgpu is less
that the length buffer right now, it's better to replace sprintf() with
snprintf().
v2:
- Increase the size of the buffer. (Colin Xu)
This patch fixed the critical issue #673 reported by klocwork.
Signed-off-by: Aleksei Gimbitskii <aleksei.gimbitskii@intel.com>
Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Cc: Zhi Wang <zhi.a.wang@intel.com>
Cc: Colin Xu <colin.xu@intel.com>
Reviewed-by: Colin Xu <colin.xu@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
In the code the memcpy() function copied uninitialized pointer in fb_info
to dmabuf_obj->info. Later the pointer in dmabuf_obj->info will be
initialized. To make the code aligned with requirements of the klocwork
static code analyzer, the uninitialized pointer should be initialized
before memcpy().
v2:
- Initialize fb_info.obj in vgpu_get_plane_info(). (Colin Xu)
This patch fixed the critical issue #632 reported by klockwork.
Signed-off-by: Aleksei Gimbitskii <aleksei.gimbitskii@intel.com>
Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Cc: Zhi Wang <zhi.a.wang@intel.com>
Cc: Colin Xu <colin.xu@intel.com>
Reviewed-by: Colin Xu <colin.xu@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Typedef is not recommended in the Linux kernel.The klocwork static code
analyzer takes the enumeration as the full range of intel_gvt_gtt_type_t.
But the intel_gvt_gtt_type_t will never be used in full range. For
example, the GTT_TYPE_INVALID will never be used as an index of an array.
Remove the typedef and let the enumeration starts from zero to pass
klocwork analysis.
This patch fixed the critial issues #483, #551, #665 reported by
klockwork.
v3:
- Remove the typedef and let the enumeration starts from zero.
Signed-off-by: Aleksei Gimbitskii <aleksei.gimbitskii@intel.com>
Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Cc: Zhi Wang <zhi.a.wang@intel.com>
CC: Colin Xu <colin.xu@intel.com>
Reviewed-by: Colin Xu <colin.xu@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
fb_info->size is in pages, but some function need bytes when it
is as a parameter. Such as:
a. intel_gvt_ggtt_validate_range(), according to function definition
b. vifio_device_gfx_plane_info->size, according to the comment of
its definition
So change fb_info->size into bytes.
v2: Keep fb_info->size in real size instead of assinging casted page
size(zhenyu)
v3: obj->size should be page aligned and delete redundant check(zhenyu)
Signed-off-by: Xiong Zhang <xiong.y.zhang@intel.com>
Reviewed-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
It was noted that we made the same mistake for VM_ID as for object
handles, whereby we ensured that we only allocated a single handle for
one ppgtt. This has the unfortunate consequence for userspace that they
need to reference count the handles to avoid destroying an active ID. If
we allow multiple handles to the same ppgtt, userspace can freely
unreference any handle they own without fear of destroying the same
handle in use elsewhere.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190425054333.27299-1-chris@chris-wilson.co.uk
In order to separate the reservation phase of building a request from
its emission phase, we need to pull some of the request alloc activities
from deep inside i915_request to the surface, GEM_EXECBUFFER.
v2: Be frivolous, use a local drm_i915_private.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190425050143.811-1-chris@chris-wilson.co.uk
In the current scheme, on submitting a request we take a single global
GEM wakeref, which trickles down to wake up all GT power domains. This
is undesirable as we would like to be able to localise our power
management to the available power domains and to remove the global GEM
operations from the heart of the driver. (The intent there is to push
global GEM decisions to the boundary as used by the GEM user interface.)
Now during request construction, each request is responsible via its
logical context to acquire a wakeref on each power domain it intends to
utilize. Currently, each request takes a wakeref on the engine(s) and
the engines themselves take a chipset wakeref. This gives us a
transition on each engine which we can extend if we want to insert more
powermangement control (such as soft rc6). The global GEM operations
that currently require a struct_mutex are reduced to listening to pm
events from the chipset GT wakeref. As we reduce the struct_mutex
requirement, these listeners should evaporate.
Perhaps the biggest immediate change is that this removes the
struct_mutex requirement around GT power management, allowing us greater
flexibility in request construction. Another important knock-on effect,
is that by tracking engine usage, we can insert a switch back to the
kernel context on that engine immediately, avoiding any extra delay or
inserting global synchronisation barriers. This makes tracking when an
engine and its associated contexts are idle much easier -- important for
when we forgo our assumed execution ordering and need idle barriers to
unpin used contexts. In the process, it means we remove a large chunk of
code whose only purpose was to switch back to the kernel context.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Imre Deak <imre.deak@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190424200717.1686-5-chris@chris-wilson.co.uk
Start acquiring the logical intel_context and using that as our primary
means for request allocation. This is the initial step to allow us to
avoid requiring struct_mutex for request allocation along the
perma-pinned kernel context, but it also provides a foundation for
breaking up the complex request allocation to handle different scenarios
inside execbuf.
For the purpose of emitting a request from inside retirement (see the
next patch for engine power management), we also need to lift control
over the timeline mutex to the caller.
v2: Note that the request carries the active reference upon construction.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190424200717.1686-4-chris@chris-wilson.co.uk
We wish to start segregating the power management into different control
domains, both with respect to the hardware and the user interface. The
first step is that at the lowest level flow of requests, we want to
process a context event (and not a global GEM operation). In this patch,
we introduce the context callbacks that in future patches will be
redirected to per-engine interfaces leading to global operations as
required.
The intent is that this will be guarded by the timeline->mutex, except
that retiring has not quite finished transitioning over from being
guarded by struct_mutex. So at the moment it is protected by
struct_mutex with a reminded to switch.
v2: Rename default handlers to intel_context_enter_engine.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190424200717.1686-3-chris@chris-wilson.co.uk
For controlling runtime pm of the GT and engines, we would like to have
a callback to do extra work the first time we wake up and the last time
we drop the wakeref. This first/last access needs serialisation and so
we encompass a mutex with the regular intel_wakeref_t tracker.
v2: Drop the _once naming and report the errors.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc; Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190424200717.1686-1-chris@chris-wilson.co.uk
Start partitioning off the code that talks to the hardware (GT) from the
uapi layers and move the device facing code under gt/
One casualty is s/intel_ringbuffer.h/intel_engine.h/ with the plan to
subdivide that header and body further (and split out the submission
code from the ringbuffer and logical context handling). This patch aims
to be simple motion so git can fixup inflight patches with little mess.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Acked-by: Jani Nikula <jani.nikula@intel.com>
Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190424174839.7141-1-chris@chris-wilson.co.uk
The RING_NONPRIV allows us to add registers to a whitelist that allows
userspace to modify them. Ideally such registers should be safe and
saved within the context such that they do not impact system behaviour
for other users. This selftest verifies that those registers we do add
are (a) then writable by userspace and (b) only affect a single client.
Opens:
- Is GEN9_SLICE_COMMON_ECO_CHICKEN1 really write-only?
v2: Remove the blatant copy-paste.
v3: Emulate userspace register writes via the batch again.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190424110941.9869-1-chris@chris-wilson.co.uk
As we push for better compartmentalisation, it is more convenient to
copy the default sseu configuration from the engine into the derived
logical context, than it is to dig it out from i915->runtime_info.
v2: Use intel_sseu_from_device_info() to describe the converter
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190424095134.30249-1-chris@chris-wilson.co.uk
Fix the order of lane, port parameters passed to the register macro.
Note that this was already partly fixed by commit
37fc7845df ("drm/i915: Call MG_DP_MODE() macro with the right parameters order")
While at it simplify things by using the macro directly instead of an
unnecessary redirection via an array.
v2:
- Add a note the commit message about simplifying things. (José)
Fixes: 58106b7d81 ("drm/i915: Make MG PHY macros semantically consistent")
Cc: José Roberto de Souza <jose.souza@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Aditya Swarup <aditya.swarup@intel.com>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190419071026.32370-1-imre.deak@intel.com
(cherry picked from commit 9c11b12184)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
This reverts commit d179b88deb.
This commit is documented to break userspace X.org modesetting driver in certain configurations.
The X.org modesetting userspace driver is broken. No fixes are available yet. In order for this patch to be applied it either needs a config option or a workaround developed.
This has been reported a few times, saying it's a userspace problem is clearly against the regression rules.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109806
Signed-off-by: Dave Airlie <airlied@redhat.com>
Cc: <stable@vger.kernel.org> # v3.19+
UAPI Changes:
- Document which feature flags belong to which command in virtio_gpu.h
- Make the FB_DAMAGE_CLIPS available for atomic userspace only, it's useless for legacy.
Cross-subsystem Changes:
- Add device tree bindings for lg,acx467akm-7 panel and ST-Ericsson Multi Channel Display Engine MCDE
- Add parameters to the device tree bindings for tfp410
- iommu/io-pgtable: Add ARM Mali midgard MMU page table format
- dma-buf: Only do a 64-bits seqno compare when driver explicitly asks for it, else wraparound.
- Use the 64-bits compare for dma-fence-chains
Core Changes:
- Make the fb conversion functions use __iomem dst.
- Rename drm_client_add to drm_client_register
- Move intel_fb_initial_config to core.
- Add a drm_gem_objects_lookup helper
- Add drm_gem_fence_array helpers, and use it in lima.
- Add drm_format_helper.c to kerneldoc.
Driver Changes:
- Add panfrost driver for mali midgard/bitfrost.
- Converts bochs to use the simple display type.
- Small fixes to sun4i, tinydrm, ti-fp410.
- Fid aspeed's Kconfig options.
- Make some symbols/functions static in lima, sun4i and meson.
- Add a driver for the lg,acx467akm-7 panel.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEuXvWqAysSYEJGuVH/lWMcqZwE8MFAly4N1oACgkQ/lWMcqZw
E8OJNQ//fk8+8TkupJTiBYjsAIbo4pRrWa29zQFhiqEhG2kpDESr1YjkeN3uQ+wg
R6laOUt1jMNiV/BRBUlsv95cbZ0eda3mDPyFOqni8qBoUf0o4NMgLhuLK6qPeMyN
76u/VZCwYl1AZGiyi33xMABWZ0WmRkwaGxG8gDW/pGU2klRX6T7XgZPDnXrX/jJC
xq9QUWsOKoBVXX6OtYQPiL6WslbHh97sPzzs5dljJuLOWIug6xFWarUYK6vPnzU1
HVNcu2/oS6VMCRAW+Ocf4dlcfIN7PvKL984AOcZH3SLB5qabhbjB14e10RivJGZx
O2yhqNsdF42HcvA08EnwzNvtNS9Rj/GNuw95KHEU+pKZGZ6dQo/fFivm2DoeOqub
piQlTjVqrHhpNhKg+h8Bd5jUQjx97TPy+PjFtjvCZznZpp8SK8T12yrN+MK4W1ml
vzMYSaMWiUYNbdixSQH0L90i555uMQgOXh53mKNEovPkh1SKlcsMbONuJIEphpne
jOT89O9AhtwGu4179cTHRPWpsDcWw/Uoji5wcWZkjeBWdwjwXDGXiIHePN1KAcw9
qBERJs+yVujgmAvjnJwbe78QlYB1+wgtPdvWAGR0gmu31J1WVL1ADMLFe0YQ5RIz
huXetCYJzYzv8PxWAPUkky/vUFq4EZQkEQFzGQPtrZje5sH2mko=
=7Hyr
-----END PGP SIGNATURE-----
Merge tag 'drm-misc-next-2019-04-18' of git://anongit.freedesktop.org/drm/drm-misc into drm-next
drm-misc-next for v5.2:
UAPI Changes:
- Document which feature flags belong to which command in virtio_gpu.h
- Make the FB_DAMAGE_CLIPS available for atomic userspace only, it's useless for legacy.
Cross-subsystem Changes:
- Add device tree bindings for lg,acx467akm-7 panel and ST-Ericsson Multi Channel Display Engine MCDE
- Add parameters to the device tree bindings for tfp410
- iommu/io-pgtable: Add ARM Mali midgard MMU page table format
- dma-buf: Only do a 64-bits seqno compare when driver explicitly asks for it, else wraparound.
- Use the 64-bits compare for dma-fence-chains
Core Changes:
- Make the fb conversion functions use __iomem dst.
- Rename drm_client_add to drm_client_register
- Move intel_fb_initial_config to core.
- Add a drm_gem_objects_lookup helper
- Add drm_gem_fence_array helpers, and use it in lima.
- Add drm_format_helper.c to kerneldoc.
Driver Changes:
- Add panfrost driver for mali midgard/bitfrost.
- Converts bochs to use the simple display type.
- Small fixes to sun4i, tinydrm, ti-fp410.
- Fid aspeed's Kconfig options.
- Make some symbols/functions static in lima, sun4i and meson.
- Add a driver for the lg,acx467akm-7 panel.
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/737ad994-213d-45b5-207a-b99d795acd21@linux.intel.com
UAPI Changes:
- uAPI "Fixes:" patch for the upcoming kernel 5.1, included here too
We have an Ack from the media folks (only current user) for this
late tweak
Cross-subsystem Changes:
- ALSA: hda: Fix racy display power access (Takashi, Chris)
Driver Changes:
- DDI and MIPI-DSI clocks fixes for Icelake (Vandita)
- Fix Icelake frequency change/locking (RPS) (Mika)
- Temporarily disable ppGTT read-only bit on Icelake (Mika)
- Add missing Icelake W/As (Mika)
- Enable 12 deep CSB status FIFO on Icelake (Mika)
- Inherit more Icelake code for Elkhartlake (Bob, Jani)
- Handle catastrophic error on engine reset (Mika)
- Shortcut readiness to reset check (Mika)
- Regression fix for GEM_BUSY causing us to report a mixed uabi-class request as not busy (Chris)
- Revert back to max link rate and lane count on eDP (Jani)
- Fix pipe BPP readout for BXT/GLK DSI (Ville)
- Set DP min_bpp to 8*3 for non-RGB output formats (Ville)
- Enable coarse preemption boundaries for Gen8 (Chris)
- Do not enable FEC without DSC (Ville)
- Restore correct BXT DDI latency optim setting calculation (Ville)
- Always reset context's RING registers to avoid running workload twice during reset (Chris)
- Set GPU wedged on driver unload (Janusz)
- Consolidate two similar barries from timeline into one (Chris)
- Only reset the pinned kernel contexts on resume (Chris)
- Wakeref tracking improvements (Chris, Imre)
- Lockdep fixes for shrinker interactions (Chris)
- Bump ready tasks ahead of busywaits in prep of semaphore use (Chris)
- Huge step in splitting display code into fine grained files (Jani)
- Refactor the IRQ init/reset macros for code saving (Paulo)
- Convert IRQ initialization code to uncore MMIO access (Paulo)
- Convert workarounds code to use uncore MMIO access (Chris)
- Nuke drm_crtc_state and use intel_atomic_state instead (Manasi)
- Update SKL clock-gating WA (Radhakrishna, Ville)
- Isolate GuC reset code flow (Chris)
- Expose force_dsc_enable through debugfs (Manasi)
- Header standalone compile testing framework (Jani)
- Code cleanups to reduce driver footprint (Chris)
- PSR code fixes and cleanups (Jose)
- Sparse and kerneldoc updates (Chris)
- Suppress spurious combo PHY B warning (Vile)
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190418080426.GA6409@jlahtine-desk.ger.corp.intel.com
Fixes the clock-gating issue when pipe scaling is enabled.
(Lineage #2006604312)
V2: Fix typo in headline(Chris)
Handle the non double buffered nature of the register(Ville)
V3: Fix checkpatch warning. BAT failure for V2 on gen3 looks unrelated.
V4: Split the icl and skl wa's(Ville)
V5: Split the checks for icl and skl(Ville)
V6: Correct the flipped checks in intel_pre_plane_update(Ville)
V7: Use enum for pipe and extend the WA for plane scalers(Ville)
V8: Eliminate the redundant use of pch_pfit(Ville)
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ville Syrjala <ville.syrjala@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Clint Taylor <clinton.a.taylor@intel.com>
Cc: Aditya Swarup <aditya.swarup@intel.com>
Signed-off-by: Radhakrishna Sripada <radhakrishna.sripada@intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190417185901.14833-1-radhakrishna.sripada@intel.com
The spec has changed since skl_max_plane_width() was written.
Now the SKL limits are lower than what they were initially, and
GLK and ICL have different limits. Update the code to match the
spec.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190418195907.23912-1-ville.syrjala@linux.intel.com
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Fix the order of lane, port parameters passed to the register macro.
Note that this was already partly fixed by commit
37fc7845df ("drm/i915: Call MG_DP_MODE() macro with the right parameters order")
While at it simplify things by using the macro directly instead of an
unnecessary redirection via an array.
v2:
- Add a note the commit message about simplifying things. (José)
Fixes: 58106b7d81 ("drm/i915: Make MG PHY macros semantically consistent")
Cc: José Roberto de Souza <jose.souza@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Aditya Swarup <aditya.swarup@intel.com>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190419071026.32370-1-imre.deak@intel.com
When we are called to relieve mempressue via the shrinker, the only way
we can make progress is either by discarding unwanted pages (those
objects that userspace has marked MADV_DONTNEED) or by reclaiming the
dirty objects via swap. As we know that is the only way to make further
progress, we can initiate the writeback as we invalidate the objects.
This means the objects we put onto the inactive anon lru list are
already marked for reclaim+writeback and so will trigger a wait upon the
writeback inside direct reclaim, greatly improving the success rate of
direct reclaim on i915 objects.
The corollary is that we may start a slow swap on opportunistic
mempressure from the likes of the compaction + migration kthreads. This
is limited by those threads only being allowed to shrink idle pages, but
also that if we reactivate the page before it is swapped out by gpu
activity, we only page the cost of repinning the page. The cost is most
felt when an object is reused after mempressure, which hopefully
excludes the latency sensitive tasks (as we are just extending the
impact of swap thrashing to them).
Apparently this is not the first time we've had this idea. Back in
commit 5537252b6b ("drm/i915: Invalidate our pages under memory
pressure") we wanted to start writeback but settled on invalidate after
Hugh Dickins warned us about a possibility of a deadlock within shmemfs
if we started writeback from shrink_slab. Looking at the callchain,
using writeback from i915_gem_shrink should be equivalent to the pageout
also employed by shrink_slab, i.e. it should not be any riskier afaict.
v2: Leave mmapings intact. At this point, the only mmapings of our
objects will be via CPU mmaps on the shmemfs filp, which are
out-of-scope for our LRU tracking. Instead leave those pages to the
inactive anon LRU page list for aging and pageout as normal.
v3: Be selective on which paths trigger writeback, in particular
excluding paths shrinking just to reclaim vm space (e.g. mmap, vmap
reapers) and avoid starting writeback on the entire process space from
within the pm freezer.
References: https://bugs.freedesktop.org/show_bug.cgi?id=108686
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Michal Hocko <mhocko@suse.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> #v1
Link: https://patchwork.freedesktop.org/patch/msgid/20190420115539.29081-1-chris@chris-wilson.co.uk
GPU reset is now available with GuC enabled, so re-enable our check that
this reset is usable from atomic context.
Signed-off-by: Fernando Pacheco <fernando.pacheco@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190419230015.18121-6-fernando.pacheco@intel.com
We have now prepared the guc reset paths to avoid taking struct_mutex, or
any other lock, and so it is now safe to re-enable.
References: fe62365f9f ("drm/i915/guc: Disable global reset")
Signed-off-by: Fernando Pacheco <fernando.pacheco@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190419230015.18121-5-fernando.pacheco@intel.com