Commit Graph

453 Commits

Author SHA1 Message Date
Alex Deucher 54f78a7655 drm/amdgpu: add apu flags (v2)
Add some APU flags to simplify handling of different APU
variants.  It's easier to understand the special cases
if we use names flags rather than checking device ids and
silicon revisions.

v2: rebase on latest code

Acked-by: Evan Quan <evan.quan@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-05-22 13:41:53 -04:00
Marek Olšák d35745bbec drm/amdgpu: apply AMDGPU_IB_FLAG_EMIT_MEM_SYNC to compute IBs too (v3)
Compute IBs need this too.

v2: split out version bump
v3: squash in emit frame count fixes

Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-05-18 11:24:21 -04:00
Andrey Grodzovsky 2f9ce2a386 drm/amdgpu: Add mem_sync implementation for all the ASICs.
Implement the .mem_sync hook defined earlier.

v2: Rename functions

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-05-18 11:24:21 -04:00
Tom St Denis b0be3c3a25 drm/amd/amdgpu: add raven1 part to the gfxoff quirk list
On my raven1 system (rev c6) with VBIOS 113-RAVEN-114 GFXOFF is
not stable (resulting in large block tiling noise in some applications).

Disabling GFXOFF via the quirk list fixes the problems for me.

Signed-off-by: Tom St Denis <tom.stdenis@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-05-11 18:03:14 -04:00
Evan Quan 47891bf1da drm/amdgpu: drop unnecessary cancel_delayed_work_sync on PG ungate
As this is already properly handled in amdgpu_gfx_off_ctrl(). In fact,
this unnecessary cancel_delayed_work_sync may leave a small time window
for race condition and is dangerous.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-05-06 16:51:05 -04:00
Oak Zeng 25f43a3227 drm/amdgpu: Changed CU reservation golden settings
With previous golden settings, compute task can't use
reserved LDS (32K) on CU0 and CU1. On 64K LDS system,
if compute work group allocate more than 32K LDS, then
it can't be dispatched to CU0 and CU1 because of the
reservation. This enables compute task to use reserved
LDS on CU0 and CU1.

Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
2020-05-01 10:00:09 -04:00
Huang Rui f77c9aff85 drm/amdgpu: Fix per-IB secure flag GFX hang
Since commit "Move to a per-IB secure flag (TMZ)",
we've been seeing hangs in GFX. We need to send
FRAME CONTROL stop/start back-to-back, every time
we flip the TMZ flag. That is, when we transition
from TMZ to non-TMZ we have to send a stop with
TMZ followed by a start with non-TMZ, and
similarly for transitioning from non-TMZ into TMZ.

This patch implements this, thus fixing the GFX
hang.

v1 -> v2:
As suggested by Luben, and accept part of implemetation from this patch:
- Put "secure" closed to the loop and use optimization
- Change "secure" to bool again, and move "secure == -1" out of loop.
v3: Small fixes/optimizations.

Reported-and-Tested-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Signed-off-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-28 16:20:30 -04:00
Luben Tuikov 0bb5d5b03f drm/amdgpu: Move to a per-IB secure flag (TMZ)
Move from a per-CS secure flag (TMZ) to a per-IB
secure flag.

Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-28 16:20:29 -04:00
Luben Tuikov c6252390fc drm/amdgpu: implement TMZ accessor (v3)
Implement an accessor of adev->tmz.enabled. Let not
code around access it as "if (adev->tmz.enabled)"
as the organization may change. Instead...

Recruit "bool amdgpu_is_tmz(adev)" to return
exactly this Boolean value. That is, this function
is now an accessor of an already initialized and
set adev and adev->tmz.

Add "void amdgpu_gmc_tmz_set(adev)" to check and
set adev->gmc.tmz_enabled at initialization
time. After which one uses "bool
amdgpu_is_tmz(adev)" to query whether adev
supports TMZ.

Also, remove circular header file include.

v2: Remove amdgpu_tmz.[ch] as requested.
v3: Move TMZ into GMC.

Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-28 16:20:29 -04:00
Huang Rui 8350361d2d drm/amdgpu: expand the context control interface with trust flag
This patch expands the context control function to support trusted flag while we
want to set command buffer in trusted mode.

Signed-off-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-28 16:20:28 -04:00
Huang Rui 155748c912 drm/amdgpu: expand the emit tmz interface with trusted flag
This patch expands the emit_tmz function to support trusted flag while we want
to set command buffer in trusted mode.

Signed-off-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-28 16:20:28 -04:00
Zheng Bin d18ba57c72 drm/amdgpu: Remove unneeded semicolon
Fixes coccicheck warning:

drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c:2534:2-3: Unneeded semicolon

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Zheng Bin <zhengbin13@huawei.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2020-04-27 15:51:14 -04:00
Yintian Tao 04e4e2e955 drm/amdgpu: protect ring overrun
Wait for the oldest sequence on the ring
to be signaled in order to make sure there
will be no command overrun.

v2: fix coding stype and remove abs operation
v3: remove the initialization of variable r

Signed-off-by: Yintian Tao <yttao@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-24 11:42:11 -04:00
Yintian Tao 5420819401 drm/amdgpu: request reg_val_offs each kiq read reg
According to the current kiq read register method,
there will be race condition when using KIQ to read
register if multiple clients want to read at same time
just like the expample below:
1. client-A start to read REG-0 throguh KIQ
2. client-A poll the seqno-0
3. client-B start to read REG-1 through KIQ
4. client-B poll the seqno-1
5. the kiq complete these two read operation
6. client-A to read the register at the wb buffer and
   get REG-1 value

Therefore, use amdgpu_device_wb_get() to request reg_val_offs
for each kiq read register.

v2: fix the error remove
v3: fix the print typo
v4: remove unused variables

Signed-off-by: Yintian Tao <yttao@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-23 15:06:41 -04:00
Christian König e09d40bdba drm/amdgpu: change how we update mmRLC_SPM_MC_CNTL
In pp_one_vf mode avoid the extra overhead and read/write the
registers without the KIQ.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Monk Liu <monk.liu@amd.com>
Acked-by: Yintian Tao <yintian.tao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-22 18:11:49 -04:00
Alex Deucher 079c72ad3a drm/amdgpu/gfx9: add gfxoff quirk
Fix screen corruption with firefox.

Bug: https://bugzilla.kernel.org/show_bug.cgi?id=207171
Reviewed-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-22 18:11:46 -04:00
Guchun Chen ed72aa21c7 drm/amdgpu: replace DRM prefix with PCI device info for GFX RAS
Prefix RAS message printing in GFX IP with PCI device info,
which assists the debug in multiple GPU case.

Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-13 12:02:03 -04:00
Aaron Liu ba714a56fc drm/amdgpu: unify fw_write_wait for new gfx9 asics
Make the fw_write_wait default case true since presumably all new
gfx9 asics will have updated firmware. That is using unique WAIT_REG_MEM
packet with opration=1.

Signed-off-by: Aaron Liu <aaron.liu@amd.com>
Tested-by: Aaron Liu <aaron.liu@amd.com>
Tested-by: Yuxian Dai <Yuxian.Dai@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-09 10:43:18 -04:00
Nirmoy Das 1c6d567bdf drm/amdgpu: rework sched_list generation
Generate HW IP's sched_list in amdgpu_ring_init() instead of
amdgpu_ctx.c. This makes amdgpu_ctx_init_compute_sched(),
ring.has_high_prio and amdgpu_ctx_init_sched() unnecessary.
This patch also stores sched_list for all HW IPs in one big
array in struct amdgpu_device which makes amdgpu_ctx_init_entity()
much more leaner.

v2:
fix a coding style issue
do not use drm hw_ip const to populate amdgpu_ring_type enum

v3:
remove ctx reference and move sched array and num_sched to a struct
use num_scheds to detect uninitialized scheduler list

v4:
use array_index_nospec for user space controlled variables
fix possible checkpatch.pl warnings

Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-09 10:43:14 -04:00
Christian König 1675c3a24d drm/amdgpu: stop disable the scheduler during HW fini
When we stop the HW for example for GPU reset we should not stop the
front-end scheduler. Otherwise we run into intermediate failures during
command submission.

The scheduler should only be stopped in very few cases:
1. We can't get the hardware working in ring or IB test after a GPU reset.
2. The KIQ scheduler is not used in the front-end and should be disabled during GPU reset.
3. In amdgpu_ring_fini() when the driver unloads.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Nirmoy Das <nirmoy.das@amd.com>
Test-by: Dennis Li <dennis.li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-01 14:44:45 -04:00
Tom St Denis c76c1a4297 drm/amd/amdgpu: Include headers for PWR and SMUIO registers
Clean up the smu10, smu12, and gfx9 drivers to use headers for
registers instead of hardcoding in the C source files.

Signed-off-by: Tom St Denis <tom.stdenis@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-01 14:44:44 -04:00
xinhui pan c8e42d5785 drm/amdgpu: implement more ib pools (v2)
We have three ib pools, they are normal, VM, direct pools.

Any jobs which schedule IBs without dependence on gpu scheduler should
use DIRECT pool.

Any jobs schedule direct VM update IBs should use VM pool.

Any other jobs use NORMAL pool.

v2: squash in coding style fix

Signed-off-by: xinhui pan <xinhui.pan@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-01 14:44:44 -04:00
Kevin Wang 987ed8e938 drm/amdgpu: fix hpd bo size calculation error
the HPD bo size calculation error.
the "mem.size" can't present actual BO size all time.

Signed-off-by: Kevin Wang <kevin1.wang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <Christian.Koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-31 12:26:14 -04:00
Dennis Li 10cda519ef drm/amdgpu: fix the coverage issue to clear ArcVPGRs
Set ComputePGMRSRC1.VGPRS as 0x3f to clear all ArcVGPRs.

Signed-off-by: Dennis Li <Dennis.Li@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-25 17:00:11 -04:00
Colin Ian King 8cd296082c drm: amd: fix spelling mistake "shoudn't" -> "shouldn't"
There are spelling mistakes in pr_err messages and a comment. Fix these.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-19 00:03:05 -04:00
Nathan Chancellor 931971280c drm/amdgpu: Remove unnecessary variable shadow in gfx_v9_0_rlcg_wreg
clang warns:

drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c:754:6: warning: variable 'shadow'
is used uninitialized whenever 'if' condition is
false [-Wsometimes-uninitialized]
        if (offset == grbm_cntl || offset == grbm_idx)
            ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c:757:6: note: uninitialized use
occurs here
        if (shadow) {
            ^~~~~~
drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c:754:2: note: remove the 'if' if
its condition is always true
        if (offset == grbm_cntl || offset == grbm_idx)
        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c:738:13: note: initialize the
variable 'shadow' to silence this warning
        bool shadow;
                   ^
                    = 0
1 warning generated.

shadow is only assigned in one condition and used as the condition for
another if statement; combine the two if statements and remove shadow
to make the code cleaner and resolve this warning.

Fixes: 2e0cc4d48b ("drm/amdgpu: revise RLCG access path")
Link: https://github.com/ClangBuiltLinux/linux/issues/936
Suggested-by: Joe Perches <joe@perches.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-19 00:03:05 -04:00
Monk Liu 2e0cc4d48b drm/amdgpu: revise RLCG access path
what changed:
1)provide new implementation interface for the rlcg access path
2)put SQ_CMD/SQ_IND_INDEX to GFX9 RLCG path to let debugfs's reg_op
function can access reg that need RLCG path help

now even debugfs's reg_op can used to dump wave.

tested-by: Monk Liu <monk.liu@amd.com>
tested-by: Zhou pengju <pengju.zhou@amd.com>
Signed-off-by: Zhou pengju <pengju.zhou@amd.com>
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Emily Deng <Emily.Deng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-16 16:17:55 -04:00
Dennis Li 93cdb48eca drm/amdgpu: add codes to clear AccVGPR for arcturus
AccVGPRs are newly added in arcturus. Before reading these
registers, they should be initialized. Otherwise edc error
happens, when RAS is enabled.

v2: reuse the existing logical to calculate register size

Signed-off-by: Dennis Li <Dennis.Li@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-13 11:52:36 -04:00
Hawking Zhang 06dcd7eb83 drm/amdgpu: check GFX RAS capability before reset counters
disallow the logical to be enabled on platforms that
don't support gfx ras at this stage, like sriov skus,
dgpu with legacy ras.etc

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Monk Liu <monk.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-13 11:52:33 -04:00
Nirmoy Das 552b80d740 drm/amdgpu: remove unused functions
AMDGPU statically sets priority for compute queues
at initialization so remove all the functions
responsible for changing compute queue priority dynamically.

Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-09 13:51:48 -04:00
Nirmoy Das 33abcb1f5a drm/amdgpu: set compute queue priority at mqd_init
We were changing compute ring priority while rings were being used
before every job submission which is not recommended. This patch
sets compute queue priority at mqd initialization for gfx8, gfx9 and
gfx10.

Policy: make queue 0 of each pipe as high priority compute queue

High/normal priority compute sched lists are generated from set of high/normal
priority compute queues. At context creation, entity of compute queue
get a sched list from high or normal priority depending on ctx->priority

Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-09 13:51:24 -04:00
Hawking Zhang 4a89ad9b39 drm/amdgpu: add reset_ras_error_count function for HDP
HDP ras error counters are dirty ones after cold reboot
Read operation is needed to reset them to 0

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-05 00:32:54 -05:00
Hawking Zhang 279375c331 drm/amdgpu: add reset_ras_error_count function for GFX
GFX ras error counters are dirty ones after cold reboot
Read operation is needed to reset them to 0

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-05 00:32:47 -05:00
Monk Liu 752c683dbb drm/amdgpu: fix IB test MCBP bug
1)for gfx IB test we shouldn't insert DE meta data

2)we should make sure IB test finished before we
send event 3 to hypervisor otherwise the IDLE from
event 3 will preempt IB test, which is not designed
as a compatible structure for MCBP

Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-05 00:28:11 -05:00
Yintian Tao 1d21a84661 drm/amdgpu: clean wptr on wb when gpu recovery
The TDR will be randomly failed due to compute ring
test failure. If the compute ring wptr & 0x7ff(ring_buf_mask)
is 0x100 then after map mqd the compute ring rptr will be
synced with 0x100. And the ring test packet size is also 0x100.
Then after invocation of amdgpu_ring_commit, the cp will not
really handle the packet on the ring buffer because rptr is equal to wptr.

Signed-off-by: Yintian Tao <yttao@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Monk Liu <Monk.Liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-05 00:25:57 -05:00
Jacob He 460c484f24 drm/amdgpu: Initialize SPM_VMID with 0xf (v2)
SPM_VMID is a global resource, SPM access the video memory according to
SPM_VMID. The initial valude of SPM_VMID is 0 which is used by kernel.
That means UMD can overwrite the memory of VMID0 by enabling SPM, that
is really dangerous.

Initialize SPM_VMID with 0xf, it messes up other user mode process at
most.

v2: squash in indentation fix

Signed-off-by: Jacob He <jacob.he@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-02-28 16:59:21 -05:00
Emily Deng 89510a2737 drm/amdgpu/sriov: Use kiq to copy the gpu clock
For vega10 sriov, the register is blocked, use
copy data command to fix the issue.

v2: Rename amdgpu_kiq_read_clock to gfx_v9_0_kiq_read_clock.

Signed-off-by: Emily Deng <Emily.Deng@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-02-28 16:59:21 -05:00
Evan Quan 14008574a3 drm/amdgpu: drop the non-sense firmware version check on arcturus
As the firmware versions of arcturus are different from other gfx9
ASICs. And the warning("CP firmware version too old, please update!")
caused by this check can be eliminated.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-02-19 10:36:26 -05:00
changzhu f61f01b14d drm/amdgpu: add is_raven_kicker judgement for raven1
The rlc version of raven_kicer_rlc is different from the legacy rlc
version of raven_rlc. So it needs to add a judgement function for
raven_kicer_rlc and avoid disable GFXOFF when loading raven_kicer_rlc.

Signed-off-by: changzhu <Changfeng.Zhu@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-02-19 10:36:26 -05:00
Alex Deucher e5f134958d drm/amdgpu/gfx9: disable gfxoff when reading rlc clock
Otherwise we readback all ones.  Fixes rlc counter
readback while gfxoff is active.

Reviewed-by: Xiaojie Yuan <xiaojie.yuan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-02-12 16:04:40 -05:00
Guchun Chen 278628fa46 drm/amdgpu: correct comment to clear up the confusion
Former comment looks to be one intended behavior in code,
actually it's not. So correct it.

Suggested-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-02-11 11:51:31 -05:00
Guchun Chen ea6f0931c1 drm/amdgpu: limit GDS clearing workaround in cold boot sequence
GDS clear workaround will cause gfx failure in suspend/resume case.

[   98.679559] [drm:amdgpu_device_ip_late_init [amdgpu]] *ERROR* late_init of IP block <gfx_v9_0> failed -110
[   98.679561] PM: dpm_run_callback(): pci_pm_resume+0x0/0xa0 returns -110
[   98.679562] PM: Device 0000:03:00.0 failed to resume async: error -110

As this workaround is specific to the HW bug of GDS's ECC error
existing in cold boot up, so bypass this workaround in suspend/
resume case after booting up.

Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-02-11 11:49:16 -05:00
Joseph Greathouse 18c6b74e7c drm/amdgpu: Enable DISABLE_BARRIER_WAITCNT for Arcturus
In previous gfx9 parts, S_BARRIER shader instructions are implicitly
S_WAITCNT 0 instructions as well. This setting turns off that
mechanism in Arcturus and beyond. With this, shaders must follow the
ISA guide insofar as putting in explicit S_WAITCNT operations even
after an S_BARRIER.

v2: Fix patch title to list component

Signed-off-by: Joseph Greathouse <Joseph.Greathouse@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-01-30 17:15:27 -05:00
Alex Deucher 7af2a5771e drm/amdgpu: attempt to enable gfxoff on more raven1 boards (v2)
Switch to a blacklist so we can disable specific boards
that are problematic.

v2: make the blacklist non-raven specific.

Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-01-27 16:46:45 -05:00
Nirmoy Das a9d4fe2fd6 drm/amdgpu: remove unnecessary conversion to bool
Better clean that up before some automation starts to complain about it

Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-01-22 16:55:27 -05:00
Dennis Li 4c461d89db drm/amdgpu: add RAS support for the gfx block of Arcturus
Implement functions to do the RAS error injection and
query EDC counter.

Signed-off-by: Dennis Li <Dennis.Li@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-01-22 16:36:30 -05:00
Dennis Li 504c5e72d7 drm/amdgpu: abstract EDC counter clear to a separated function
1. Add IP prefix for the IP related codes.
2. Refactor the code to clear EDC counter.

Signed-off-by: Dennis Li <Dennis.Li@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-01-22 16:36:14 -05:00
Dennis Li 5e66403e4d drm/amdgpu: refine the security check for RAS functions
To avoid calling RAS related functions when RAS feature isn't
supported in hardware. Change to check supported features, instead
of checking asic type.

v2: reuse amdgpu_ras_is_supported function, instead of introducing
a new flag for hardware ras feature.

Signed-off-by: Dennis Li <Dennis.Li@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-01-22 16:36:04 -05:00
chen gong e3cd03603d drm/amdgpu: read gfx register using RREG32_KIQ macro
Reading CP_MEM_SLP_CNTL register with RREG32_SOC15 macro will lead to
hang when GPU is in "gfxoff" state.
I do a uniform substitution here.

Signed-off-by: chen gong <curry.gong@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-01-22 16:34:29 -05:00
chen gong d33a99c4b6 drm/amdgpu: provide a generic function interface for reading/writing register by KIQ
Move amdgpu_virt_kiq_rreg/amdgpu_virt_kiq_wreg function to amdgpu_gfx.c,
and rename them to amdgpu_kiq_rreg/amdgpu_kiq_wreg.Make it generic and
flexible.

Signed-off-by: chen gong <curry.gong@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-01-22 16:34:14 -05:00