linux

Commit Graph

Author	SHA1	Message	Date
shaoyunl	df399b0641	drm/amdgpu: XGMI pstate switch initial support Driver vote low to high pstate switch whenever there is an outstanding XGMI mapping request. Driver vote high to low pstate when all the outstanding XGMI mapping is terminated. Signed-off-by: shaoyunl <shaoyun.liu@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-03-27 22:40:43 -05:00
Christian König	adc7e863f6	drm/amdgpu: use the new VM backend for clears And remove the existing code when it is unused. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Chunming Zhou <david1.zhou@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-03-27 22:40:36 -05:00
Evan Quan	37945a3ad5	drm/amdgpu: defer cmd/fence/fw buffers destroy on hw_init failure As the cleanup jobs performed in pre_fini may still need these buffers. NULL pointer dereference will be triggered without them. Signed-off-by: Evan Quan <evan.quan@amd.com> Reviewed-by: Feifei Xu <Feifei.Xu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-03-27 22:40:29 -05:00
Evan Quan	7a3d7bf606	drm/amdgpu: add more debug friendly prompts Large piece of codes share one error prompt. That is not friendly for debugging. Signed-off-by: Evan Quan <evan.quan@amd.com> Reviewed-by: Feifei Xu <Feifei.Xu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-03-27 22:40:19 -05:00
Evan Quan	39fee32b46	drm/amdgpu: error out on mode1 reset failure The error return value should be correctly reflected. Signed-off-by: Evan Quan <evan.quan@amd.com> Reviewed-by: Feifei Xu <Feifei.Xu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-03-27 22:40:12 -05:00
Evan Quan	fed184e905	drm/amdgpu: trivial typo fix "error" was not correctly spelled. Signed-off-by: Evan Quan <evan.quan@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-03-27 22:40:05 -05:00
xinhui pan	190211ab75	drm/amdgpu: remove per obj debugfs write there is ras_control node which can do its job. Signed-off-by: xinhui pan <xinhui.pan@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-03-27 22:39:59 -05:00
xinhui pan	828cfa2909	drm/amdgpu: Fix amdgpu ras to ta enums conversion Add helpers to translate the two enums. And it will catch bugs easily. Signed-off-by: xinhui pan <xinhui.pan@amd.com> Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-03-27 22:39:52 -05:00
xinhui pan	9f491d731c	drm/amdgpu: use macro instead of enum for flags better to use macro. Signed-off-by: xinhui pan <xinhui.pan@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-03-27 22:39:44 -05:00
xinhui pan	73aa8e1a3a	drm/amdgpu: Fix some sanity check ras context might be NULL, so move con->h_data after check !con also fix sizeof wrong type while at it. Signed-off-by: xinhui pan <xinhui.pan@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-03-27 22:39:29 -05:00
Christian König	86f7bae5cf	drm/amdgpu: revert "XGMI pstate switch initial support" This reverts commit `9b638f9751`. Adding this to the mapping is complete nonsense and the whole implementation looks racy. This patch wasn't thoughtfully reviewed and should be reverted for now. Signed-off-by: Christian König <christian.koenig@amd.com> Acked-by: Liu, Shaoyun <Shaoyun.Liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-03-21 14:05:01 -05:00
Christian König	c354669583	drm/amdgpu: use the new VM backend for PTEs And remove the existing code when it is unused. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Chunming Zhou <david1.zhou@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-03-21 13:59:28 -05:00
Christian König	e6899d5590	drm/amdgpu: use the new VM backend for PDEs And remove the existing code when it is unused. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Chunming Zhou <david1.zhou@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-03-21 13:59:22 -05:00
Christian König	6dd09027a2	drm/amdgpu: new VM update backends Separate out all functions for SDMA and CPU based page table updates into separate backends. This way we can keep most of the complexity of those from the core VM code. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Chunming Zhou <david1.zhou@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-03-21 13:59:15 -05:00
Christian König	802a4a484a	drm/amdgpu: reserve less memory for PDE updates Allocating 16KB was way too much, just use 2KB as a start for now. Signed-off-by: Christian König <christian.koenig@amd.com> Acked-by: Chunming Zhou <david1.zhou@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-03-21 13:59:09 -05:00
Christian König	d1e29462a0	drm/amdgpu: move and rename amdgpu_pte_update_params Move the update parameter into the VM header and rename them. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Chunming Zhou <david1.zhou@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-03-21 13:59:00 -05:00
Christian König	072b7a0bd2	drm/amdgpu: always set and check dma addresses in the VM code Clean that up a bit and allow to always have the DMA addresses around. Signed-off-by: Christian König <christian.koenig@amd.com> Acked-by: Chunming Zhou <david1.zhou@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-03-21 13:58:54 -05:00
Christian König	2c2508029f	drm/amdgpu: remove some unused VM defines Not needed any more. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Chunming Zhou <david1.zhou@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-03-21 13:58:40 -05:00
Huang Rui	083d022913	drm/amdgpu: add one rlc version into gfxoff blacklist RLC #53815 ucode has the noise issue on 4k playback while gfxoff enabled. Signed-off-by: Huang Rui <ray.huang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Tested-by: Tom St Denis <tom.stdenis@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-03-20 23:39:49 -05:00
Huang Rui	005440066f	drm/amdgpu: enable gfxoff again on raven series (v2) This patch enables gfxoff and stutter mode again, since we take more testing on raven series. For raven2 and picasso, we can enable it directly. And for raven, we need check the RLC/SMC ucode version cannot be less than #531/0x1e45. v2: add smc version checking for raven. Signed-off-by: Huang Rui <ray.huang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> (v1) Tested-by: Likun Gao <Likun.Gao@amd.com> (v2) Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-03-20 23:39:49 -05:00
Christian König	1d31408a4c	drm/amdgpu: use more entries for the first paging queue To aid recoverable page faults. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-03-20 23:39:47 -05:00
Christian König	4f8bc72fbf	drm/amdgpu: free up the first paging queue v2 We need the first paging queue to handle page faults. v2: handle any number of SDMA instances gracefully Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-03-20 23:39:47 -05:00
Christian König	f11a13ecaf	drm/amdgpu: re-enable retry faults Now that we have re-routed faults to the other IH ring we can enable retries again. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-03-20 23:39:47 -05:00
Wentao Lou	f81e8d532a	drm/amdkfd/sriov:Put the pre and post reset in exclusive mode v2 add amdgpu_amdkfd_pre_reset and amdgpu_amdkfd_post_reset inside amdgpu_device_reset_sriov. Signed-off-by: Wentao Lou <Wentao.Lou@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-03-20 23:39:47 -05:00
Felix Kuehling	98ae7f98d4	drm/amdgpu: Wait for newly allocated PTs to be idle When page table are updated by the CPU, synchronize with the allocation and initialization of newly allocated page tables. Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-03-20 23:39:47 -05:00
Philip Yang	194f87ddff	drm/amdgpu: more descriptive message if HMM not enabled If using old kernel config file, CONFIG_ZONE_DEVICE is not selected, so CONFIG_HMM and CONFIG_HMM_MIRROR is not enabled, the current driver error message "Failed to register MMU notifier" is not clear. Inform user with more descriptive message on how to fix the missing kernel config option. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109808 Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Michel Dänzer <michel.daenzer@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-03-20 23:39:47 -05:00
Philip Yang	5aeaccca30	drm/amdgpu: support userptr cross VMAs case with HMM userptr may cross two VMAs if the forked child process (not call exec after fork) malloc buffer, then free it, and then malloc larger size buf, kernel will create new VMA adjacent to old VMA which was cloned from parent process, some pages of userptr are in the first VMA, the rest pages are in the second VMA. HMM expects range only have one VMA, loop over all VMAs in the address range, create multiple ranges to handle this case. See is_mergeable_anon_vma in mm/mmap.c for details. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-03-19 15:36:58 -05:00
Philip Yang	386a68e786	drm/amdkfd: support concurrent userptr update for HMM Userptr restore may have concurrent userptr invalidation after hmm_vma_fault adds the range to the hmm->ranges list, needs call hmm_vma_range_done to remove the range from hmm->ranges list first, then reschedule the restore worker. Otherwise hmm_vma_fault will add same range to the list, this will cause loop in the list because range->next point to range itself. Add function untrack_invalid_user_pages to reduce code duplication. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-03-19 15:36:58 -05:00
Christian König	1bd4e4ca7b	drm/amdgpu: stop evicting busy PDs/PTs Otherwise we won't be able to cleanly handle page faults. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Chunming Zhou <david1.zhou@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-03-19 15:36:58 -05:00
Christian König	56753e73fb	drm/amdgpu: wait for VM to become idle during flush Make sure that not only the entities are flush, but that we also wait for the HW to finish all processing. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Chunming Zhou <david1.zhou@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-03-19 15:36:58 -05:00
Christian König	3119e7f43b	drm/amdgpu: remove non-sense NULL ptr check It's a bug having a dead pointer in the IDR, silently returning is the worst we can do. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Chunming Zhou <david1.zhou@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-03-19 15:36:58 -05:00
Christian König	04ed8459f3	drm/amdgpu: remove chash Remove the chash implementation for now since it isn't used any more. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-03-19 15:36:58 -05:00
Christian König	c1a8abd99d	drm/amdgpu: use ring/hash for fault handling on GMC9 v3 Further testing showed that the idea with the chash doesn't work as expected. Especially we can't predict when we can remove the entries from the hash again. So replace the chash with a ring buffer/hash mix where entries in the container age automatically based on their timestamp. v2: use ring buffer / hash mix v3: check the timeout to make sure all entries age Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> (v2) Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-03-19 15:36:58 -05:00
Christian König	8c65fe5fc8	drm/amdgpu: limit the number of IVs processed at once Only process a maximum of 32 IVs before writing back the RPTR. This improves hw handling when we get close to an overflow in the ring buffer. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Michel Dänzer <michel.daenzer@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-03-19 15:36:58 -05:00
Christian König	b51cd19e48	drm/amdgpu: enable IH ring 1&2 for Vega20 as well That doesn't seem to have any negative effects. Signed-off-by: Christian König <christian.koenig@amd.com> Acked-by: Chunming Zhou <david1.zhou@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-03-19 15:36:58 -05:00
Christian König	1ae64cec8a	drm/amdgpu: enable IH doorbell for ring 1&2 on Vega The doorbells should already be reserved, just enable them. Signed-off-by: Christian König <christian.koenig@amd.com> Acked-by: Chunming Zhou <david1.zhou@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-03-19 15:36:58 -05:00
Christian König	0133690e0d	drm/amdgpu: change Vega IH ring 1 config Disable overflow and enable full drain. This makes fault handling on ring 1 much more reliable since we don't generate back pressure any more. Signed-off-by: Christian König <christian.koenig@amd.com> Acked-by: Chunming Zhou <david1.zhou@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-03-19 15:36:58 -05:00
Nicholas Kazlauskas	46846ba265	drm/amdgpu: Only clear dumb buffers if ring is enabled The buffers should be cleared when possible but we also don't want buffer creation to fail in the rare case where the ring isn't ready during the call. This could happen during some suspend/resume sequences. Cc: Christian König <ckoenig.leichtzumerken@gmail.com> Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-03-19 15:36:57 -05:00
Nicholas Kazlauskas	95b1346872	drm/amdgpu: Clear VRAM for DRM dumb_create buffers The dumb_create API isn't intended for high performance rendering and it's more useful for userspace (ie. IGT) to have them precleared. The bonus here is that we also won't needlessly leak whatever was previously in VRAM, but it also probably wasn't sensitive if it was going through this API. Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-03-19 15:36:57 -05:00
kbuild test robot	289d513b17	drm/amdgpu: fix semicolon.cocci warnings drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c:405:2-3: Unneeded semicolon drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c:435:2-3: Unneeded semicolon Remove unneeded semicolon. Generated by: scripts/coccinelle/misc/semicolon.cocci CC: xinhui pan <xinhui.pan@amd.com> Signed-off-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-03-19 15:36:52 -05:00
xinhui pan	108c6a6309	drm/amdgpu: add new ras workflow control flags add ras post init function. Do some initialization after all IP have finished their late init. Add new member flags which will control the ras work flow. For now, vbios enable ras for us on boot. That might change in the future. So there should be a flag from vbios to tell us if ras is enabled or not on boot. Looks like there is no such info now. Other bits of the flags are reserved to control other parts of ras. Signed-off-by: xinhui pan <xinhui.pan@amd.com> Reviewed-by: Evan Quan <evan.quan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-03-19 15:36:52 -05:00
xinhui pan	5d0f903fe2	drm/amdgpu: let ras initialization a little noticeable add drm info output if ras initialized successfully. add ras atomfirmware sanity check. Signed-off-by: xinhui pan <xinhui.pan@amd.com> Reviewed-by: Evan Quan <evan.quan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-03-19 15:36:52 -05:00
xinhui pan	163def43e9	drm/amdgpu: Fix lockdep warning more gracely lockdep need a static key. Previously we set ignore bit to avoid the warning. Now call sysfs_attr_init to initialize the static key. Signed-off-by: xinhui pan <xinhui.pan@amd.com> Reviewed-and-Tested-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-03-19 15:36:52 -05:00
xinhui pan	b076296b0f	drm/amdgpu: Fix ras debugfs data parse Unzero char is accepted by sscanf, so when data is structure but unexpectedly return error invalid; Signed-off-by: xinhui pan <xinhui.pan@amd.com> Reviewed-by: Feifei Xu <Feifei.Xu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-03-19 15:36:51 -05:00
xinhui pan	5caf466a6e	drm/amdgpu: add new member hw_supported Currently, it is not clear how ras is supported. Both software and hardware can set the supported. That is confusing. Fix it by adding new member hw_supported. Signed-off-by: xinhui pan <xinhui.pan@amd.com> Reviewed-by: Evan Quan <evan.quan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-03-19 15:36:51 -05:00
xinhui pan	2b9505e353	drm/amdgpu: Fix warning when lockdep is enabled Set ignore bit to satisfy lockdep. Signed-off-by: xinhui pan <xinhui.pan@amd.com> Reviewed-by: Evan Quan <evan.quan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-03-19 15:36:51 -05:00
xinhui pan	54eb4ed607	drm/amdgpu: Fix NULL pointer when ta is missing Ta is optional, so check if ta firmware is loaded or not. Signed-off-by: xinhui pan <xinhui.pan@amd.com> Reviewed-by: Evan Quan <evan.quan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-03-19 15:36:51 -05:00
Evan Quan	2f3940e992	drm/amdgpu: fix ras parameter descriptions The descriptions of modinfo wrongly show two parameters for each feature(see below). This patch can fix this incorrect outputs. parm: amdgpu_ras_enable:Enable RAS features on the GPU (0 = disable, 1 = enable, -1 = auto (default)) parm: ras_enable:int parm: amdgpu_ras_mask:Mask of RAS features to enable (default 0xffffffff), only valid when ras_enable == 1 parm: ras_mask:uint Signed-off-by: Evan Quan <evan.quan@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: xinhui pan <xinhui.pan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-03-19 15:36:51 -05:00
xinhui pan	1febb00ecb	drm/amdgpu: export both supported and enabled ras features Signed-off-by: xinhui pan <xinhui.pan@amd.com> Reviewed-by: Evan Quan <evan.quan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-03-19 15:36:51 -05:00
xinhui pan	b404ae8255	drm/amdgpu: lookup vbios table to check ecc capability Signed-off-by: xinhui pan <xinhui.pan@amd.com> Reviewed-by: Evan Quan <evan.quan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-03-19 15:36:51 -05:00
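
Commit 8c65fe5fc8 in the table above caps interrupt vector (IV) processing at 32 entries before the read pointer (RPTR) is written back. The idea can be modeled in a few lines of userspace Python. This is only an illustrative sketch, not the driver's code: the class `IHRing`, its fields, and the `handler` callback are hypothetical names invented for this example, and only the limit of 32 comes from the commit message.

```python
# Toy model of bounded interrupt-vector processing (hypothetical names,
# not taken from the amdgpu driver).

MAX_IVS_PER_BATCH = 32  # limit stated in the commit message


class IHRing:
    """Simplified interrupt-handler ring.

    The producer ("hardware") advances wptr; the consumer ("driver")
    advances rptr and periodically writes it back so the producer
    knows which slots are free again.
    """

    def __init__(self, size):
        self.size = size
        self.entries = [None] * size
        self.wptr = 0        # next slot the producer writes
        self.rptr = 0        # consumer's local read position
        self.hw_rptr = 0     # read position last reported to the producer
        self.writebacks = 0  # how many times rptr was written back

    def push(self, iv):
        """Producer side: refuse to overwrite slots not yet released."""
        nxt = (self.wptr + 1) % self.size
        if nxt == self.hw_rptr:
            raise OverflowError("ring full")
        self.entries[self.wptr] = iv
        self.wptr = nxt

    def process(self, handler):
        """Consumer side: handle pending IVs in batches of at most 32."""
        while self.rptr != self.wptr:
            count = 0
            while self.rptr != self.wptr and count < MAX_IVS_PER_BATCH:
                handler(self.entries[self.rptr])
                self.rptr = (self.rptr + 1) % self.size
                count += 1
            # Release the processed slots after every batch instead of
            # only once the whole backlog has been drained.
            self.hw_rptr = self.rptr
            self.writebacks += 1
```

With 70 pending IVs, `process` in this model writes the read pointer back after 32, 64, and 70 entries, so the producer regains free slots while the backlog is still being worked through. That is the effect the commit message describes as helping when the ring gets close to overflow.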

5264 Commits