linux/include
Emily Deng 8ee3a52e3f drm/gpu-sched: fix force APP kill hang(v4)
issue:
there are VMC page fault occurred if force APP kill during
3dmark test, the cause is in entity_fini we manually signal
all those jobs in entity's queue which confuse the sync/dep
mechanism:

1)page fault occurred in sdma's clear job which operate on
shadow buffer, and shadow buffer's Gart table is cleaned by
ttm_bo_release since the fence in its reservation was fake signaled
by entity_fini() under the case of SIGKILL received.

2)page fault occurred in gfx' job because during the lifetime
of gfx job we manually fake signal all jobs from its entity
in entity_fini(), thus the unmapping/clear PTE job depend on those
result fence is satisfied and sdma start clearing the PTE and lead
to GFX page fault.

fix:
1)should at least wait all jobs already scheduled complete in entity_fini()
if SIGKILL is the case.

2)if a fence signaled and try to clear some entity's dependency, should
set this entity guilty to prevent its job really run since the dependency
is fake signaled.

v2:
splitting drm_sched_entity_fini() into two functions:
1)The first one is does the waiting, removes the entity from the
runqueue and returns an error when the process was killed.
2)The second one then goes over the entity, install it as
completion signal for the remaining jobs and signals all jobs
with an error code.

v3:
1)Replace the fini1 and fini2 with better name
2)Call the first part before the VM teardown in
amdgpu_driver_postclose_kms() and the second part
after the VM teardown
3)Keep the original function drm_sched_entity_fini to
refine the code.

v4:
1)Rename entity->finished to entity->last_scheduled;
2)Rename drm_sched_entity_fini_job_cb() to
drm_sched_entity_kill_jobs_cb();
3)Pass NULL to drm_sched_entity_fini_job_cb() if -ENOENT;
4)Replace the type of entity->fini_status with "int";
5)Remove the check about entity->finished.

Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Signed-off-by: Emily Deng <Emily.Deng@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-05-15 13:43:17 -05:00
..
acpi ACPICA: Update version to 20180105 2018-02-06 10:32:13 +01:00
asm-generic mm/vmalloc: add interfaces to free unmapped page table 2018-03-22 17:07:01 -07:00
clocksource
crypto Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 2018-01-31 14:22:45 -08:00
drm drm/gpu-sched: fix force APP kill hang(v4) 2018-05-15 13:43:17 -05:00
dt-bindings MIPS changes for 4.16 2018-02-07 11:22:44 -08:00
keys
kvm KVM: arm/arm64: Reset mapped IRQs on VM reset 2018-03-14 18:29:14 +00:00
linux Linux 4.16-rc7 2018-03-28 14:30:41 +10:00
math-emu
media media: dvb: update buffer mmaped flags and frame counter 2018-02-23 11:44:08 -05:00
memory
misc powerpc updates for 4.16 2018-02-02 10:01:04 -08:00
net mac80211: add ieee80211_hw flag for QoS NDP support 2018-03-21 10:56:18 +01:00
pcmcia
ras
rdma RDMA/verbs: Remove restrack entry from XRCD structure 2018-03-19 14:10:30 -06:00
scsi SCSI fixes on 20180306 2018-03-07 10:50:15 -08:00
soc ARC fixes for 4.16-rc4 2018-03-01 14:32:23 -08:00
sound Linux 4.16-rc7 2018-03-28 14:30:41 +10:00
target target core: add device action configfs files 2018-01-16 18:05:04 -08:00
trace mmc: core: Fix tracepoint print of blk_addr and blksz 2018-03-15 11:15:22 +01:00
uapi drm/amdgpu: add new bo flag that indicates BOs don't need fallback (v2) 2018-04-11 13:08:01 -05:00
video fbdev changes for v4.16: 2018-02-07 13:10:43 -08:00
xen