Let's allow GCC to optimize better.
This exposed some five unused functions, but this patch doesn't remove them.
Signed-off-by: Lauri Kasanen <cand@gmx.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Move binding onto the ring, simplifying handling a bit.
Signed-off-by: Christian König <deathsimple@vodafone.de>
Reviewed-by: Jerome Glisse <jglisse@redhat.com>
It was only used for dynpm, but has been replaced with
a better implementation using fences. Remove it.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
radeon_ring_restore is freeing the memory for the saved
ring data. We need to remember that, otherwise we try to
restore the ring data again on the next try. Additional
to that it shouldn't try the reset infinitely if we have
saved ring data.
Signed-off-by: Christian König <deathsimple@vodafone.de>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
It seems some of those IGP dislike non dma32 page despite what
documentation says. Fix regression since we allowed non dma32
pages. It seems it only affect some revision of those IGP chips
as we don't know which one just force dma32 for all of them.
https://bugzilla.redhat.com/show_bug.cgi?id=785375
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Returns a snapshot of the GPU clock counter. Needed
for certain OpenGL extensions.
v2: agd5f
- address Jerome's comments
- add function documentation
Signed-off-by: Marek Olšák <maraeo@gmail.com>
Reviewed-by: Jerome Glisse <jglisse@redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Adds documentation to most of the functions in
radeon_device.c
v2: split out general descriptions as per Christian's
comments.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Just store the index in the ring structure.
Idea taken from one of Jerome's wip rptr patches.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Try to save whatever is on the rings when
we encounter an lockup.
v2: Fix spelling error. Free saved ring data if reset fails.
Add documentation for the new functions.
v3: Some more spelling fixes
v4: It doesn't make sense to save anything if all fences
are signaled
Signed-off-by: Christian König <deathsimple@vodafone.de>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Making it easier to control when it is executed.
Signed-off-by: Christian König <deathsimple@vodafone.de>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
GPU reset need to be exclusive, one happening at a time. For this
add a rw semaphore so that any path that trigger GPU activities
have to take the semaphore as a reader thus allowing concurency.
The GPU reset path take the semaphore as a writer ensuring that
no concurrent reset take place.
v2: init rw semaphore
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Try to remove or replace the cs_mutex with a
vm_mutex where it is still needed.
v2: fix locking order
v3: rebased on drm-next
Signed-off-by: Christian König <deathsimple@vodafone.de>
The spinlock was actually there to protect the
rptr, but rptr was read outside of the locked area.
Also we don't really need a spinlock here, an
atomic should to quite fine since we only need to
prevent it from being reentrant.
v2: Keep the spinlock....
v3: Back to an atomic again after finding & fixing the real bug.
Signed-off-by: Christian Koenig <christian.koenig@amd.com>
It is a rw_semaphore now and only write locked
while changing the clock. Also the lock is renamed
to better reflect what it is protecting.
v2: Keep the ttm_vm_ops on IGPs
Signed-off-by: Christian König <deathsimple@vodafone.de>
Reviewed-by: Jerome Glisse <jglisse@redhat.com>
This adds prime->fd and fd->prime support to radeon.
It passes the sg object to ttm and then populates
the gart entries using it.
Compile tested only.
v2: stub kmap + use new helpers + add reimporting
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Jerome Glisse <jglisse@redhat.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
All radeon_gem_init() does is initialize the gem objects
list. radeon_device.c does this explicitly. r600+ calls
radeon_gem_init() so the list gets initialized twice. Older
asics don't call it at all and rely on the the init in
radeon_device.c. Just call radeon_gem_init() in radeon_device.c
and remove the explicit calls from all the newer asics.
All asics call radeon_gem_fini() in their fini pathes. That
could possibly be cleaned up too.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This changes the API as a clean-up. Instead of passing multiple
function pointers at each time, introduce a new struct holding the
whole callback functions and pass it to the registration.
The same struct will be used for the upcoming audio client
registration, too.
Signed-off-by: Takashi Iwai <tiwai@suse.de>
It isn't necessary any more and the suballocator seems to perform
even better.
Signed-off-by: Christian König <deathsimple@vodafone.de>
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Directly use the suballocator to get small chunks of memory.
It's equally fast and doesn't crash when we encounter a GPU reset.
v2: rebased on new SA interface.
Signed-off-by: Christian König <deathsimple@vodafone.de>
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Some callers illegal called fence_wait_next/empty
while holding the ring emission mutex. So don't
relock the mutex in that cases, and move the actual
locking into the fence code.
v2: Don't try to unlock the mutex if it isn't locked.
Signed-off-by: Christian König <deathsimple@vodafone.de>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Using 64bits fence sequence we can directly compare sequence
number to know if a fence is signaled or not. Thus the fence
list became useless, so does the fence lock that mainly
protected the fence list.
Things like ring.ready are no longer behind a lock, this should
be ok as ring.ready is initialized once and will only change
when facing lockup. Worst case is that we return an -EBUSY just
after a successfull GPU reset, or we go into wait state instead
of returning -EBUSY (thus delaying reporting -EBUSY to fence
wait caller).
v2: Remove left over comment, force using writeback on cayman and
newer, thus not having to suffer from possibly scratch reg
exhaustion
v3: Rebase on top of change to uint64 fence patch
v4: Change DCE5 test to force write back on cayman and newer but
also any APU such as PALM or SUMO family
v5: Rebase on top of new uint64 fence patch
v6: Just break if seq doesn't change any more. Use radeon_fence
prefix for all function names. Even if it's now highly optimized,
try avoiding polling to often.
v7: We should never poll the last_seq from the hardware without
waking the sleeping threads, otherwise we might lose events.
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Signed-off-by: Christian König <deathsimple@vodafone.de>
Signed-off-by: Dave Airlie <airlied@redhat.com>
A single global mutex for ring submissions seems sufficient.
Signed-off-by: Christian König <deathsimple@vodafone.de>
Signed-off-by: Dave Airlie <airlied@redhat.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.18 (GNU/Linux)
iQEcBAABAgAGBQJPpvY9AAoJEHm+PkMAQRiGpEoIAJgbu+Y8gITnBK/wh9O6zy3S
5jie5KK4YWdbJsvO58WbNr3CyVIwGIqQ2dUZLiU59aBVLarlGw8xor0MmW+cZwhp
6fBHaf0qDYAV0MZjD+mnnExOiCRyISa2lPmsfu9dAWywh5KGe6/oAP6/qcXIyok3
KZyl3qQf4ENpaZPHwZPXCEkUvtuyHgNiszN+QXEadA3s19Ot4VGe9A3VGw+GNrSm
JqFIq3acQAbKa5BYaqf7TQC02v2FI7//eqt6QHxTqbE6a7LGbTvLfX3HlJ2mnfqa
1R6QHhM4y4OZDHbaMT2raHZ8WuLXzhehJzhP8Co7AHFOKwVKOb5XbcUr2RrukMU=
=HkMd
-----END PGP SIGNATURE-----
Merge tag 'v3.4-rc6' into drm-intel-next
Conflicts:
drivers/gpu/drm/i915/intel_display.c
Ok, this is a fun story of git totally messing things up. There
/shouldn't/ be any conflict in here, because the fixes in -rc6 do only
touch functions that have not been changed in -next.
The offending commits in drm-next are 14415745b2..1fa611065 which
simply move a few functions from intel_display.c to intel_pm.c. The
problem seems to be that git diff gets completely confused:
$ git diff 14415745b2..1fa611065
is a nice mess in intel_display.c, and the diff leaks into totally
unrelated functions, whereas
$git diff --minimal 14415745b2..1fa611065
is exactly what we want.
Unfortunately there seems to be no way to teach similar smarts to the
merge diff and conflict generation code, because with the minimal diff
there really shouldn't be any conflicts. For added hilarity, every
time something in that area changes the + and - lines in the diff move
around like crazy, again resulting in new conflicts. So I fear this
mess will stay with us for a little longer (and might result in
another backmerge down the road).
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Use family rather than DCE check for clarity, also always use
wb on APUs, there will never be AGP variants.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Instead of all this humpy pumpy with recursive
mutex (which also fixes only halve of the problem)
move the actual gpu reset out of the fence code,
return -EDEADLK and then reset the gpu in the
calling ioctl function.
v2: Split removal of radeon_mutex into separate patch.
Return -EAGAIN if reset is successful.
Signed-off-by: Christian König <deathsimple@vodafone.de>
Reviewed-by: Jerome Glisse <jglisse@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
As discussed with Michel that name better
describes the behavior of this function.
Signed-off-by: Christian König <deathsimple@vodafone.de>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Jerome Glisse <jglisse@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
It makes no sense at all to have more than one flag.
Signed-off-by: Christian König <deathsimple@vodafone.de>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Jerome Glisse <jglisse@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This closes a race seen with kexec where we enable PCI bus mastering
but the card has been reinitialised fully yet.
This was previously fixed by a patch from Jerome, but this should
close the race completely.
v2: add SI support as suggested by Alex.
Reported-and-tested-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Rename the function to better match the functionality.
DCPLL became PLL0 on DCE6.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Polling the outputs when the device is suspended can result in erroneous
status updates. Disable output polling during suspend to prevent this
from happening.
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Dave Airlie <airlied@redhat.com>
If GPU lockup is detected in ib_pool get we are holding the ib_pool
mutex that will be needed by the GPU reset code. As ib_pool code is
safe to be reentrant from GPU reset code we should not block if we
are trying to get the ib pool lock on the behalf of the same userspace
caller, thus use the radeon_mutex_lock helper.
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
We really only need to set it up once on init or resume
rather than on every mode set.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Virtual address space are per drm client (opener of /dev/drm).
Client are in charge of virtual address space, they need to
map bo into it by calling DRM_RADEON_GEM_VA ioctl.
First 16M of virtual address space is reserved by the kernel.
Once using 2 level page table we should be able to have a small
vram memory footprint for each pt (there would be one pt for all
gart, one for all vram and then one first level for each virtual
address space).
Plan include using the sub allocator for a common vm page table
area and using memcpy to copy vm page table in & out. Or use
a gart object and copy things in & out using dma.
v2: agd5f fixes:
- Add vram base offset for vram pages. The GPU physical address of a
vram page is FB_OFFSET + page offset. FB_OFFSET is 0 on discrete
cards and the physical bus address of the stolen memory on
integrated chips.
- VM_CONTEXT1_PROTECTION_FAULT_DEFAULT_ADDR covers all vmid's >= 1
v3: agd5f:
- integrate with the semaphore/multi-ring stuff
v4:
- rebase on top ttm dma & multi-ring stuff
- userspace is now in charge of the address space
- no more specific cs vm ioctl, instead cs ioctl has a new
chunk
v5:
- properly handle mem == NULL case from move_notify callback
- fix the vm cleanup path
v6:
- fix update of page table to only happen on valid mem placement
v7:
- add tlb flush for each vm context
- add flags to define mapping property (readable, writeable, snooped)
- make ring id implicit from ib->fence->ring, up to each asic callback
to then do ring specific scheduling if vm ib scheduling function
v8:
- add query for ib limit and kernel reserved virtual space
- rename vm->size to max_pfn (maximum number of page)
- update gem_va ioctl to also allow unmap operation
- bump kernel version to allow userspace to query for vm support
v9:
- rebuild page table only when bind and incrementaly depending
on bo referenced by cs and that have been moved
- allow virtual address space to grow
- use sa allocator for vram page table
- return invalid when querying vm limit on non cayman GPU
- dump vm fault register on lockup
v10: agd5f:
- Move the vm schedule_ib callback to a standalone function, remove
the callback and use the existing ib_execute callback for VM IBs.
v11:
- rebase on top of lastest Linus
v12: agd5f:
- remove spurious backslash
- set IB vm_id to 0 in radeon_ib_get()
v13: agd5f:
- fix handling of RADEON_CHUNK_ID_FLAGS
v14:
- fix va destruction
- fix suspend resume
- forbid bo to have several different va in same vm
v15:
- rebase
v16:
- cleanup left over of vm init/fini
v17: agd5f:
- cs checker
v18: agd5f:
- reworks the CS ioctl to better support multiple rings and
VM. Rather than adding a new chunk id for VM, just re-use the
IB chunk id and add a new flags for VM mode. Also define additional
dwords for the flags chunk id to define the what ring we want to use
(gfx, compute, uvd, etc.) and the priority.
v19:
- fix cs fini in weird case of no ib
- semi working flush fix for ni
- rebase on top of sa allocator changes
v20: agd5f:
- further CS ioctl cleanups from Christian's comments
v21: agd5f:
- integrate CS checker improvements
v22: agd5f:
- final cleanups for release, only allow VM CS on cayman
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
We often end up missing fences on older asics with
writeback enabled which leads to delays in the userspace
accel code, so just disable it by default on those asics.
Reported-by: Helge Deller <deller@gmx.de>
Reported-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Dave Airlie <airlied@redhat.com>
This allow to share the ib pool with semaphore and avoid
having more bo around.
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
That naming seems to make more sense, since we not
only want to run PM4 rings with it.
Signed-off-by: Christian König <deathsimple@vodafone.de>
Reviewed-by: Jerome Glisse <jglisse@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Tests syncing between all rings by using
semaphores and fences.
v2: use radeon_testing as a bit flag rather than on/off switch
this allow to test for one thing at a time (bo_move or semaphore
test). It kind of break the usage if user wheren't using 1
for bo move test but as it's a test feature i believe it's ok.
Signed-off-by: Christian König <deathsimple@vodafone.de>
Reviewed-by: Jerome Glisse <jglisse@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Replace cp, cp1 and cp2 members with just an array
of radeon_cp structs.
Signed-off-by: Christian König <deathsimple@vodafone.de>
Reviewed-by: Jerome Glisse <jglisse@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
They are used to sync between rings, while fences
sync between a ring and the cpu.
v2 Fix radeon_semaphore_driver_fini when no semaphore were
allocated.
v3 Initialize list early on to avoid issue in case or early
error
Signed-off-by: Christian König <deathsimple@vodafone.de>
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
For supporting multiple CP ring buffers, async DMA
engines and UVD. We still need a way to synchronize
between engines.
v2 initialize unused fence driver ring to avoid issue in
suspend/unload
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Christian König <deathsimple@vodafone.de>
Reviewed-by: Jerome Glisse <jglisse@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Having registered debugfs files globally causes
the files to not show up on the second, third
etc.. card in the system.
v2: fix crash on module unloading
v3: fix space indentation
Signed-off-by: Christian König <deathsimple@vodafone.de>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Jerome Glisse <jglisse@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Merge in the upstream tree to bring in the mainline fixes.
Conflicts:
drivers/gpu/drm/exynos/exynos_drm_fbdev.c
drivers/gpu/drm/nouveau/nouveau_sgdma.c
With the exception that we do not handle the AGP case. We only
deal with PCIe cards such as ATI ES1000 or HD3200 that have been
detected to only do DMA up to 32-bits.
V2 force dma32 if we fail to set bigger dma mask
V3 Rebase on top of no memory account changes (where/when is my
delorean when i need it ?)
V4 add debugfs entry is swiotlb is active not only if we are
on dma 32bits only gpu
CC: Dave Airlie <airlied@redhat.com>
CC: Alex Deucher <alexdeucher@gmail.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Jerome Glisse <jglisse@redhat.com>
This was only the case if the GPU reset was triggered from the CS ioctl,
otherwise other processes could happily enter the CS ioctl and wreak havoc
during the GPU reset.
This is a little complicated because the GPU reset can be triggered from the
CS ioctl, in which case we're already holding the mutex, or from other call
paths, in which case we need to lock the mutex. AFAICT the mutex API doesn't
allow recursive locking or finding out the mutex owner, so we need to handle
this with helper functions which allow recursive locking from the same
process.
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Jerome Glisse <jglisse@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Lots of new (and hopefully useful) benchmark. Load the driver
with radeon_benchmark=<test_number> and enjoy. Among tests
added are VRAM to VRAM blits and blits with buffer size sweeps.
The latter can be from GTT to VRAM, VRAM to GTT, and VRAM to VRAM
and there are two types of sweeps: powers of two and (probably
more interesting) buffers sizes that correspond to common modes.
Signed-off-by: Ilija Hadzic <ihadzic@research.bell-labs.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
If a card wasn't PCIE, we always set the DMA mask to 32 bits.
This is only applies to the old rage128/r1xx gart block on
early radeon asics (~r1xx-r4xx). Newer PCI and IGP cards
can handle 40 bits just fine.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: Chen Jie <chenj@lemote.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
The value of RADEON_DEBUGFS_MAX_NUM_FILES has been used to
specify the size of an array, each element of which looks
like this:
struct radeon_debugfs {
struct drm_info_list *files;
unsigned num_files;
};
Consequently, the number of debugfs files may be much greater
than RADEON_DEBUGFS_MAX_NUM_FILES, something that the current
code ignores:
if ((_radeon_debugfs_count + nfiles) > RADEON_DEBUGFS_MAX_NUM_FILES) {
DRM_ERROR("Reached maximum number of debugfs files.\n");
DRM_ERROR("Report so we increase RADEON_DEBUGFS_MAX_NUM_FILES.\n");
return -EINVAL;
}
This commit fixes this make, and accordingly renames:
RADEON_DEBUGFS_MAX_NUM_FILES
to:
RADEON_DEBUGFS_MAX_COMPONENTS
Signed-off-by: Michael Witten <mfwitten@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>