Provide helper function to compute the kernel memory size needed
for each buffer object. Move all the accounting inside ttm, simplifying
driver and avoiding code duplication accross them.
v2 fix accounting of ghost object, one would have thought that i
would have run into the issue since a longtime but it seems
ghost object are rare when you have plenty of vram ;)
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com>
This reverts commit dfadbbdb57.
Further upstream discussion between Marek and Thomas decided this wasn't
fully baked and needed further work, so revert it before it hits mainline.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This reverts commit d3ed74027f.
Further upstream discussion between Thomas and Marek decided this needed
more work and driver specifics. So revert before it goes upstream.
Signed-off-by: Dave Airlie <airlied@redhat.com>
With this patch I'm only about 50k larger with DRM debugging
enables (why is that enabled by default?!?), and slightly
smaller without.
[airlied: moved r100.c additions to radeon_ring.c]
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
... and switch it to container_of upcasting.
v2: converted new pageflip code-paths.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Unconditionally initialize the drm gem object - it's not
worth the trouble not to for the few kernel objects.
This patch only changes the place of the drm gem object,
access is still done via pointers.
v2: Uncoditionally align the size in radeon_bo_create. At
least the r600/evergreen blit code didn't to this, angering
the paranoid gem code.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Dave Airlie <airlied@redhat.com>
this adds a bo create, and fence seq tracking tracepoints.
This is just an initial set to play around with, we should investigate
what others we need would be useful.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Forbid allocating buffer bigger than visible VRAM or GTT, also
properly set lpfn field.
v2 - use max macro
- silence warning
v3 - don't explicitly set range limit
- use min macro
Cc: stable <stable@kernel.org>
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Rather than re-implementing in the Radeon driver,
Use the execbuf / cs / pushbuf utilities that comes with TTM.
This comes with an even greater benefit now that many spinlocks have been
optimized away...
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
The bo lock used only to protect the bo sync object members, and since it
is a per bo lock, fencing a buffer list will see a lot of locks and unlocks.
Replace it with a per-device lock that protects the sync object members on
*all* bos. Reading and setting these members will always be very quick, so
the risc of heavy lock contention is microscopic. Note that waiting for
sync objects will always take place outside of this lock.
The bo device fence lock will eventually be replaced with a seqlock /
rcu mechanism so we can determine that a bo is idle under a
rcu / read seqlock.
However this change will allow us to batch fencing and unreserving of
buffers with a minimal amount of locking.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Jerome Glisse <j.glisse@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
We were previously dropping alignment requests on the floor
when allocating buffers so we always ended up page aligned.
Certain tiling modes on 6xx+ require larger alignment which
wasn't happening before.
Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
Cc: Jerome Glisse <j.glisse@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
If ttm_bo_init() returns failure, it already destroyed the BO, so we need to
retry from scratch.
Signed-off-by: Michel Dänzer <daenzer@vmware.com>
Tested-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Cc: stable@kernel.org
Signed-off-by: Dave Airlie <airlied@redhat.com>
We should not allocate any object into unmappable vram if we
have no means to access them which on all GPU means having the
CP running and on newer GPU having the blit utility working.
This patch limit the vram allocation to visible vram until
we have acceleration up and running.
Note that it's more than unlikely that we run into any issue
related to that as when acceleration is not woring userspace
should allocate any object in vram beside front buffer which
should fit in visible vram.
V2 use real_vram_size as mc_vram_size could be bigger than
the actual amount of vram
[airlied: fixup r700_cp_stop case]
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Nouveau will need this on GeForce 8 and up to account for the GPU
reordering physical VRAM for some memory types.
Reviewed-by: Jerome Glisse <jglisse@redhat.com>
Acked-by: Thomas Hellström <thellstrom@vmware.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
This fixes a problem where on low VRAM cards we'd run out of space for validation.
[airlied: Tested on my M7, Thinkpad T42, compiz works with no problems.]
Signed-off-by: Michel Dänzer <daenzer@vmware.com>
Cc: stable@kernel.org
Signed-off-by: Dave Airlie <airlied@redhat.com>
list reservation was too optimistic about ttm object reservation
and could think that an object reserved by some other process
as reserved by the list reservation which was false. Thus when
unreserving the list it might unreserve object that it didn't
reserved in the list. Sorry if it's hard to follow but this
kind of things are just causing headheck.
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Touching vram while the card is reclocking can lead to lockups. Unmap
any pages that could be touched by the CPU and block any accesses to
vram until the reclocking is complete.
Signed-off-by: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
* drm-ttm-unmappable:
drm/radeon/kms: enable use of unmappable VRAM V2
drm/ttm: remove io_ field from TTM V6
drm/vmwgfx: add support for new TTM fault callback V5
drm/nouveau/kms: add support for new TTM fault callback V5
drm/radeon/kms: add support for new fault callback V7
drm/ttm: ttm_fault callback to allow driver to handle bo placement V6
drm/ttm: split no_wait argument in 2 GPU or reserve wait
Conflicts:
drivers/gpu/drm/nouveau/nouveau_bo.c
This add the support for the new fault callback and also the
infrastructure for supporting unmappable VRAM.
V2 validate BO with no_wait = true
V3 don't derefence bo->mem.mm_node as it's not NULL only for
VRAM or GTT
V4 update to splitted no_wait ttm change
V5 update to new balanced io_mem_reserve/free change
V6 callback is responsible for iomapping memory
V7 move back iomapping to ttm
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
There is case where we want to be able to wait only for the
GPU while not waiting for other buffer to be unreserved. This
patch split the no_wait argument all the way down in the whole
ttm path so that upper level can decide on what to wait on or
not.
[airlied: squashed these 4 for bisectability reasons.]
drm/radeon/kms: update to TTM no_wait splitted argument
drm/nouveau: update to TTM no_wait splitted argument
drm/vmwgfx: update to TTM no_wait splitted argument
[vmwgfx patch: Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com>]
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Acked-by: Thomas Hellstrom <thellstrom@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This prevented radeon.test=1 from testing transfers from/to GTT beyond the
visible VRAM size.
Signed-off-by: Michel Dänzer <daenzer@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.
percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability. As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.
http://userweb.kernel.org/~tj/misc/slabh-sweep.py
The script does the followings.
* Scan files for gfp and slab usages and update includes such that
only the necessary includes are there. ie. if only gfp is used,
gfp.h, if slab is used, slab.h.
* When the script inserts a new include, it looks at the include
blocks and try to put the new include such that its order conforms
to its surrounding. It's put in the include block which contains
core kernel includes, in the same order that the rest are ordered -
alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
doesn't seem to be any matching order.
* If the script can't find a place to put a new include (mostly
because the file doesn't have fitting include block), it prints out
an error message indicating which .h file needs to be added to the
file.
The conversion was done in the following steps.
1. The initial automatic conversion of all .c files updated slightly
over 4000 files, deleting around 700 includes and adding ~480 gfp.h
and ~3000 slab.h inclusions. The script emitted errors for ~400
files.
2. Each error was manually checked. Some didn't need the inclusion,
some needed manual addition while adding it to implementation .h or
embedding .c file was more appropriate for others. This step added
inclusions to around 150 files.
3. The script was run again and the output was compared to the edits
from #2 to make sure no file was left behind.
4. Several build tests were done and a couple of problems were fixed.
e.g. lib/decompress_*.c used malloc/free() wrappers around slab
APIs requiring slab.h to be added manually.
5. The script was run on all .h files but without automatically
editing them as sprinkling gfp.h and slab.h inclusions around .h
files could easily lead to inclusion dependency hell. Most gfp.h
inclusion directives were ignored as stuff from gfp.h was usually
wildly available and often used in preprocessor macros. Each
slab.h inclusion directive was examined and added manually as
necessary.
6. percpu.h was updated not to include slab.h.
7. Build test were done on the following configurations and failures
were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
distributed build env didn't work with gcov compiles) and a few
more options had to be turned off depending on archs to make things
build (like ipr on powerpc/64 which failed due to missing writeq).
* x86 and x86_64 UP and SMP allmodconfig and a custom test config.
* powerpc and powerpc64 SMP allmodconfig
* sparc and sparc64 SMP allmodconfig
* ia64 SMP allmodconfig
* s390 SMP allmodconfig
* alpha SMP allmodconfig
* um on x86_64 SMP allmodconfig
8. percpu.h modifications were reverted so that it could be applied as
a separate patch and serve as bisection point.
Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.
Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
This patch properly set visible VRAM and enforce any pinned buffer
to be into visible VRAM. We might later add a flag to release this
constraint for some newer hw more clever than previous.
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Previous code did associate fence to bo before the fence was emited
and it also didn't lock protected access to ttm sync_obj member.
Both of this flaw leads to possible race between different code
path. This patch fix this by associating fence only once the fence
is emitted and properly lock protect access to sync_obj member of
ttm.
Fix:
https://bugs.freedesktop.org/show_bug.cgi?id=26438
and likely similar others bugs
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This is the least invasive fix without migrating the radeon driver
to pm_ops from what I can see. We just always migrate VRAM objects
on IGPs for now and we can fix it up later to migrate depending
on STR vs STD.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This detects if the sideport memory is enabled and
if it is VRAM is evicted on suspend/resume.
This should fix s/r issues on some IGPs.
Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
If they are not radeon object don't do anythings special for them,
this avoid rare oops than can happen in a complex use case.
[airlied: additional fixups]
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Do as we did before rework, if no placement is supplied at bo
creation time, fallback to allocating bo from system ram. This
will fix most of the creation failed issue report we got since
the rework get merged.
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Now bo init use placement structure like bo validation does.
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
On resume on my rv530 laptop surface cntl was left disabled, so
wierd stuff would happen with rendering to a tiled front buffer.
This checks if the surface regs are assigned to bos and reprograms
the surface registers on resume using the same path that clears
them all on init.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Also sets affected TTM calls up to not wait interruptible, since
that would cause an in-kernel spin until the TTM call succeeds, since
the Radeon code does not return to user-space when a signal is received.
Modifies interruptible fence waits to return -ERESTARTSYS rather than
-EBUSY when interrupted by a signal, since that's the (yet undocumented)
semantics required by the TTM sync object hooks.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This convert radeon to use new TTM validation API, it doesn't
really take advantage of it beside in the eviction case.
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
The locking & protection of radeon object was somewhat messy.
This patch completely rework it to now use ttm reserve as a
protection for the radeon object structure member. It also
shrink down the various radeon object structure by removing
field which were redondant with the ttm information. Last it
converts few simple functions to inline which should with
performances.
airlied: rebase on top of r600 and other changes.
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
rendercheck under kms on r600s was failing due to HDP flushing not happening.
This adds HDP flushing to the object wait function for r100->r700 families.
rendercheck passes basic tests on r600 with this change.
Acked-by: Alex Deucher <alexdeucher@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Move mtrr range and memory information printing to radeon_object_init,
this are memory information and initialization common to all GPU and
they better fit in this function. Will also prevent code duplication
with upcoming init path changes.
airlied: fixed warning introduced
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This implements the busy ioctl along with a current domain check.
returns 0 or -EBUSY
puts the current domain no matter what the answer.
Signed-off-by: Dave Airlie <airlied@redhat.com>
GTT object can either be cached,uncached or wc just let core ttm
pick the best mode according to how the bo driver and GTT memory
type was initialized.
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Blocking here isn't something the X server mouse appreciates,
avoid the block and let userspace retry the waits.
libdrm_radeon userspace library is also expecting EBUSY not ERESTART
Signed-off-by: Dave Airlie <airlied@redhat.com>
Previously we were basically always setting the GTT and VRAM flags regardless of
what userspace requested.
Signed-off-by: Michel Dänzer <daenzer@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This is done later in radeon_object_list_unvalidate(). Doing it twice triggers
a BUG in TTM, rendering X on KMS unusable until reboot.
Signed-off-by: Michel Dänzer <daenzer@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This adds new set/get tiling interfaces where the pitch
and macro/micro tiling enables can be set. Along with
a flag to decide if this object should have a surface when mapped.
The only thing we need to allocate with a mapped surface should be
the frontbuffer. Note rotate scanout shouldn't require one, and
back/depth shouldn't either, though mesa needs some fixes.
It fixes the TTM interfaces along Thomas's suggestions, and I've tested
the surface stealing code with two X servers and not seen any lockdep issues.
I've stopped tiling the fbcon frontbuffer, as I don't see there being
any advantage other than testing, I've left the testing commands in there,
just flip the fb_tiled to true in radeon_fb.c
Open: Can we integrate endian swapping in with this?
Future features:
texture tiling - need to relocate texture registers TXOFFSET* with tiling info.
This also merges Michel's cleanup surfaces regs at init time patch
even though it makes sense on its own, this patch really relies on it.
Some PowerMac firmwares set up a tiling surface at the beginning of VRAM
which messes us up otherwise.
that patch is:
Signed-off-by: Michel Dänzer <daenzer@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
smem.start is a physical address which kernel can remap to access
video memory of the fb buffer. We now pin the fb buffer into vram
by doing so we are loosing vram but fbdev need to be reworked to
allow change in framebuffer address.
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Signed-off-by: Dave Airlie <airlied@linux.ie>